MobileRead Forums > E-Book Formats > Workshop

Old 11-18-2007, 09:59 PM   #1
profnachos
Connoisseur
Posts: 52
Karma: 43
Join Date: Nov 2007
Device: Palm Treo
Reading PDF files

I am sure this has been asked a million times, so please bear with me.

What viable and inexpensive way is there to convert PDF to either HTML or DOC?

I have an eBookwise reader and have tried:

- PDFReader: This converts the PDF to image files, which does not work for me. The resulting file is large and unreadable, and text-related features do not work on images.

- pdftohtml: This converts to HTML, but it does not distinguish paragraph breaks from line breaks, so the resulting HTML text is all jumbled up. Other conversion tools I have tried do the same.
Old 11-18-2007, 11:37 PM   #2
Vesper
Addict
Posts: 205
Karma: 1133
Join Date: Nov 2007
Location: Serbia
Device: Sony PRS-350, Cybook Gen3, Palm T|X
I'm not sure what inexpensive means to you. The best so far, though not very cheap, is ABBYY's PDF Transformer. If you are not scared off by the 89 euro price tag, you can take a look...

DrS
Old 11-19-2007, 12:14 AM   #3
kovidgoyal
creator of calibre
Posts: 24,813
Karma: 4369673
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The output of pdftohtml can easily be processed further by machine; see, for example, pdf2lrf.
Old 11-19-2007, 04:07 AM   #4
rlauzon
Wizard
Posts: 1,017
Karma: 67827
Join Date: Jan 2005
Device: Opus/System76 Starling
Quote:
Originally Posted by profnachos View Post
What viable and inexpensive way is there to convert PDF to either HTML or DOC?
Short answer: No.

Longer answer: All PDF-to-anything tools will require a great deal of manual editing on your part. This is because PDF simply doesn't store certain information, such as paragraph breaks.
Old 11-19-2007, 06:22 AM   #5
tompe
Grand Sorcerer
Posts: 6,886
Karma: 2753841
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Nexus 7, Nexus 4, iPad 2, Notion Ink Adam Qi, Kindle WiFi, Kindle PW
Put your PDF document on the net. Wait for some time. Then search for your document in Google and use "view as html".
Old 11-20-2007, 12:13 AM   #6
profnachos
Connoisseur
Posts: 52
Karma: 43
Join Date: Nov 2007
Device: Palm Treo
Quote:
Originally Posted by kovidgoyal View Post
The output of pdftohtml can easily be processed further by machine; see, for example, pdf2lrf.
The problem is, I have an eBookwise, not a Sony.
Old 11-20-2007, 12:40 AM   #7
kovidgoyal
creator of calibre
Posts: 24,813
Karma: 4369673
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I just meant to give you an idea of how to do it. Basically, pdftohtml preserves line breaks using <br> elements. These need to be removed intelligently (based on line length), and two consecutive <br> elements should become a new paragraph.
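
For what it's worth, here is a rough Python sketch of that idea. It is not what pdf2lrf actually does; the function name `br_to_paragraphs` and the `short_ratio` threshold are made up for illustration, and real pdftohtml output contains other markup that this ignores. It treats a run of two or more <br> elements as a paragraph break, joins the remaining lines, and guesses that a line much shorter than the longest line also ends a paragraph.

```python
import re

# A run of two or more <br> tags (optionally separated by whitespace).
DOUBLE_BR = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)
# Any single <br> tag.
SINGLE_BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
# Internal marker for a paragraph boundary; assumed not to occur in the input.
PARA_MARK = "\x00"


def br_to_paragraphs(fragment, short_ratio=0.6):
    """Rebuild <p> paragraphs from <br>-separated lines (heuristic sketch)."""
    # Step 1: two or more consecutive <br> become an explicit paragraph break.
    marked = DOUBLE_BR.sub(PARA_MARK, fragment)

    # Step 2: split into blocks, then into physical lines, dropping empties.
    blocks = []
    for block in marked.split(PARA_MARK):
        lines = [ln.strip() for ln in SINGLE_BR.split(block)]
        blocks.append([ln for ln in lines if ln])

    all_lines = [ln for block in blocks for ln in block]
    if not all_lines:
        return ""

    # Step 3: use the longest line as an estimate of a full line's length.
    full = max(len(ln) for ln in all_lines)

    paragraphs = []
    for block in blocks:
        current = []
        for ln in block:
            current.append(ln)
            # A clearly short line probably ends a paragraph.
            if len(ln) < short_ratio * full:
                paragraphs.append(" ".join(current))
                current = []
        if current:
            paragraphs.append(" ".join(current))

    return "\n".join(f"<p>{p}</p>" for p in paragraphs)
```

Hyphenated words split across lines, headers, and page numbers are not handled here, so expect to tune `short_ratio` and still do some manual cleanup.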
Old 11-20-2007, 01:11 AM   #8
DaleDe
Grand Sorcerer
Posts: 9,395
Karma: 4531756
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2
Quote:
Originally Posted by kovidgoyal View Post
I just meant to give you an idea of how to do it. Basically, pdftohtml preserves line breaks using <br> elements. These need to be removed intelligently (based on line length), and two consecutive <br> elements should become a new paragraph.
Actually, for the eBookwise the double <br> should be replaced with \p p and the rest removed. Then maybe clean up the first and last paragraphs manually.

Dale
Old 11-24-2007, 11:52 AM   #9
jpathomas
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2006
Device: Sony Reader PRS-500
Has anyone tried reformatting PDF files to fit the Sony Reader using Acrobat? I have a number of technical documents (mostly certification materials) that I would like to convert for my own use on the Sony reader, but everything I've tried short of Acrobat produces unacceptable content. I've been considering buying a full version of Acrobat, but it's not inexpensive.

What I want to do is recreate the PDF documents I have so that the images are displayed in their correct position with regard to the text. This will require resizing the pages to fit the Sony's screen size, and possibly resizing the images also. At this point I don't know that this is possible.
Old 11-24-2007, 12:04 PM   #10
DaleDe
Grand Sorcerer
Posts: 9,395
Karma: 4531756
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2
Quote:
Originally Posted by jpathomas View Post
Has anyone tried reformatting PDF files to fit the Sony Reader using Acrobat? I have a number of technical documents (mostly certification materials) that I would like to convert for my own use on the Sony reader, but everything I've tried short of Acrobat produces unacceptable content. I've been considering buying a full version of Acrobat, but it's not inexpensive.

What I want to do is recreate the PDF documents I have so that the images are displayed in their correct position with regard to the text. This will require resizing the pages to fit the Sony's screen size, and possibly resizing the images also. At this point I don't know that this is possible.
Acrobat Reader cannot reformat text. Generally, the text is already formatted for a paper page size.

The approach for PDFs is to print the document to a custom paper size that matches what the Sony needs. You use a PDF creation program as the printer device and set it up with the paper size you want. There are plenty of these PDF creation tools out there, and some are free. This works with documents like Word files and plain text, which are essentially reflowable and can be conformed to the chosen page size. It does not work for documents that are designed for fixed page sizes, which is typically the case for PDFs you encounter at work. If you can get the source files, then you can do what you want.
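
To work out the custom paper size to enter in the PDF printer, something like the following small Python sketch may help. The function name is hypothetical, and the 600 x 800 pixel, 170 dpi figures in the usage lines are placeholder values assumed for illustration; check your own device's specifications. PDF printers usually want the size in points (1/72 inch) or in inches.

```python
def screen_to_page_points(width_px, height_px, dpi):
    """Convert a screen size in pixels to a page size in points (1/72 inch)."""
    # Multiply before dividing to keep the arithmetic as exact as possible.
    width_pt = width_px * 72.0 / dpi
    height_pt = height_px * 72.0 / dpi
    return width_pt, height_pt


# Placeholder figures (assumed, not taken from any specification).
width_pt, height_pt = screen_to_page_points(600, 800, 170)
print(f"Custom paper size: {width_pt:.1f} x {height_pt:.1f} pt "
      f"({width_pt / 72:.2f} x {height_pt / 72:.2f} in)")
```

You would probably also want to shrink the margins in the source document, since a small page leaves little room for them.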

Dale
Old 11-24-2007, 12:18 PM   #11
vivaldirules
When's Doughnut Day?
Posts: 10,040
Karma: 13675425
Join Date: Jul 2007
Location: Houston, TX, US
Device: Sony PRS-505, iPad
What about using Acrobat (i.e., not Acrobat Reader)?
Old 11-24-2007, 12:58 PM   #12
DaleDe
Grand Sorcerer
Posts: 9,395
Karma: 4531756
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2
Quote:
Originally Posted by vivaldirules View Post
What about using Acrobat (i.e., not Acrobat Reader)?
The full version of Acrobat can do this and almost anything else, depending on the protection assigned to the files. It is a source editor.

Dale
All times are GMT -4.

