06-21-2007, 03:18 AM   #1
utrost
Junior Member
Posts: 9
Karma: 10
Join Date: Mar 2007
Device: Sony reader
PDF to text/html?

Now that we have so many great tools for converting text and/or HTML to LRF, what tools do you use to convert PDF to text/HTML (in order to convert the result to LRF)? I am looking for a tool that preserves the formatting (tables, italics...) and graphics of a PDF during the conversion process...


Uwe.
06-21-2007, 04:07 AM   #2
igorsk
Wizard
Posts: 3,443
Karma: 52235
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
If you want to preserve everything, try ABBYY's FineReader or PDF Transformer. They're not cheap though.
06-21-2007, 05:44 AM   #3
utrost
Junior Member
Posts: 9
Karma: 10
Join Date: Mar 2007
Device: Sony reader
Hm, I was thinking more of an open-source/freeware tool :-)
Any ideas?

Quote:
Originally Posted by igorsk View Post
If you want to preserve everything, try ABBYY's FineReader or PDF Transformer. They're not cheap though.
06-21-2007, 08:47 AM   #4
ashkulz
Addict
Posts: 350
Karma: 705
Join Date: Dec 2006
Location: Mumbai, India
Device: Kindle 1/REB 1200
Quote:
Originally Posted by utrost View Post
Hm, I was thinking more of an open-source/freeware tool :-)
Any ideas?
Well, you can try PDFRead (disclaimer: I wrote it) to convert PDFs. It doesn't really convert them to text/HTML; it creates an image of each page, so all formatting is preserved. Another tool that does the same thing is RasterFarian (search the forums and you'll find it).

Also, I've tried a lot of tools (both free and paid), and in the end the effort is just not worth it. Even with a well-formatted document, it takes at least an hour to get it into a readable form. Many PDFs are not generated well, so the conversion is sometimes bizarre. The PDF => image route is much simpler and requires no manual intervention (at the expense of more storage space and a much longer conversion time).
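
To give a rough idea of the PDF => image route, here is a minimal Python sketch. It is not how PDFRead or RasterFarian work internally; it assumes Poppler's pdftoppm is installed and on the PATH, and the helper names (build_pdftoppm_cmd, render_pages) and the default of 150 DPI are made up for illustration.

```python
import glob
import shutil
import subprocess


def build_pdftoppm_cmd(pdf_path, out_prefix, dpi=150):
    """Build the argument list for pdftoppm (hypothetical helper)."""
    # -png: write PNG files; -r: rendering resolution in DPI
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_prefix]


def render_pages(pdf_path, out_prefix, dpi=150):
    """Render every page to a PNG and return the sorted file names.

    Returns an empty list if pdftoppm is not installed.
    """
    if shutil.which("pdftoppm") is None:
        return []
    subprocess.run(build_pdftoppm_cmd(pdf_path, out_prefix, dpi), check=True)
    # pdftoppm names its output <prefix>-<page number>.png
    return sorted(glob.glob(glob.escape(out_prefix) + "-*.png"))
```

A higher DPI gives sharper pages but larger files and slower conversion, which is the space/time trade-off mentioned above.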
06-21-2007, 09:09 AM   #5
sammykrupa
Reader of the Reader
Posts: 103
Karma: 107
Join Date: Apr 2006
Device: Sony Reader PRS-500
pdftohtml

You want pdftohtml

Works great.

Try the -c switch.

Sam Krupa
06-21-2007, 09:42 AM   #6
JSWolf
Resident Curmudgeon
Posts: 37,646
Karma: 18475502
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
What does the -c switch do?
06-21-2007, 09:57 AM   #7
ashkulz
Addict
Posts: 350
Karma: 705
Join Date: Dec 2006
Location: Mumbai, India
Device: Kindle 1/REB 1200
Quote:
Originally Posted by JSWolf View Post
What does the -c switch do?
It converts the document in complex mode, i.e. it tries to remain as faithful to the PDF formatting as possible. I haven't found it that useful, as it mostly uses absolute positioning in the HTML, which does not always convert well for viewing on an ebook reader.
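
For anyone who wants to compare the two modes, here is a minimal Python sketch that runs pdftohtml with and without -c. It assumes the pdftohtml binary is on the PATH; the helper names (build_pdftohtml_cmd, convert) are hypothetical, and only the -c switch discussed here is used.

```python
import shutil
import subprocess


def build_pdftohtml_cmd(pdf_path, out_base, complex_mode=False):
    """Build the argument list for pdftohtml (hypothetical helper)."""
    cmd = ["pdftohtml"]
    if complex_mode:
        # -c: complex output that tries to reproduce the PDF layout
        cmd.append("-c")
    cmd += [pdf_path, out_base]
    return cmd


def convert(pdf_path, out_base, complex_mode=False):
    """Run pdftohtml if it is installed; return True on success."""
    if shutil.which("pdftohtml") is None:
        return False  # binary not found on PATH
    result = subprocess.run(build_pdftohtml_cmd(pdf_path, out_base, complex_mode))
    return result.returncode == 0
```

Running convert("book.pdf", "simple") and convert("book.pdf", "complex", complex_mode=True) on the same file makes it easy to see whether the absolutely positioned output is worth it for a given PDF.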
06-21-2007, 03:09 PM   #8
JSWolf
Resident Curmudgeon
Posts: 37,646
Karma: 18475502
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
One important question... Does pdftohtml keep attributes such as bold, italics, underline, etc.?

Jon
06-21-2007, 04:50 PM   #9
nekokami
fruminous edugeek
Posts: 6,745
Karma: 551260
Join Date: Oct 2006
Location: Northeast US
Device: iPad, eBw 1150
I'm pretty sure pdftohtml does keep italics, etc. Also worth looking at (if you're using a PC or running Wine on Linux) is the processtext.com PDF converter, which is cheap, easy to use, and does a good job on nearly every file I've thrown at it (the exception being one that had a very weird custom encoding). It also has a "complex" mode, similar to pdftohtml's, that helps greatly in eliminating those annoying fixed line lengths.
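
If you end up with plain text that still has fixed line lengths, they can be removed after the fact. The sketch below is a simple stand-in and not what the processtext.com converter or pdftohtml do internally: it assumes that blank lines separate paragraphs and joins every other line break with a space. The function name unwrap_paragraphs is hypothetical.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines; blank lines are treated as paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    unwrapped = []
    for para in paragraphs:
        # Drop empty fragments and rejoin the remaining lines with one space
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        unwrapped.append(" ".join(lines))
    return "\n\n".join(unwrapped)
```

This does not rejoin words hyphenated across a line break, and it will merge paragraphs in files that have no blank lines between them, so check the result by eye.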
06-22-2007, 03:19 PM   #10
paspas
Member
Posts: 24
Karma: 11
Join Date: Jun 2007
Device: prs505
Hi, I use FineReader (OCR) in VMware to convert PDFs. It's very good.

All times are GMT -4.