06-21-2007, 03:18 AM | #1 |
Junior Member
Posts: 9
Karma: 10
Join Date: Mar 2007
Device: Sony reader
|
PDF to text/html?
Now that we have so many great tools for converting text and/or HTML to LRF, what tools do you use to convert PDF to text/HTML (in order to convert the result to LRF)? I am looking for a tool that preserves the formatting (tables, italics...) and graphics of a PDF during the conversion process...
Uwe. |
06-21-2007, 04:07 AM | #2 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
If you want to preserve everything, try ABBYY's FineReader or PDF Transformer. They're not cheap though.
|
06-21-2007, 05:44 AM | #3 |
Junior Member
Posts: 9
Karma: 10
Join Date: Mar 2007
Device: Sony reader
|
|
06-21-2007, 08:47 AM | #4 | |
Addict
Posts: 350
Karma: 705
Join Date: Dec 2006
Location: Mumbai, India
Device: Kindle 1/REB 1200
|
Quote:
Also, I've tried a lot of tools (both free/paid) and in the end the effort is just not worth it. Even taking a well formatted document, it takes at least an hour to get it in a readable form. A lot of the PDFs are not generated well, so sometimes the conversion is bizarre. The PDF => Image route is much simpler and requires no manual intervention (at the expense of taking more space and a much longer conversion time). |
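For anyone wanting to try the PDF => image route, here is a minimal Python sketch. It assumes a current poppler-utils install that provides `pdftoppm` (a tool not named in this thread; older xpdf builds could not write PNG). The helper names `build_pdftoppm_cmd` and `pdf_to_images` and the 150 dpi default are illustrative choices, not anything from the posts above.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_cmd(pdf: str, out_prefix: str, dpi: int = 150,
                       gray: bool = True) -> list[str]:
    """Build a poppler `pdftoppm` command that renders every page to PNG.

    pdftoppm itself names the output files <out_prefix>-<page>.png.
    """
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if gray:
        cmd.append("-gray")  # grayscale output; e-ink screens show no colour anyway
    cmd += [pdf, out_prefix]
    return cmd


def pdf_to_images(pdf: str, out_dir: str, dpi: int = 150) -> None:
    """Render all pages of `pdf` into `out_dir` (assumes pdftoppm is on PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    prefix = str(Path(out_dir) / Path(pdf).stem)
    subprocess.run(build_pdftoppm_cmd(pdf, prefix, dpi), check=True)
```

The page images still have to be packaged into an LRF with whatever image-to-LRF tool you already use; as the post says, this costs space and conversion time but needs no manual cleanup.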
|
06-21-2007, 09:09 AM | #5 |
Reader of the Reader
Posts: 103
Karma: 107
Join Date: Apr 2006
Device: Sony Reader PRS-500
|
pdftohtml
You want pdftohtml
Works great. Try the -c switch. Sam Krupa |
06-21-2007, 09:42 AM | #6 |
Resident Curmudgeon
Posts: 75,901
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
What does the -c switch do?
|
06-21-2007, 09:57 AM | #7 |
Addict
Posts: 350
Karma: 705
Join Date: Dec 2006
Location: Mumbai, India
Device: Kindle 1/REB 1200
|
It converts the document in complex mode, i.e., it tries to stay as faithful as possible to the PDF formatting. I haven't found it that useful, as it then mostly uses absolute positioning in the HTML, which does not always convert well for viewing on an ebook reader.
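To compare the two modes on your own files, a minimal Python sketch that builds the `pdftohtml` command line with or without `-c`. It assumes the poppler version of `pdftohtml` is on PATH; `-noframes` (no frameset) and `-i` (ignore images) are real pdftohtml switches, but the helper names `build_pdftohtml_cmd` and `convert` are hypothetical.

```python
import subprocess


def build_pdftohtml_cmd(pdf: str, out_base: str, complex_mode: bool = False,
                        keep_images: bool = True) -> list[str]:
    """Build a pdftohtml command line in simple or complex (-c) mode."""
    cmd = ["pdftohtml"]
    if complex_mode:
        # Complex mode: layout-faithful output, mostly absolutely positioned HTML.
        cmd.append("-c")
    else:
        # Simple mode: flowing text without a frameset (-noframes is not
        # supported together with -c).
        cmd.append("-noframes")
    if not keep_images:
        cmd.append("-i")  # ignore images in the PDF
    cmd += [pdf, out_base]
    return cmd


def convert(pdf: str, out_base: str, complex_mode: bool = False) -> None:
    """Run the conversion (assumes pdftohtml is installed)."""
    subprocess.run(build_pdftohtml_cmd(pdf, out_base, complex_mode), check=True)
```

Running both `convert("book.pdf", "simple")` and `convert("book.pdf", "cplx", complex_mode=True)` and feeding each result to your LRF converter is a quick way to see whether the absolute positioning hurts on your reader.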
|
06-21-2007, 03:09 PM | #8 |
Resident Curmudgeon
Posts: 75,901
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
One important question... Does pdftohtml keep attributes such as bold, italics, underline, etc.?
Jon |
06-21-2007, 04:50 PM | #9 |
fruminous edugeek
Posts: 6,745
Karma: 551260
Join Date: Oct 2006
Location: Northeast US
Device: iPad, eBw 1150
|
I'm pretty sure pdftohtml does keep italics, etc. Also worth looking at (if you're using a PC or running Wine on Linux) is the processtext.com PDF converter, which is cheap, easy to use, and does a good job on nearly every file I've thrown at it (the exception being one that had a very weird custom encoding). It also has a "complex" mode, similar to pdftohtml's, that helps greatly in eliminating those annoying fixed line lengths.
|
06-22-2007, 03:19 PM | #10 |
Enthusiast
Posts: 29
Karma: 11
Join Date: Jun 2007
Device: prs505
|
Hi, I use FineReader (OCR) in VMware to convert PDFs; it's very good.
|
|