Old 11-01-2011, 03:44 PM   #9
Ethelred?
Device: kindle 2
I recently had this problem! I sympathize wholeheartedly. For the document I was converting, I decided that losing the formatting was okay, so I converted to plain text. Strangely enough, pdftotext worked, though if I remember correctly other PDF conversion utilities like pdftohtml did not. I didn't try them all, though.
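For anyone who wants to script that step, here is a minimal sketch, assuming pdftotext (from poppler-utils or xpdf) is installed and on your PATH. The function names and file names are just placeholders I made up, and it uses only the basic `pdftotext input.pdf output.txt` form with no extra options.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path):
    """Build the basic command line: pdftotext INPUT.pdf OUTPUT.txt"""
    return ["pdftotext", pdf_path, txt_path]


def convert_pdf_to_text(pdf_path, txt_path):
    """Run pdftotext; formatting is lost, only the text is kept."""
    # Assumption: the tool is installed separately and is on PATH.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
```

Calling `convert_pdf_to_text("book.pdf", "book.txt")` would then write the plain-text version next to the original.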

I did briefly consider writing a script to do an (ll-lossy) HTML conversion and check it against the text conversion, but decided it was too much work. If you run into this a lot, it is an option, though.
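I never wrote that script, but a rough sketch of the idea might look like the following. It assumes the text conversion is the trustworthy one and simply lists words from the HTML version that never appear in it; the function names are hypothetical, and a real document would need more care (hyphenation, words split by inline tags, and so on).

```python
import re
from html.parser import HTMLParser

WORD = re.compile(r"[A-Za-z']+")


class _TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_words(html):
    """Return the words found in the text content of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return WORD.findall(" ".join(parser.parts))


def suspect_words(html, good_text):
    """Words in the HTML conversion that never occur in the text conversion."""
    known = {w.lower() for w in WORD.findall(good_text)}
    return sorted({w for w in html_words(html) if w.lower() not in known})
```

Fragments such as "tel" and "wel" would show up in the result, which at least tells you where the damaged spots are.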

In the worst case, the other thing I considered was using a lot of regular expressions to fix an ll-lossy file. If you like to use Word (with an RTF), it actually has a halfway decent regex find/replace; otherwise I would do it against an HTML file with a good text editor. (LibreOffice has regex find/replace too, but I've had some problems with it.) Automatically replacing things like an "l" followed by a space followed by some punctuation, or by an "ing", "ed", or "er", etc., can save some time and frustration.
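As a sketch of that kind of replacement outside an editor, something like this could work. It rests on an assumption about the damage: that each lost "ll" shows up as a single "l" plus a stray space (for example "tel ing" for "telling"). If your file is broken differently, the patterns and the replacement need to change, and the suffix list here is only the three mentioned above.

```python
import re

# Assumption: "ll" was turned into "l" + space by the lossy conversion.
# "l " directly before a standalone "ing", "ed", or "er" fragment.
LL_BEFORE_SUFFIX = re.compile(r"l (?=(?:ing|ed|er)\b)")
# "l " directly before common punctuation.
LL_BEFORE_PUNCT = re.compile(r"l (?=[.,;:!?])")


def repair_ll(text):
    """Rejoin broken words by replacing the 'l ' fragment with 'll'."""
    text = LL_BEFORE_SUFFIX.sub("ll", text)
    return LL_BEFORE_PUNCT.sub("ll", text)
```

The lookaheads keep the suffix or punctuation in place and only swap the "l " in front of it. It can still produce false positives, so it is worth spot-checking the result instead of trusting it blindly.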

Last edited by Ethelred?; 11-01-2011 at 03:50 PM.