03-15-2012, 10:51 AM   #14
Elfwreck
FWIW, extracting text *mostly* works. I'd say 85% or more of text-based PDFs (not scans) convert fairly well to Word or HTML formats... and then need cleanup. Remove the headers and page numbers, which extract as ordinary text. Get rid of the forced paragraph breaks at the ends of pages. Find the chapter headers and fix them. (They might be fine, or they might have been converted to plain text, depending on various font issues.) Look for sets of short lines of text, dialogue especially, that got crammed into one paragraph.

The text itself tends to extract fine (as long as there aren't columns or magazine layouts to deal with), but the formatting needs a thorough touchup to be useful.
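
The first two cleanup steps (stripping headers and page numbers, and rejoining paragraphs split at page ends) can be roughed out in a script. This is only a minimal sketch in Python, under several assumptions that are not from the post: the extracted text is already available as one string per page, a running header is whatever first line repeats on at least half the pages, a page number is a line holding only digits (optionally prefixed with "Page"), and a page whose last paragraph lacks closing punctuation continues onto the next page. The function names `find_running_headers` and `clean_pages` are hypothetical. Chapter headers and crammed dialogue still need a manual pass.

```python
import re
from collections import Counter

# Assumption: a page number is a line with only digits, optionally "Page N".
PAGE_NUMBER = re.compile(r"^(page\s+)?\d+$", re.IGNORECASE)


def ends_sentence(text):
    """Heuristic: True if text ends with ., ! or ? (ignoring closing quotes)."""
    return text.rstrip("\"')").endswith((".", "!", "?"))


def find_running_headers(pages, min_share=0.5):
    """Return first lines that repeat on at least min_share of the pages."""
    firsts = Counter()
    for page in pages:
        lines = [ln.strip() for ln in page.splitlines() if ln.strip()]
        lines = [ln for ln in lines if not PAGE_NUMBER.match(ln)]
        if lines:
            firsts[lines[0]] += 1
    threshold = max(2, int(len(pages) * min_share))
    return {line for line, count in firsts.items() if count >= threshold}


def clean_pages(pages):
    """Drop headers and page numbers, then rejoin paragraphs split by pages."""
    headers = find_running_headers(pages)
    paragraphs = []
    join_next = False  # True when the previous page ended mid-sentence
    for page in pages:
        page_paras = []
        for block in re.split(r"\n\s*\n", page):
            lines = [ln.strip() for ln in block.splitlines()]
            lines = [
                ln for ln in lines
                if ln and ln not in headers and not PAGE_NUMBER.match(ln)
            ]
            if lines:
                # Hard line wraps inside a paragraph become spaces.
                page_paras.append(" ".join(lines))
        if not page_paras:
            continue
        if join_next and paragraphs:
            # Undo the forced paragraph break at the end of the last page.
            paragraphs[-1] += " " + page_paras.pop(0)
        paragraphs.extend(page_paras)
        join_next = not ends_sentence(paragraphs[-1])
    return "\n\n".join(paragraphs)


sample_pages = [
    "My Book Title\n\nIt was a dark night. The wind\nhowled through the\n\n1",
    "My Book Title\n\ntrees and nobody slept.\n\nMorning came late.\n\n2",
]
cleaned = clean_pages(sample_pages)
print(cleaned)
```

On the two sample pages, the repeated title and the bare page numbers are dropped, and the sentence cut off at the bottom of the first page is stitched back onto its continuation. The punctuation test is deliberately crude: a page that happens to end on a complete sentence in the middle of a paragraph will still leave a stray break to fix by hand.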