MobileRead Forums > E-Book General > General Discussions

03-15-2012, 12:54 PM   #16
rkomar
Wizard
Posts: 3,042
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
Quote:
Originally Posted by Elfwreck
FWIW, extracting text *mostly* works. I'd say 85% or more of text-based PDFs (not scans) convert fairly well to Word or HTML formats... and then need cleanup. Remove the headers & page #'s, which extract as just text. Get rid of the forced paragraph breaks at the ends of pages. Find the chapter headers and fix them. (They might be fine. They might be converted to plain text, depending on various font issues.) Look for sets of short lines of text--dialogue especially--that were all crammed into one paragraph.

The text itself tends to extract fine (if there weren't columns or magazine layouts to deal with), but the formatting needs a thorough touchup to be useful.
Right. And all this complexity is only for relatively simple text. Throw in tables, equations, and the like, and the complexity goes through the roof. I remember reading presentations years ago by LaTeX developers who were studying how to apply something like reflow tags to the entire set of elements in a document. None of them could figure out how to do it reliably, so it still isn't possible to create reflowable PDF files directly from LaTeX sources. If they can't do it properly from the source files, then it's all but impossible to do it well from the resulting PDF.
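To give a feel for the cleanup steps listed in the quote, here is a minimal Python sketch of two of them: dropping running headers and page numbers, and rejoining a paragraph that was cut by a page break. Everything in it is an assumption made for illustration, not something from either post: the names find_running_headers and clean_pages are made up, the input is assumed to be a list of strings (one per extracted page) with one paragraph per line, a header is assumed to be a first line that repeats across pages, and a paragraph is assumed to be unfinished if it does not end in sentence punctuation.

```python
import re
from collections import Counter

# Assumption: a page number is a line holding only digits, optionally prefixed by "Page".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d+\s*$", re.IGNORECASE)
# Assumption: a paragraph ending in one of these characters is complete.
SENTENCE_END = (".", "!", "?", ":", '"', "'", ")")


def find_running_headers(pages, min_share=0.5):
    """Return first lines that repeat on at least min_share of the pages."""
    first_lines = Counter()
    for page in pages:
        lines = [ln.strip() for ln in page.splitlines() if ln.strip()]
        if lines:
            first_lines[lines[0]] += 1
    # Require at least two occurrences so a one-off title is never dropped.
    threshold = max(2, int(len(pages) * min_share))
    return {line for line, count in first_lines.items() if count >= threshold}


def clean_pages(pages):
    """Drop headers and page numbers, then rejoin paragraphs split by a page break."""
    headers = find_running_headers(pages)
    paragraphs = []
    for page in pages:
        kept = [
            ln.strip()
            for ln in page.splitlines()
            if ln.strip() and ln.strip() not in headers and not PAGE_NUMBER.match(ln)
        ]
        if not kept:
            continue
        # A paragraph that stops mid-sentence at the end of the previous page
        # is assumed to continue with the first line of this page.
        if paragraphs and not paragraphs[-1].endswith(SENTENCE_END):
            paragraphs[-1] += " " + kept[0]
            kept = kept[1:]
        paragraphs.extend(kept)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = [
        "My Book\nIt was a dark night. The wind\n1",
        "My Book\nhowled through the trees.\nShe closed the door.\n2",
    ]
    print(clean_pages(sample))
```

Even this toy version shows why the result still needs a manual pass: a body line that happens to be just a number gets thrown away, a heading without final punctuation gets glued to the next page, and nothing here deals with columns, tables, or equations.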
All times are GMT -4.

