05-21-2012, 03:21 AM #14
Dopedangel
Wizard
Posts: 1,759
Karma: 30063305
Join Date: Dec 2006
Location: Singapore
Device: Boyue
Quote:
Originally Posted by joesh
ldolse, thanks for the considered response and for the education on how Calibre's preprocessing removes much of the formatting I was hoping to use.

As far as blank lines are concerned, PDF certainly doesn't have them, but converters like pdftotext do create them in their text output, as does pdftohtml, I believe.

kiwidude, I really do understand that PDF is, in general, a programming language and that a PostScript interpreter is a fairly large beast. That said, most screenplay PDFs are produced by a small handful of programs, and those PDFs are generally easy enough for tools like pdftotext to render with fairly high fidelity.

[edit: I stand corrected. I've just found a script exported from one of the big screenwriting programs that pdftotext et al. don't render well.]

I'm certainly not looking for perfection here; what pdftotext generates is very satisfactory. Which brings me to a different thought: most eReaders understand plain text, right? Perhaps an easier approach would be a separate tool that rewraps paragraphs to a width suited to a given reader, and then simply sending the resulting text file to the eReader. Comments?
Have you tried OCR software like ABBYY FineReader? I think it would preserve the formatting when used for the conversion. Alternatively, try http://pdftransformer.abbyy.com/
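As for the rewrap-and-send-as-plain-text idea in the quoted post, here is a minimal Python sketch. It assumes paragraphs in the pdftotext output are separated by blank lines (as the quoted post describes); the script name `rewrap.py`, the function `rewrap_text`, and the default width of 40 columns are hypothetical choices. Note that it deliberately discards line breaks and indentation inside a paragraph, so screenplay layout (indented character names and dialogue) would be flattened, and it does not rejoin words hyphenated at line ends.

```python
import re
import sys
import textwrap


def rewrap_text(text: str, width: int = 40) -> str:
    """Reflow plain text so no line exceeds `width` columns.

    Assumption: paragraphs are separated by one or more blank lines,
    as in typical pdftotext output. Hard line breaks inside a
    paragraph are treated as ordinary spaces.
    """
    # Split on runs of blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    wrapped = []
    for para in paragraphs:
        # Collapse the paragraph's internal newlines and repeated spaces.
        words = para.split()
        if words:
            wrapped.append(textwrap.fill(" ".join(words), width=width))
    # Re-join with exactly one blank line between paragraphs.
    return "\n\n".join(wrapped) + "\n"


def main(argv):
    # Hypothetical CLI: rewrap.py INPUT.txt OUTPUT.txt [WIDTH]
    if len(argv) < 2:
        print("usage: rewrap.py INPUT.txt OUTPUT.txt [WIDTH]", file=sys.stderr)
        return
    src, dst = argv[0], argv[1]
    width = int(argv[2]) if len(argv) > 2 else 40
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(rewrap_text(text, width))


if __name__ == "__main__":
    main(sys.argv[1:])
```

A possible workflow would be `pdftotext script.pdf script.txt` followed by `python rewrap.py script.txt script-wrapped.txt 40`, then copying the wrapped file to the reader. The right width depends on the device's font and screen, so it is left as a parameter.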

Last edited by Dopedangel; 05-21-2012 at 03:23 AM.