Quote:
Originally Posted by joesh
ldolse - thanks for the considered response and education on how Calibre removes in preprocessing much of the formatting I was hoping to use.
As far as blank lines are concerned, certainly PDF doesn't have them but translators like pdftotext do create them in the text output - as does pdftohtml I believe.
kiwidude - I really do understand that PDF is, in general, a programming language and a PostScript interpreter is a fairly large beast. That said, most screenplay PDFs are created by a small handful of programs and generally create PDFs that are easy enough for tools like pdftotext to render with pretty high fidelity.
[edit: I stand corrected - I've just found a script output from one of the big screenwriting programs that's not well rendered by pdftotext et al]
I'm sure not looking for perfection here. What pdftotext generates is very satisfactory. Which brings me to a different thought - most eReaders understand straight text, right? Perhaps an easier way to go would be to make a separate tool that'd rewrap paragraphs to a width appropriate for a given reader and then just send the resulting text file to the eReader. Comments?
Have you tried OCR software like ABBYY FineReader? I think it would preserve the formatting when it is used to convert, or try
http://pdftransformer.abbyy.com/
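
On the rewrap idea in the quoted post: if the text output from pdftotext is good enough, a separate rewrap tool could be quite small. Here is a rough sketch in Python, assuming paragraphs in the text file are separated by blank lines. The function name `rewrap` and the default width of 40 are just placeholders I picked for illustration, not anything from Calibre or pdftotext. Note that it joins every line inside a paragraph, so screenplay indentation (centered dialogue, character names) would be flattened unless you handle those lines separately.

```python
import re
import textwrap


def rewrap(text, width=40):
    """Rewrap blank-line-separated paragraphs to the given width.

    Hypothetical helper: `width` should be set to whatever suits
    the target reader's screen and font size.
    """
    # Split on one or more blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    wrapped = []
    for para in paragraphs:
        # Collapse the original line breaks and runs of spaces.
        joined = " ".join(para.split())
        wrapped.append(textwrap.fill(joined, width=width))
    # Keep one blank line between paragraphs in the output.
    return "\n\n".join(wrapped)


if __name__ == "__main__":
    sample = "one two three\nfour five\n\nsix seven"
    print(rewrap(sample, width=10))
```

You would read the pdftotext output into a string, pass it through `rewrap` with the width for your device, and write the result to a new text file to send to the reader.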