ldolse - thanks for the considered response, and for explaining how Calibre's preprocessing removes much of the formatting I was hoping to use.
As far as blank lines are concerned, certainly PDF doesn't have them but translators like pdftotext do create them in the text output - as does pdftohtml I believe.
kiwidude - I really do understand that PDF grew out of PostScript, which is a full programming language, and that a PostScript interpreter is a fairly large beast. That said, most screenplay PDFs are created by a small handful of programs, and their output is generally easy enough for tools like pdftotext to render with pretty high fidelity.
[edit: I stand corrected - I've just found a script output from one of the big screenwriting programs that's not well rendered by pdftotext et al]
I'm sure not looking for perfection here. What pdftotext generates is very satisfactory. Which brings me to a different thought - most eReaders understand straight text, right? Perhaps an easier way to go would be to make a separate tool that'd rewrap paragraphs to a width appropriate for a given reader and then just send the resulting text file to the eReader. Comments?
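To make the idea concrete, here is a minimal sketch of such a rewrap tool in Python. It assumes the input is text already extracted by something like pdftotext, that paragraphs are separated by blank lines, and that the target width is known for the reader in question. The function name `rewrap` and the default width of 40 are my own placeholders, not anything from Calibre or pdftotext. Note that it throws away leading indentation, so screenplay elements that depend on indentation (dialogue, character names) would need extra handling.

```python
import re
import textwrap


def rewrap(text: str, width: int = 40) -> str:
    """Rewrap blank-line-separated paragraphs to at most `width` columns.

    `width` is a hypothetical per-reader setting; pick it to suit the device.
    """
    # Split on blank lines (empty or whitespace-only lines between paragraphs).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    wrapped = []
    for para in paragraphs:
        # Join the paragraph's hard-wrapped lines into one run of words.
        # This also drops indentation and page-break characters.
        flat = " ".join(para.split())
        # Refill the words to the requested width.
        wrapped.append(textwrap.fill(flat, width=width))
    # Keep one blank line between paragraphs in the output.
    return "\n\n".join(wrapped)
```

Usage would be something like reading the extracted text file, calling `rewrap(text, width=...)` with a width that suits the device, and writing the result out as a new text file to send to the reader.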
Last edited by joesh; 05-17-2012 at 06:52 AM.
Reason: found that - as kiwidude said - even pdf for screenplays can be knotty to render