Old 05-11-2012, 07:02 AM   #10
Junior Member
joesh began at the beginning.
Posts: 5
Karma: 10
Join Date: Oct 2011
Location: Seattle WA
Device: Nook
Seems to me there are two distinct features being requested here.

1. Can you make Calibre's PDF translation better?
2. Assuming an "acceptably-translated" PDF, can you add a "screenplay" heuristic set that'll be savvy about screenplay format?

I see from responses above and throughout the forums that (1) is a sore subject around here. No problem. PDF is fine input for minds but poor for computers. So let's go to (2).

I've played with feeding the current PDF parser a bunch of screenplays and I think that what it generates fits my criteria of an "acceptably-translated" PDF for the heuristics I have in mind.

These heuristics would mainly use indentation to detect structure. A block of text at a given level of indentation would be the unit of reflow. Blank lines would also delimit blocks, and would themselves pass through unaltered.
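
Here's a rough sketch of that in Python, just to make the idea concrete. None of this is Calibre code: the names `split_blocks` and `reflow`, the 8-column tab expansion, and the minimum text width are all assumptions of mine.

```python
import textwrap


def leading_spaces(line):
    """Count leading columns; tabs are expanded to 8 columns (an assumption)."""
    expanded = line.expandtabs(8)
    return len(expanded) - len(expanded.lstrip(" "))


def split_blocks(lines):
    """Group consecutive non-blank lines that share an indentation level.

    Returns a list of (indent, [stripped lines]) tuples. A blank line is
    recorded as (None, []) so it both ends a block and can be passed through.
    """
    blocks = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            blocks.append((None, []))
            continue
        indent = leading_spaces(line)
        if blocks and blocks[-1][0] == indent:
            blocks[-1][1].append(line.strip())
        else:
            blocks.append((indent, [line.strip()]))
    return blocks


def reflow(text, width=60):
    """Re-wrap each block to `width` columns, keeping its indentation."""
    out = []
    for indent, parts in split_blocks(text.splitlines()):
        if indent is None:
            out.append("")  # blank lines pass through unaltered
            continue
        prefix = " " * indent
        # Keep at least 20 columns of text per line (arbitrary assumption).
        effective = max(width, indent + 20)
        out.extend(textwrap.wrap(" ".join(parts), width=effective,
                                 initial_indent=prefix,
                                 subsequent_indent=prefix))
    return "\n".join(out)
```

Calling `reflow(page_text, width=40)` on the parser's text output would re-wrap each block for a narrower screen while leaving the indentation that marks scene headings, dialogue and so on intact. Scaling the indentation itself for small screens is left out of this sketch.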

That's most of it right there. I suspect there would be a few tweaks to this. For example, inside a parenthetical, a continuation line could match at either the same indentation or one level deeper, so that
    (this would
     be one block)
but I think this would do a pretty nice job.
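
A sketch of how that parenthetical tweak might be folded into the grouping step, again in Python. The function name `group_lines` is made up, "+1" is taken to mean one extra column, and indentation is assumed to be spaces only.

```python
def group_lines(lines):
    """Group lines into blocks by indentation.

    While a parenthetical is still open (a block that started with "(" and
    has not yet seen a line ending in ")"), a line indented one column
    deeper than the block is treated as part of the same block.
    """
    blocks = []        # list of [indent, [texts]]; indent None marks a blank line
    in_paren = False   # True while the current parenthetical is unclosed
    for raw in lines:
        line = raw.rstrip()
        if not line:
            blocks.append([None, []])
            in_paren = False
            continue
        text = line.lstrip(" ")
        indent = len(line) - len(text)
        prev_indent = blocks[-1][0] if blocks else None
        matches = prev_indent is not None and (
            indent == prev_indent
            or (in_paren and indent == prev_indent + 1)
        )
        if matches:
            blocks[-1][1].append(text)
        else:
            blocks.append([indent, [text]])
            in_paren = text.startswith("(")
        if text.endswith(")"):
            in_paren = False
    return blocks
```

Outside an open parenthetical, a one-column difference still starts a new block, so the relaxed rule only applies where it is needed.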

Am I missing something really big?

Last edited by joesh; 05-11-2012 at 07:03 AM. Reason: fix the blockquote