MobileRead Forums - View Single Post - Want to add Screenplay format knowledge to Calibre

joesh · 05-11-2012, 07:02 AM

Seems to me there are two distinct features being requested here.

1. Can you make Calibre's PDF translation better?
2. Assuming an "acceptably-translated" PDF, can you add a "screenplay" heuristic set that'll be savvy about screenplay format?

I see from responses above and throughout the forums that (1) is a sore subject around here. No problem. PDF is fine input for minds but poor for computers. So let's go to (2).

I've played with feeding the current PDF parser a bunch of screenplays and I think that what it generates fits my criteria of an "acceptably-translated" PDF for the heuristics I have in mind.

These heuristics would mainly use indentation to detect structure. A run of consecutive lines at the same level of indentation would be one block, and the block would be the unit of reflow. Blank lines would also delimit blocks, and would themselves pass through unaltered.
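To make that concrete, here is a rough Python sketch of the basic idea. It is not Calibre code and doesn't touch any Calibre API. The names `indent_of`, `split_blocks` and `reflow` are made up for illustration, and it assumes the converted text still carries its leading whitespace and that tabs count as 8 columns.

```python
def indent_of(line):
    """Number of leading columns of whitespace (tabs assumed to be 8 wide)."""
    expanded = line.expandtabs(8)
    return len(expanded) - len(expanded.lstrip(" "))


def split_blocks(text):
    """Group consecutive non-blank lines that share one indentation level.

    Returns a list of (indent, lines) tuples. A blank line ends the current
    block and is recorded as (None, []) so it can pass through unaltered.
    """
    blocks = []
    current_indent = None
    current = []
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close any open block, then keep the blank itself.
            if current:
                blocks.append((current_indent, current))
                current = []
            blocks.append((None, []))
            continue
        ind = indent_of(line)
        if current and ind != current_indent:
            # Indentation changed: that starts a new block.
            blocks.append((current_indent, current))
            current = []
        current_indent = ind
        current.append(line.strip())
    if current:
        blocks.append((current_indent, current))
    return blocks


def reflow(text):
    """Join each block into one line, keeping its indentation."""
    out = []
    for indent, lines in split_blocks(text):
        if indent is None:
            out.append("")
        else:
            out.append(" " * indent + " ".join(lines))
    return "\n".join(out)
```

Here each block is collapsed to a single long line, on the assumption that the reader device does the final wrapping. A real heuristic would more likely emit markup per block than re-indented text.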

That's most of it right there. I suspect there would be a few tweaks to this - such as letting the continuation lines of a parenthetical match at either the same level or +1 indentation - so that

Code:

    (this would
     be one block)

but I think this would do a pretty nice job.
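The parenthetical tweak could be sketched in Python along these lines. This is only an illustration under my own assumptions: `same_block` is a hypothetical helper, "+1" is taken to mean exactly one extra column, and a parenthetical is recognised simply by its first line starting with "(".

```python
def same_block(first_line, next_line):
    """Decide whether next_line continues the block begun by first_line.

    Ordinary blocks need identical indentation. A block whose first line
    opens a parenthetical also accepts one extra column of indentation.
    """
    def indent(s):
        s = s.expandtabs(8)  # assumption: tabs are 8 columns wide
        return len(s) - len(s.lstrip(" "))

    delta = indent(next_line) - indent(first_line)
    if first_line.lstrip().startswith("("):
        return delta in (0, 1)
    return delta == 0
```

With the example above, `same_block("    (this would", "     be one block)")` is true, while the same one-column offset under a line that doesn't open with "(" is treated as a new block.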

Am I missing something really big?

Last edited by joesh; 05-11-2012 at 07:03 AM. Reason: fix the blockquote