Thread: PRS-500 New tool: BBeB Binder
Old 01-04-2007, 12:29 AM   #29
airlik
Quote:
Originally Posted by cmumford
Yeah I was concerned about plays, and how to algorithmically differentiate them from text that should reflow. I looked at Dido Queene of Carthage and it has commas at the end of most of the lines. I'm sure that over the centuries there have been a ton of different styles used, and it's going to be quite a challenge to get this right.

I noticed that at the beginning of each paragraph there are names like _Cloan._ and _Iar._. Do you know what these mean?
Gutenberg is tough. I've written several macros to handle different kinds of text, which is what made me think of a list of checkboxes for "do this" and "do that" with an "apply" button. One option could be "turn single line feeds into two and leave two as two", which would fix most novels, and you could just uncheck it for plays.
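To make the "single line feeds into two, leave two as two" checkbox concrete, here is a rough Python sketch that takes the rule literally. The names `double_single_line_feeds`, `FIXES` and `apply_fixes` are made up for illustration; this is not how BBeB Binder does it, just one way the toggle idea might look.

```python
import re

def double_single_line_feeds(text: str) -> str:
    """Turn each lone line feed into two; leave runs of two or more alone."""
    # Normalize Windows and old Mac line endings so only "\n" remains.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A line feed is "single" when it has no line feed directly before or after it.
    return re.sub(r"(?<!\n)\n(?!\n)", "\n\n", text)

# Hypothetical registry standing in for the list of checkboxes.
FIXES = {
    "double_single_line_feeds": double_single_line_feeds,
}

def apply_fixes(text: str, enabled: list[str]) -> str:
    """Run only the fixes whose checkbox is ticked, in the order given."""
    for name in enabled:
        text = FIXES[name](text)
    return text
```

For a novel you would call `apply_fixes(text, ["double_single_line_feeds"])`; for a play you would pass an empty list, which is the equivalent of unchecking the box.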

MOST lines in plays look something like one of the following, and that variety is what makes them hard to handle automatically (and hence a good candidate for a toggle):

HAMLET.
Oh, what shall I do?

POLONIUS. Oh! I am slain.

Joe.Blow. Heya

So they're hard to recognize. There are also books that, thanks to scanning badness, break in mid-sentence. My lame-o search-and-replace macros detect those by looking for a lower-case letter with no punctuation mark after it, possibly a space, then a line break or two, followed by another lower-case letter. That picks up stuff like:

And then the man jumped

off the cliff.
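The search-and-replace rule described above could be written as a regular expression along these lines. This is a sketch of the rule as stated, not the actual macro; the function name `join_mid_sentence_breaks` is an assumption, and "lower-case letter" is read narrowly as ASCII a-z.

```python
import re

# Lower-case letter, optional space, one or two line breaks, lower-case letter.
# Lookbehind/lookahead keep the letters themselves out of the match, so
# back-to-back breaks are all caught.
MID_SENTENCE_BREAK = re.compile(r"(?<=[a-z]) ?\n{1,2}(?=[a-z])")

def join_mid_sentence_breaks(text: str) -> str:
    """Replace a break that falls in mid-sentence with a single space."""
    return MID_SENTENCE_BREAK.sub(" ", text)

if __name__ == "__main__":
    print(join_mid_sentence_breaks("And then the man jumped\n\noff the cliff."))
```

A line that ends in punctuation, or a following line that starts with a capital, does not match and is left alone.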

Shakespeare and others often break lines on purpose, but almost always start the next line with a capital letter. For example:
NORTHUMBERLAND.
What news, Lord Bardolph? every minute now
Should be the father of some stratagem:

QUEEN.
No, be assur'd you shall not find me, daughter,
After the slander of most stepmothers,
Evil-ey'd unto you. You're my prisoner, but
Your gaoler shall deliver you the keys
That lock up your restraint. For you, Posthumus,

etc etc.
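The "next line starts with a cap" observation could be turned into a rough check for whether a block's line breaks are deliberate. The sketch below is only a heuristic under that one assumption; the name `looks_like_verse` and the 0.9 threshold are invented for illustration and would need tuning against real texts.

```python
def first_letter(line: str) -> str:
    """Return the first alphabetic character of a line, or '' if there is none."""
    return next((ch for ch in line if ch.isalpha()), "")

def looks_like_verse(block: str, threshold: float = 0.9) -> bool:
    """Guess whether the line breaks in a block are intentional.

    Counts how many lines after the first begin with a capital letter.
    The threshold is an arbitrary assumption, not a measured value.
    """
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    following = lines[1:]
    if not following:
        return False  # a single line tells us nothing about its breaks
    capped = sum(1 for ln in following if first_letter(ln).isupper())
    return capped / len(following) >= threshold
```

A block that passes this check would be left as-is, while one that fails is a candidate for reflowing. Prose where every wrapped line happens to start with a capital would fool it, so it probably still belongs behind a toggle.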

Last edited by airlik; 01-04-2007 at 01:22 AM.