11-12-2007, 10:27 PM   #3
bob_ninja
Here is the example I believe you are referring to:

As a matter of fact, Mr. Bright left roughly speaking about one-fifth of
the whole Diary still unprinted, although he transcribed the whole, and
bequeathed his transcript to Magdalene College.
Please see the "Minimum paragraph length" setting on the "General" tab.
The default value is 300 characters, which is almost 4 lines at 80 characters per line. The section above is shorter than that, so the program decided it is not really a paragraph and didn't process it. At some point I could/should add a more sophisticated semantic analyzer that would be smarter at distinguishing paragraphs from other sections. For now this simple check will have to do. So try reducing the value to a lower number. Perhaps I should use a smaller default.
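To make the check concrete, here is a rough Python sketch of how a minimum-length rule like this could work. It is not the program's actual code. The function name `unwrap_paragraphs` is made up for illustration, and it assumes that blocks are separated by blank lines, that length is counted in characters including line breaks, and that "processing" a paragraph means joining its hard-wrapped lines into one line.

```python
import re

DEFAULT_MIN_PARAGRAPH_LENGTH = 300  # the default value mentioned in this post


def unwrap_paragraphs(text, min_length=DEFAULT_MIN_PARAGRAPH_LENGTH):
    """Join the lines of every block that is long enough to count as a paragraph.

    Assumptions (hypothetical, not taken from the real program): blocks are
    separated by blank lines, length includes line breaks, and "processing"
    means joining the wrapped lines with single spaces.
    """
    blocks = re.split(r"\n\s*\n", text.strip("\n"))
    result = []
    for block in blocks:
        if len(block) >= min_length:
            # Long enough: treat it as a paragraph and join its lines.
            result.append(" ".join(line.strip() for line in block.splitlines()))
        else:
            # Too short: leave the block exactly as it was.
            result.append(block)
    return "\n\n".join(result)


sample = (
    "As a matter of fact, Mr. Bright left roughly speaking about one-fifth of\n"
    "the whole Diary still unprinted, although he transcribed the whole, and\n"
    "bequeathed his transcript to Magdalene College."
)

with_default = unwrap_paragraphs(sample)                  # threshold of 300
with_lower = unwrap_paragraphs(sample, min_length=100)    # lowered threshold

print(with_default == sample)   # the short block is left untouched
print("\n" in with_lower)       # no line breaks remain once it is processed
```

With the default of 300 the excerpt is skipped, and with a lower value it is treated as a paragraph, which is the behavior described above.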

Here are some examples of sections that are NOT paragraphs and should not be processed:

Release Date: November, 2004 [EBook #6933]
[Yes, we are more than one year ahead of schedule]
[This file was first posted on February 13, 2003]
These lines are separate and each is shorter than 300 characters, so they remain unchanged.


1632, 1633.


Le Jeune's Voyage.--His First Pupils.--His Studies.--
His Indian Teacher.--Winter at the Mission-house.--
Le Jeune's School.--Reinforcements.
This is a chapter TOC, again not a paragraph, although one could argue that it is a paragraph of sorts that should be processed.


1639, 1640.


A Change of Plan.--Sainte Marie.--Mission of the Tobacco Nation.--
Winter Journeying.--Reception of the Missionaries.--
Superstitious Terrors.--Peril of Garnier and Jogues.--
Mission of the Neutrals.--Huron Intrigues.--Miracles.--
Fury of the Indians.--Intervention of Saint Michael.--
Return to Sainte Marie.--Intrepidity of the Priests.--
Their Mental Exaltation.
This is another chapter TOC, but this one is processed because it is longer than 300 characters. So my simple rule is not very smart or effective. Like I said, it will do for now.
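If you want to see why the two chapter TOCs are treated differently, a quick way is to measure them against the threshold. The snippet below is only an illustration. It assumes that line breaks count toward the length and that a block exactly at the threshold is processed, neither of which is confirmed for the real program.

```python
MIN_PARAGRAPH_LENGTH = 300  # default from the "General" tab

short_toc = (
    "Le Jeune's Voyage.--His First Pupils.--His Studies.--\n"
    "His Indian Teacher.--Winter at the Mission-house.--\n"
    "Le Jeune's School.--Reinforcements."
)

long_toc = (
    "A Change of Plan.--Sainte Marie.--Mission of the Tobacco Nation.--\n"
    "Winter Journeying.--Reception of the Missionaries.--\n"
    "Superstitious Terrors.--Peril of Garnier and Jogues.--\n"
    "Mission of the Neutrals.--Huron Intrigues.--Miracles.--\n"
    "Fury of the Indians.--Intervention of Saint Michael.--\n"
    "Return to Sainte Marie.--Intrepidity of the Priests.--\n"
    "Their Mental Exaltation."
)

for name, block in (("1632, 1633", short_toc), ("1639, 1640", long_toc)):
    # Assumed rule: a block at or above the threshold is processed.
    verdict = "processed" if len(block) >= MIN_PARAGRAPH_LENGTH else "left unchanged"
    print(f"{name}: {len(block)} characters, {verdict}")
```

The first TOC falls well under 300 characters and the second goes over it, so a length check alone cannot tell that both are the same kind of section.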

So in summary, reduce the minimum paragraph length if you find that some paragraphs are not processed but you want them to be.