MobileRead Forums - View Single Post

ldolse · 10-02-2010, 10:31 AM

Jackie's suggestion is definitely the best way to go about this, especially if you're the author/doc creator.

The point of preprocessing is that there are a lot of documents where the publisher didn't do this, and you're stuck with a doc where chapter detection doesn't work. Preprocessing looks for different patterns in the doc. It starts with lines that say 'Chapter xxx' or similar common headings. If it doesn't detect any chapters, it tries numeric headings, and lastly it looks for lines composed entirely of uppercase words. This approach can lead to false positives, which is why the help warns that it could potentially screw up your conversion. In practice the false-positive rate is pretty low, and if it happens it's generally easier to fix one or two false positives than to manually mark up the whole book beforehand.
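
To make the fallback order concrete, here's a rough Python sketch of the idea. This is not the actual preprocessing code; the function name `detect_headings`, the keyword list, and the regular expressions are all my own assumptions for illustration. It tries each kind of pattern in turn and stops at the first one that finds anything.

```python
import re

# Hypothetical patterns, tried in order. These are illustrative guesses,
# not the expressions the real preprocessing option uses.
TIERS = [
    # 1. 'Chapter xxx' or similar common headings (keyword list is assumed)
    ("named", re.compile(r"^\s*(chapter|part|book|prologue|epilogue)\b.*$", re.IGNORECASE)),
    # 2. Numeric headings: a line holding only a number, e.g. "12" or "12."
    ("numeric", re.compile(r"^\s*\d+\.?\s*$")),
    # 3. Lines composed entirely of uppercase words
    ("uppercase", re.compile(r"^\s*[A-Z][A-Z'\-]*(\s+[A-Z][A-Z'\-]*)*\s*$")),
]


def detect_headings(lines):
    """Return (tier_name, line_indexes) for the first tier that matches any line."""
    for name, pattern in TIERS:
        hits = [i for i, line in enumerate(lines) if pattern.match(line)]
        if hits:
            # Stop here: later, looser tiers only run if this one found nothing.
            return name, hits
    return None, []


if __name__ == "__main__":
    book = ["A DARK NIGHT", "It was raining.", "NOTE", "He left."]
    print(detect_headings(book))
```

In that sample the all-caps line 'NOTE' gets picked up as a heading alongside 'A DARK NIGHT', which is the kind of false positive I mean. It only shows up because no 'Chapter' style or numeric headings were found first.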

Did you actually try enabling preprocessing yet?

Last edited by ldolse; 10-02-2010 at 10:41 AM. Reason: Confused different threads