MobileRead Forums - View Single Post - epub - force a 2nd pass to improve structure detection ?

ldolse · 10-07-2010, 12:12 PM

Preprocessing won't work on an epub directly, but if you rename the file from .epub to .zip and add the zip version back to the book record, Calibre treats it identically to compressed html, which means preprocessing will work. You shouldn't have to go from epub to rtf and back.
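If you'd rather script the rename than do it by hand, something like this should work. It's only a rough sketch: the function name `epub_to_zip_copy` is made up for illustration, and it copies the file rather than renaming it so the original epub stays in place. Adding the resulting .zip back to the book record is still done in Calibre as described above.

```python
import shutil
from pathlib import Path


def epub_to_zip_copy(epub_path):
    """Copy book.epub to book.zip in the same folder and return the new path.

    Hypothetical helper: an epub is already a zip container, so only the
    file extension changes and the bytes are copied untouched.
    """
    src = Path(epub_path)
    if src.suffix.lower() != ".epub":
        raise ValueError(f"expected an .epub file, got {src.name}")
    dst = src.with_suffix(".zip")
    shutil.copyfile(src, dst)  # keep the original epub as well
    return dst
```

Call it as `epub_to_zip_copy("mybook.epub")` (the file name is just a placeholder), then add the .zip it returns as a new format on the same book.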

Aside from looking for common chapter headings, preprocessing also tries to remove hard line breaks in the document. With the default settings it will only fix hard line breaks if the entire doc consists of them. That's partly because of the line-unwrap factor: when only some of the lines are broken, the average/median line length is much larger than the actual point where the hard line breaks occur. If you have a doc with only some hard line breaks, you need to set the unwrap factor much lower, possibly down to 0.2 or less.
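To see why a mostly unbroken doc throws off the estimate, here's a toy calculation. This is not Calibre's actual unwrap code, just an illustration of the reasoning: `length_at_fraction` and the sample line lengths are invented, and I'm assuming the factor behaves roughly like "how far through the sorted line lengths to look".

```python
from statistics import median


def length_at_fraction(lengths, fraction):
    """Return the line length found `fraction` of the way through the sorted lengths.

    Hypothetical stand-in for a length estimate controlled by an unwrap factor.
    """
    ordered = sorted(lengths)
    index = min(int(len(ordered) * fraction), len(ordered) - 1)
    return ordered[index]


# Made-up document: 75 long, unbroken paragraphs of 400 characters
# plus 25 lines that were hard-wrapped at about 70 characters.
lengths = [400] * 75 + [70] * 25

# The median is dominated by the long paragraphs, far above the break point.
print(median(lengths))

# Looking halfway through the sorted lengths lands on a long paragraph too.
print(length_at_fraction(lengths, 0.5))

# Looking only 20% of the way through lands on the hard-wrapped lines.
print(length_at_fraction(lengths, 0.2))
```

In this made-up sample only the low setting lands on the 70-character lines, which is the same idea as dropping the unwrap factor to 0.2 or less.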

All of that depends on the actual book formatting, though. Preprocessing covers the most typical cases in Lit/html files, but if you're trying to convert something that went through some weird conversions, it may not match the doc format.

Last edited by ldolse; 10-07-2010 at 12:14 PM.