MobileRead Forums - View Single Post - Desperately seeking.... advice on epub conversion?

WillAdams · 10-29-2009, 07:11 AM

Don't start from the PDFs; use the Quark source instead.

Export to XPress Tags, .html, or some other tagged format, then massage that, adding back anything that wasn't in the main text flow (or get a specialized XTension or utility such as textractor).
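To give a rough idea of what "massaging" a tagged export can look like, here is a minimal Python sketch. It assumes a simplified tagged text file in which a paragraph may begin with `@StyleName:` to apply a paragraph style, and in which an untagged paragraph keeps the previous style. The style names and the HTML mapping are hypothetical and would have to match your own Quark style sheets. Real XPress Tags files also carry a version header and inline character-attribute tags, which this sketch does not handle (they would come out escaped as literal text).

```python
import html
import re

# Hypothetical mapping from paragraph style names to (HTML element, CSS class).
# Replace these with the style sheet names used in your own document.
STYLE_TO_HTML = {
    "Chapter Title": ("h1", None),
    "Subhead": ("h2", None),
    "Body Text": ("p", None),
    "Block Quote": ("p", "quote"),
}

# A paragraph that starts with "@StyleName:" switches to that style.
PARA_TAG = re.compile(r"^@([^:<>]+):(.*)$")


def tagged_to_html(text, default_style="Body Text"):
    """Convert simplified style-tagged paragraphs to HTML, one element each."""
    out = []
    style = default_style
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines between paragraphs
        match = PARA_TAG.match(line)
        if match:
            style, line = match.group(1).strip(), match.group(2)
        # Unknown styles are flagged with a class so they are easy to find later.
        tag, cls = STYLE_TO_HTML.get(style, ("p", "unmapped"))
        attr = f' class="{cls}"' if cls else ""
        out.append(f"<{tag}{attr}>{html.escape(line.strip())}</{tag}>")
    return "\n".join(out)
```

Anything that comes out with `class="unmapped"` points at a style you haven't assigned yet, which makes a handy checklist while cleaning up.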

A PDF reduces formatting to local font changes and positional information, from which the original structure is difficult to recover. If you must use a .pdf as a source, use a utility such as Marcel Weiher's TextLightning.app, which analyzes that positional information and then lets you use global search-and-replace to convert the local formatting into proper styles.
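The global search-and-replace step can be sketched in Python as follows. This is only an illustration: it assumes the extracted text has ended up as HTML in which every run carries its local formatting in a `<span style="font: ...">` wrapper, with no nested spans. That intermediate format, the font descriptions, and the rule table are all hypothetical, and nothing here reflects what TextLightning itself outputs. The idea is to first survey which local formats occur, then decide once per format which style it should become.

```python
import re
from collections import Counter

# Matches one non-nested span carrying a local font description (assumed format).
SPAN = re.compile(r'<span style="font:\s*([^"]+?)\s*">(.*?)</span>', re.S)


def survey(text):
    """Count how often each local font description occurs."""
    return Counter(m.group(1) for m in SPAN.finditer(text))


def apply_rules(text, rules):
    """Replace every span whose font description has a rule with a styled element.

    rules maps a font description to (HTML element, CSS class or None).
    Spans without a rule are left untouched for later review.
    """
    def replace(match):
        font, body = match.group(1), match.group(2)
        if font not in rules:
            return match.group(0)
        tag, cls = rules[font]
        attr = f' class="{cls}"' if cls else ""
        return f"<{tag}{attr}>{body}</{tag}>"

    return SPAN.sub(replace, text)
```

Running `survey` first usually shows that a handful of font combinations account for nearly all of the text, so a short rule table covers most of the book and the leftovers can be fixed by hand.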

William
