Old 10-29-2009, 07:11 AM   #6
WillAdams
Guru
Posts: 938
Karma: 1760710
Join Date: Feb 2008
Device: Sony PRS-600, Fujitsu Stylistic ST-4121
Don't start from the .pdfs; use the Quark source instead.

Export to XPress Tags, .html, or some other tagged format, then massage that, adding back anything which wasn't in the main text flow (or get a specialized XTension or utility such as textractor).

PDFs convert the formatting into localized text changes and positional information, which is difficult to extract. If you must use a .pdf as a source, use a utility such as Marcel Weiher's TextLightning.app, which will analyze that positional information and then allow you to use global search-and-replace techniques to convert the local formatting into proper styles.
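
To give a rough idea of what that global search-and-replace step might look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `<font name="..." size="...">` markup is a made-up stand-in for whatever tagged output your tool actually produces, and the font-to-style table (`STYLE_RULES`) and the function name `apply_styles` are hypothetical. You would swap in the local formatting your own dump contains.

```python
import re

# Hypothetical table: (font name, size) as found in the dump -> style tag.
# These pairs are assumptions; fill in the ones your document really uses.
STYLE_RULES = {
    ("Helvetica-Bold", "14"): "h2",
    ("Times-Italic", "10"): "em",
}

# Matches one run of locally formatted text in the assumed markup.
SPAN = re.compile(r'<font name="([^"]+)" size="(\d+)">(.*?)</font>', re.DOTALL)


def apply_styles(text):
    """Replace every known local-formatting run with its named style."""
    def repl(match):
        tag = STYLE_RULES.get((match.group(1), match.group(2)))
        if tag is None:
            # Unknown formatting is left as-is so it can be reviewed by hand.
            return match.group(0)
        return f"<{tag}>{match.group(3)}</{tag}>"
    return SPAN.sub(repl, text)


sample = (
    '<font name="Helvetica-Bold" size="14">Chapter One</font>\n'
    '<font name="Times-Italic" size="10">Moby-Dick</font> '
    '<font name="Courier" size="9">x</font>'
)
print(apply_styles(sample))
```

Leaving unmatched runs untouched is deliberate: anything the table doesn't cover stays visible in the output, so you can decide whether it needs a new style or was just a one-off.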

William