Quote:
Originally Posted by Karl Murks
FineReader's page formatting is pretty shitty for the most part. I've yet to see a single document I scanned where it produced good results. I went through far too much trouble before finding out that saving to a file format without these obnoxious text boxes is really the only way to get a decent document.
What I do most of the time is save as HTML, then load the saved document in a text editor (Notepad++) and remove the stylesheet references by searching for every '<span class=' occurrence and replacing them until none are left. I trust this method much more than relying on word processing software to keep a clean file, and I can be certain I'm rid of all the font changes FineReader inserts. After this process, the only formatting left will be italics, bold and similar effects.
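If you'd rather not do the search-and-replace by hand every time, the same cleanup can be scripted. This is only a rough sketch, not anything FineReader provides. It assumes the exported HTML is UTF-8 and that FineReader's font changes all live in <span> wrappers. It strips every <span ...> and </span> tag but keeps the text inside, so <b>, <i> and <p> survive. The script name strip_spans.py is just a hypothetical example. A plain regex won't cope with a '>' inside an attribute value, so check the output before relying on it.

```python
import re
import sys

# Any opening <span ...> tag, whatever its attributes (class=, style=, ...).
# Assumption: FineReader's font/style changes are carried by these spans.
SPAN_OPEN = re.compile(r"<span\b[^>]*>", re.IGNORECASE)
# The matching closing tags.
SPAN_CLOSE = re.compile(r"</span\s*>", re.IGNORECASE)


def strip_spans(html: str) -> str:
    """Remove every <span> wrapper but keep the text inside it.

    Other tags such as <b>, <i> and <p> are left untouched.
    """
    html = SPAN_OPEN.sub("", html)
    return SPAN_CLOSE.sub("", html)


if __name__ == "__main__":
    # Usage (hypothetical file names): python strip_spans.py in.html out.html
    src, dst = sys.argv[1], sys.argv[2]
    # Assumption: the HTML was saved as UTF-8; change the encoding if not.
    with open(src, encoding="utf-8") as f:
        cleaned = strip_spans(f.read())
    with open(dst, "w", encoding="utf-8") as f:
        f.write(cleaned)
```

Stripping all spans rather than just the ones with a class attribute keeps the opening and closing tags balanced, which a class-only search could break.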
I've been using an older copy of Omnipage 16, with a digital camera photographing the pages as my source. Omnipage has an export format called Wordpad (RTF). From experience, I know that Wordpad (an old application found in older versions of Windows, and probably still buried in the newer ones) uses a "limited" version of RTF. The text comes out much cleaner, and the standard procedures of selecting all the text and setting it to the same font, etc., will tend to clean up the rest. Then the RTF document can be opened in Word for further text manipulation.
When I worked at MS as a test contractor, we called the Wordpad version of RTF the "true" RTF, and the Word version "Woozle" (dunno why, that's just what everyone else called it). The Word version added extended RTF features; that was the MS way of trying to make it sorta proprietary and not truly compatible with other applications' implementations.
Even now, if you can find a version of Wordpad on your computer (look in the Accessories folder of the Start menu), you should be able to have Word save as RTF, open that RTF in Wordpad (which will ignore the features it doesn't understand), then Save As again from Wordpad to a different RTF filename. Voila, the extra strange formatting is gone, since Wordpad doesn't do styles.