12-01-2012, 03:15 PM   #1
DSpider
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
LayoutPrep – a custom Word macro that preps your OCR content for styles

I wrote this for myself a few months ago. I think it works especially well with content OCR'ed in ABBYY FineReader, because FineReader attaches all sorts of styles to the text (even styles for bold and italic).

The macro breaks the content down by enclosing the formatted characters (those that are bold, italic, subscript, superscript, etc.) in special characters that act like tags. It then saves the document as plain text (yes, that's right) and restores the formatting based on those tags. The text ends up the way it was, except that it is squeaky clean: the styles are gone and have been turned into direct formatting, ready (or "prepped", get it?) for you to scroll through the entire thing and apply Quick Styles in Word or InDesign, which results in a much cleaner project.
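
To make the tag-and-restore idea concrete, here is a rough Python sketch of the round trip. It is not the macro's actual code: the marker characters, the (text, formats) run representation, and the function names `tag_runs` and `restore_runs` are all made up for illustration, and the real macro works on a Word document rather than on Python tuples.

```python
import re

# Hypothetical marker characters; the tag characters used by the real macro
# are not shown in this post.
OPEN, CLOSE = "\u27e6", "\u27e7"
FORMATS = ("b", "i", "sub", "sup")

# Matches an opening or closing marker such as OPEN + "b" + CLOSE
# or OPEN + "/b" + CLOSE.
TOKEN = re.compile(f"{OPEN}(/?)({'|'.join(FORMATS)}){CLOSE}")


def tag_runs(runs):
    """Flatten (text, formats) runs into one plain string with tag-like markers."""
    out = []
    for text, formats in runs:
        ordered = [f for f in FORMATS if f in formats]
        prefix = "".join(f"{OPEN}{f}{CLOSE}" for f in ordered)
        suffix = "".join(f"{OPEN}/{f}{CLOSE}" for f in reversed(ordered))
        out.append(prefix + text + suffix)
    return "".join(out)


def restore_runs(plain):
    """Rebuild (text, formats) runs from the tagged plain text."""
    runs, active, pos = [], set(), 0
    for m in TOKEN.finditer(plain):
        if m.start() > pos:
            # Text between two markers gets whatever formats are currently open.
            runs.append((plain[pos:m.start()], frozenset(active)))
        closing, name = m.group(1), m.group(2)
        if closing:
            active.discard(name)
        else:
            active.add(name)
        pos = m.end()
    if pos < len(plain):
        runs.append((plain[pos:], frozenset(active)))
    return runs


if __name__ == "__main__":
    document = [
        ("H", frozenset()),
        ("2", frozenset({"sub"})),
        ("O is ", frozenset()),
        ("wet", frozenset({"b", "i"})),
    ]
    tagged = tag_runs(document)       # this is what would be saved as plain text
    print(tagged)
    print(restore_runs(tagged) == document)
```

The point of the detour through plain text is that nothing except the markers survives it, so any style information attached to the text is dropped and only the formatting that was explicitly tagged comes back.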


The code is very well commented. Take a look and see for yourself.

Link: http://pastebin.com/9TLg4UWG


Notes:
  • Footnotes are turned into in-line text.
  • If FineReader detects some of the text as either headers or footers, that text will not make it into the final document. So be careful when proofreading the text in FineReader (which you should do anyway, because automatic processing is not quite there yet). If a piece of body text was detected as a header, I usually just cut and paste it from the ogrish green header area into the body text.
  • This macro may or may not work on older versions of Microsoft Word. I have only tested it in Word 2010.

Enjoy! Feedback and suggestions are welcome.