12-01-2012, 03:15 PM | #1 |
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
|
LayoutPrep – a custom Word macro that preps your OCR content for styles
I wrote it for myself a few months ago, and I think it works very well with ABBYY FineReader OCR'ed content, because FineReader adds all sorts of styles to it (even styles for bolds and italics).
This macro breaks the content down by enclosing every formatted run of characters (bold, italic, subscript, superscript, etc.) in special characters that act like tags. Then it saves the document as plain text (yes, that's right) and restores the formatting based on those tags.
This means the text will be back the way it was, except squeaky clean and without styles. The styles turn into direct formatting, ready (or "prepped", get it?) for you to scroll through the entire thing and apply Quick Styles in Word or InDesign, which will result in a much cleaner project.
It's very well commented. Take a look and see for yourself.
Link: http://pastebin.com/9TLg4UWG
Notes:
Enjoy! Feedback and suggestions are welcome. |
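For readers who have not opened the pastebin, the tag-and-restore idea described in the first post can be sketched roughly as follows. This is not DSpider's code: the |b| and |/b| markers and both procedure names are made up for illustration, only bold is handled, and the plain-text save and reload between the two steps is left out.

```vba
' Illustrative sketch only, not the LayoutPrep macro itself.
' The |b| ... |/b| markers are hypothetical placeholders.

' Step 1: wrap every bold run in plain-text markers.
Sub TagBoldRuns()
    Dim rng As Range
    Set rng = ActiveDocument.Content

    With rng.Find
        .ClearFormatting
        .Text = ""              ' match on formatting alone
        .Font.Bold = True
        .Format = True
        .Forward = True
        .Wrap = wdFindStop      ' stop at the end instead of wrapping around
        .MatchWildcards = False

        Do While .Execute
            rng.InsertBefore "|b|"
            rng.InsertAfter "|/b|"
            rng.Collapse Direction:=wdCollapseEnd
            ' Guard against looping forever on the final paragraph mark.
            If rng.End >= ActiveDocument.Content.End - 1 Then Exit Do
        Loop
    End With
End Sub

' Step 2 (after the plain-text round trip): turn the markers
' back into direct bold formatting and drop the markers.
Sub RestoreBoldRuns()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "|b|(*)|/b|"            ' wildcard search, group 1 = tagged text
        .Replacement.Text = "\1"
        .Replacement.Font.Bold = True
        .MatchWildcards = True
        .Format = True
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

A real version would need one marker pair per kind of formatting, and markers that cannot occur in the book's own text.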
12-01-2012, 04:09 PM | #2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
If I can find it again, I will also post my version. That one does retain footnotes. I hope I still have it after a computer crash...
** Found it, but it needs some tweaking. It is not quite how I want it yet.
Last edited by Toxaris; 12-02-2012 at 09:13 AM. |
12-02-2012, 10:23 AM | #3 |
BioReader
Posts: 292
Karma: 42568
Join Date: Apr 2009
Location: Germany
Device: Various
|
DSpider - thank you for that fine piece of work. I am now able to get through my workflow much faster (eliminated):
- scan book into FineReader 11
- unifying defined styles in FineReader with the style edit function
- arranging some "false" headers as chapter headings
- saving to a Word file (as editable copy)
- cleaning styles in the file within Word
- character editing with search & replace/regex
- chaptering
- saving to ODT & reading into Jutoh
- making the ebook with footnotes, graphics and all bells & whistles
I applied the macro to several files which I scanned recently and it showed consistent results. Sometimes a few "|" characters are left over, but those can be detected quickly.
I am currently scanning my SF paperback library into EPUB and your macro helps a lot. I will keep you informed in case I find any glitches.
Klaus |
12-02-2012, 01:53 PM | #4 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Found it. This will retain footnotes.
Code:
```vba
Private Sub cleancopy()
    Dim sty As Style
    Dim i As Long

    ' Cut the whole document to the clipboard
    Selection.WholeStory
    Selection.Cut

    ' Lock all styles, then unlock Normal again
    For Each sty In ActiveDocument.Styles
        sty.Locked = True
    Next sty
    ActiveDocument.Styles(wdStyleNormal).Locked = False
    ActiveDocument.Protect Password:="password", NoReset:=False, _
        Type:=wdNoProtection, UseIRM:=False, EnforceStyleLock:=True

    ' Delete every style that is still locked, skipping any that fail
    On Error Resume Next
    For i = ActiveDocument.Styles.Count To 1 Step -1
        If ActiveDocument.Styles(i).Locked = True Then
            ActiveDocument.Styles(i).Delete
        End If
    Next i

    ' Paste the text back with its formatting and lift the protection
    Selection.PasteAndFormat (wdFormatOriginalFormatting)
    ActiveDocument.Unprotect Password:="password"
End Sub
```
|
12-09-2012, 04:34 PM | #5 |
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
|
Any idea how to split it into a bunch of 50-page documents? It gets progressively slower after 200 pages or so, and processing a 400-page book takes a really long time. Task Manager constantly shows WINWORD.EXE using 97-99% CPU. Imagine processing a 400-page paragraph... because that's basically what it does: it breaks the entire document into one string of text, then reassembles it.
|
12-10-2012, 01:24 AM | #6 |
BioReader
Posts: 292
Karma: 42568
Join Date: Apr 2009
Location: Germany
Device: Various
|
Yes - I noticed some slowness. For 600 pages it takes 10 minutes on my i7 PC. Compared to the more than 60 minutes saved in other parts of my workflow, it is still a bargain ;-).
Could you possibly build in a loop that counts pages inside Word? The macro would finish the first 100 pages, then work on the next 100, and so on. |
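The batching idea from the last two posts could look something like the sketch below. It only shows how to walk a document in page batches: ProcessBatch is a hypothetical stand-in for whatever per-range work the macro does, and the 100-page batch size is simply the figure suggested in the post. It works from the last batch back to the first, so that edits inside one batch do not shift the character positions of the batches still to come. Whether this actually cures the slowdown is untested.

```vba
' Sketch under assumptions: ProcessBatch is a hypothetical placeholder.
Sub ProcessInPageBatches()
    Const BATCH_PAGES As Long = 100
    Dim totalPages As Long
    Dim firstPage As Long
    Dim startPos As Long
    Dim endPos As Long

    totalPages = ActiveDocument.ComputeStatistics(wdStatisticPages)
    endPos = ActiveDocument.Content.End

    ' First page of the last batch, e.g. page 301 for a 400-page document.
    firstPage = ((totalPages - 1) \ BATCH_PAGES) * BATCH_PAGES + 1

    Do While firstPage >= 1
        ' Character position where this batch's first page starts.
        startPos = ActiveDocument.GoTo(What:=wdGoToPage, _
            Which:=wdGoToAbsolute, Count:=firstPage).Start

        ProcessBatch ActiveDocument.Range(Start:=startPos, End:=endPos)

        ' The next (earlier) batch ends where this one began.
        endPos = startPos
        firstPage = firstPage - BATCH_PAGES
        DoEvents    ' keep Word responsive between batches
    Loop
End Sub

Private Sub ProcessBatch(ByVal batch As Range)
    ' Placeholder: replace with the real per-range work.
    Debug.Print "Batch from " & batch.Start & " to " & batch.End
End Sub
```

Because the macro described in this thread saves the whole document as plain text, each batch would probably have to be copied into its own temporary document first. That step is not shown here.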
03-28-2013, 05:52 AM | #7 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
A bit late, but you could try using Range objects instead of normal Selection-based searches. They are usually faster.
You might also be able to skip a few steps, like replacing the line breaks with a code. You can extend the selection with one of the MoveUntil methods and its Cset argument. That way you can make sure line breaks and the like are also included in your selection. |
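As a rough illustration of that last tip, a range can be stretched up to the next marker character with MoveEndUntil and its Cset argument, the variant of the MoveUntil family that moves only the end of the range. Everything in between, paragraph marks included, ends up inside the range, so no line-break placeholder is needed. The "|" marker character and the procedure name are assumptions made for this example.

```vba
' Sketch only: the "|" marker character is an assumption for this example.
Sub ExtendToNextMarker()
    Dim rng As Range
    Dim moved As Long

    ' Start from the cursor position.
    Set rng = Selection.Range
    rng.Collapse Direction:=wdCollapseStart

    ' Extend the end of the range until the next "|" is reached.
    ' Paragraph marks in between count as ordinary characters.
    moved = rng.MoveEndUntil(Cset:="|", Count:=wdForward)

    If moved > 0 Then
        rng.Select
    Else
        MsgBox "Nothing to select: no marker ahead, or the cursor is already at one."
    End If
End Sub
```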