Old 12-01-2012, 03:15 PM   #1
DSpider
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
LayoutPrep – a custom Word macro that preps your OCR content for styles

I wrote it for myself a few months ago, and I think it works very well with ABBYY FineReader OCR'ed content, because FineReader adds all sorts of styles to it (even styles for bolds and italics).

This macro breaks the content down, enclosing the formatted characters (bold, italic, subscript, superscript, etc.) in special characters that act like tags. It then saves the document as plain text (yes, that's right) and restores the formatting based on those tags. The text comes back the way it was, except squeaky clean and without styles: the styles turn into direct formatting, ready (or "prepped", get it?) for you to scroll through the entire thing and apply Quick Styles in Word or InDesign, which results in a much cleaner project.
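
The tag-and-restore idea can be sketched in a few lines of VBA. This is a minimal illustration for bold text only, not the code behind the link: the marker strings |b| and |/b| and both procedure names are assumptions made up for the example, and the save-as-plain-text step that would sit between the two procedures is left out.

```vba
' Minimal sketch, bold only. The |b| ... |/b| markers are hypothetical,
' chosen for illustration; they are not taken from LayoutPrep.
Sub TagBoldRuns()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""                         ' empty text: match by formatting only
        .Font.Bold = True
        .Replacement.Text = "|b|^&|/b|"    ' ^& stands for the matched text
        .Forward = True
        .Wrap = wdFindStop
        .Format = True
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' Run on the reloaded plain text: turn each tagged run back into bold
' direct formatting and drop the markers.
Sub RestoreBoldRuns()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "|b|(*)|/b|"               ' wildcard search; (*) captures the run
        .Replacement.Text = "\1"           ' keep only the captured text
        .Replacement.Font.Bold = True
        .Forward = True
        .Wrap = wdFindStop
        .Format = True
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Italic, subscript, and superscript would each need their own marker pair. In a scheme like this, any marker that fails to match on restore stays in the text as literal characters, so searching for the marker afterwards is a cheap sanity check.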


It's very well commented. Take a look and see for yourself.

Link: http://pastebin.com/9TLg4UWG


Notes:
  • Footnotes are turned into in-line text.
  • If FineReader detects some of the text as either headers or footers, that text will not make it into the final document. So be careful when proofreading the text in FineReader (which you should anyway, because automatic processing is not quite there yet). I usually just Cut-Paste the ogrish green header into the body text if it was detected as a header.
  • This macro may or may not work on older versions of Microsoft Word. I have only tested it in Word 2010.

Enjoy! Feedback and suggestions are welcome.
Old 12-01-2012, 04:09 PM   #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
If I can find it again, I will also post my version. That one does retain footnotes. I hope I still have it. Computer crash...

** Found it, but it needs some tweaking. It is not quite as I want it.

Last edited by Toxaris; 12-02-2012 at 09:13 AM.
Old 12-02-2012, 10:23 AM   #3
kbaerwald
BioReader
Posts: 292
Karma: 42568
Join Date: Apr 2009
Location: Germany
Device: Various
DSpider, thank you for that fine piece of work. I am now able to get through my workflow much faster (eliminated):

- scanning the book into FineReader 11
- unifying defined styles in FineReader with the style edit function
- arranging some "false" headers as chapter headings
- saving to a Word file (as editable copy)
- cleaning styles in the file within Word
- character editing with search & replace/regex
- chaptering
- saving to ODT and reading into Jutoh
- making the ebook with footnotes, graphics, and all the bells & whistles

I applied the macro to several files that I scanned recently, and it gave consistent results. Occasionally a few "|" characters are left over, but those are quickly detected.

I am currently scanning my SF paperback library into EPUB, and your macro helps a lot. I will keep you informed in case I find any glitches.

Klaus
Old 12-02-2012, 01:53 PM   #4
Toxaris
Wizard
Found it. This will retain footnotes.

Code:
Private Sub cleancopy()
    Dim sty As Style
    Dim i As Long

    ' Move the whole document body to the clipboard
    Selection.WholeStory
    Selection.Cut

    ' Lock all styles...
    For Each sty In ActiveDocument.Styles
        sty.Locked = True
    Next sty

    ' ...except Normal, which stays available
    ActiveDocument.Styles(wdStyleNormal).Locked = False
    ActiveDocument.Protect Password:="password", NoReset:=False, Type:= _
                           wdNoProtection, UseIRM:=False, EnforceStyleLock:=True

    ' Delete every locked style, walking backwards because the collection
    ' shrinks; errors from styles Word refuses to delete are skipped
    On Error Resume Next
    For i = ActiveDocument.Styles.Count To 1 Step -1
        If ActiveDocument.Styles(i).Locked = True Then
            ActiveDocument.Styles(i).Delete
        End If
    Next i

    ' Paste the text back with its original formatting, then lift the protection
    Selection.PasteAndFormat (wdFormatOriginalFormatting)
    ActiveDocument.Unprotect Password:="password"

End Sub
Old 12-09-2012, 04:34 PM   #5
DSpider
Evangelist
Any idea how to split it into a bunch of 50-page documents? It gets progressively slower after 200 pages or so, and processing a 400-page book takes a really long time. Task Manager constantly shows WINWORD.EXE using 97-99% CPU. Imagine processing a 400-page paragraph... because that's basically what it does: it breaks the entire document into a string of text, then reassembles it.
Old 12-10-2012, 01:24 AM   #6
kbaerwald
BioReader
Yes, I noticed some slowness. For 600 pages it takes 10 mins on my i7 PC. Compared to the more than 60 mins saved in other parts of my workflow, it is still a bargain ;-).

Could you possibly build in a loop that counts pages inside Word? The macro would finish the first 100 pages, then work on the next 100, and so on.
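
One possible shape for such a loop, as a sketch only: it walks the document in blocks of pages from the end towards the start, so that edits in a later block do not shift the character positions of the blocks still to be processed. ProcessChunk is a hypothetical placeholder for whatever per-range work the macro would do, and the block size of 100 pages is just the figure mentioned above. None of this is taken from LayoutPrep, and whether it actually helps depends on how much of the slowdown comes from working on one huge range.

```vba
Sub ProcessInPageChunks()
    Const PAGES_PER_CHUNK As Long = 100
    Dim totalPages As Long
    Dim firstPage As Long
    Dim chunkStart As Long
    Dim chunkEnd As Long
    Dim chunk As Range

    totalPages = ActiveDocument.ComputeStatistics(wdStatisticPages)
    chunkEnd = ActiveDocument.Content.End

    ' First page of the last block, e.g. page 401 for a 450-page document.
    firstPage = ((totalPages - 1) \ PAGES_PER_CHUNK) * PAGES_PER_CHUNK + 1

    Do While firstPage >= 1
        ' Document.GoTo returns a collapsed range at the top of the page.
        Set chunk = ActiveDocument.GoTo(What:=wdGoToPage, _
                                        Which:=wdGoToAbsolute, _
                                        Count:=firstPage)
        chunkStart = chunk.Start
        chunk.SetRange Start:=chunkStart, End:=chunkEnd

        ProcessChunk chunk

        chunkEnd = chunkStart          ' the next block ends where this one began
        firstPage = firstPage - PAGES_PER_CHUNK
    Loop
End Sub

' Hypothetical placeholder: replace the body with the real per-range work.
Private Sub ProcessChunk(ByVal rng As Range)
    Debug.Print "Processing characters " & rng.Start & " to " & rng.End
End Sub
```

Page boundaries depend on the current layout, so the blocks are only approximately 100 pages once the text inside them starts changing.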
Old 03-28-2013, 05:52 AM   #7
Toxaris
Wizard
A bit late, but you could try using Range objects instead of normal Selection-based searches. They are usually faster.

You might also skip a few steps, like replacing the line breaks with a code. You can extend the selection with the MoveUntil methods combined with the Cset argument. That way you can make sure line breaks and the like are also included in your selection.
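
As a rough illustration of that suggestion, the sketch below uses a Range and MoveEndUntil to grow the range up to the next marker character without touching the Selection. The "|" marker and the procedure name are assumptions for the example; Cset can list several stop characters at once.

```vba
Sub ExtendRangeToNextMarker()
    Dim rng As Range
    Dim moved As Long

    ' Start with a collapsed range at the beginning of the document.
    Set rng = ActiveDocument.Range(Start:=0, End:=0)

    ' Extend the end forward until a "|" is reached. Paragraph marks and
    ' line breaks on the way are simply included in the range.
    moved = rng.MoveEndUntil(Cset:="|", Count:=wdForward)

    If moved = 0 Then
        Debug.Print "No marker found (or one sits at the very start)."
    Else
        Debug.Print "Range now covers " & Len(rng.Text) & " characters."
    End If
End Sub
```

Because the range itself carries the start and end positions, formatting can then be applied to rng directly, with no need to substitute placeholder codes for the line breaks first.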