#1
Addict
Posts: 368
Karma: 298951
Join Date: Nov 2009
Location: Romania
Device: iPod touch 2G (16 GB)
LayoutPrep – a custom Word macro that preps your OCR content for styles
This macro breaks down the content, enclosing each run of formatted characters (bold, italic, subscript, superscript, etc.) in special characters that act like tags. It then saves the document as plain text (yes, that's right) and restores the formatting based on those tags. The text comes back the way it was, except squeaky clean and without styles; the styles turn into direct formatting, ready (or "prepped", get it?) for you to scroll through the entire thing and apply Quick Styles in Word or InDesign, which results in a much cleaner project.
It's very well commented. Take a look and see for yourself.
Link: http://pastebin.com/9TLg4UWG
Notes:
Enjoy! Feedback and suggestions are welcome.
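A minimal sketch of the tag idea described above, for bold text only. The tag strings `|b|` and `|/b|` and the Sub names are made up for illustration and are not taken from the pastebin source. The plain-text save and reload that the real macro performs between the two steps is left out here.

```vba
' Hypothetical tags; they must not contain Word wildcard characters
' such as ( ) [ ] { } < > * ? @ ! \ ^
Const TAG_OPEN As String = "|b|"
Const TAG_CLOSE As String = "|/b|"

Sub TagBoldRuns()
    ' Find every bold run (empty search text + bold format) and
    ' replace it with itself (^&) wrapped in the two tags.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""
        .Font.Bold = True
        .Replacement.Text = TAG_OPEN & "^&" & TAG_CLOSE
        .Format = True
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub RestoreBoldRuns()
    ' Wildcard search: capture the text between the tags as group 1,
    ' replace the whole match with group 1 only, formatted bold.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = TAG_OPEN & "(*)" & TAG_CLOSE
        .Replacement.Text = "\1"
        .Replacement.Font.Bold = True
        .Format = True
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Running `TagBoldRuns`, flattening the document to plain text, and then running `RestoreBoldRuns` would bring bold back as direct formatting. Italic, subscript and superscript would each need their own tag pair along the same lines.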
#2
Wizard
Posts: 2,098
Karma: 927511
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
If I can find it again, I will also post my version. That one does retain footnotes. I hope I still have it. Computer crash...
** Found it, but it needs some tweaking. It is not quite as I want it.
__________________
Creator and maintainer of the e-Book Tools Word add-in. Creator and maintainer of the Clean HTML macro for MS Word.
Last edited by Toxaris; 12-02-2012 at 09:13 AM.
#3
Der Leser
Posts: 240
Karma: 12602
Join Date: Apr 2009
Location: Germany
Device: Diverse
DSpider - thank you for that fine piece of work. I am now able to get through my workflow much faster (eliminated):
- scanning the book into FineReader 11
- unifying defined styles in FineReader with the style edit function
- arranging some "false" headers as chapter headings
- saving to a Word file (as editable copy)
- cleaning styles in the file within Word
- character editing with search & replace/regex
- chaptering
- saving to odt & reading into Jutoh
- making the ebook with footnotes, graphics and all bells & whistles
I applied the macro to several files which I scanned recently and it showed consistent results. Sometimes some "|" are left over, but those can be detected quickly. I am currently scanning my SF paperback library into epub and your macro helps a lot. I will keep you informed in case I find some glitches.
Klaus
#4
Wizard
Found it. This will retain footnotes.
Code:
Private Sub cleancopy()
    Dim sty As Style
    Dim i As Long

    ' Cut the whole document to the clipboard
    Selection.WholeStory
    Selection.Cut

    ' Lock all styles, then unlock Normal again
    For Each sty In ActiveDocument.Styles
        sty.Locked = True
    Next sty
    ActiveDocument.Styles(wdStyleNormal).Locked = False

    ' Turn on formatting restrictions (style lock);
    ' wdNoProtection leaves editing itself unrestricted
    ActiveDocument.Protect Password:="password", NoReset:=False, _
        Type:=wdNoProtection, UseIRM:=False, EnforceStyleLock:=True

    ' Delete every locked style, last to first; errors are ignored
    On Error Resume Next
    For i = ActiveDocument.Styles.Count To 1 Step -1
        If ActiveDocument.Styles(i).Locked Then
            ActiveDocument.Styles(i).Delete
        End If
    Next i

    ' Paste the content back with its original formatting, then unprotect
    Selection.PasteAndFormat wdFormatOriginalFormatting
    ActiveDocument.Unprotect Password:="password"
End Sub
#5
Addict
Any idea how to split it into a bunch of 50-page documents? It gets progressively slower after 200 pages or so, and processing a 400-page book takes a really long time. The task manager constantly shows WINWORD.EXE using 97-99% CPU. Imagine processing a 400-page paragraph... Because that's basically what it does: it breaks the entire document into a single string of text, then reassembles it.
#6
Der Leser
Yes - I noticed some slowness. For 600 pages it takes 10 mins on my i7 PC. Compared to the saved >60 mins in other parts of my workflow it is still a bargain ;-).
Could you possibly build in a loop that counts pages inside Word? The macro would finish the first 100 pages, then work on the next 100, and so on.
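One way such a page-counting loop might look, as a rough sketch only. `ProcessChunk` is a hypothetical placeholder for whatever the macro does to a block of text, and the 100-page chunk size is simply the figure suggested above. The chunks are walked from the end of the document towards the start, on the assumption that edits inside a later chunk would otherwise shift the page boundaries of chunks still to come.

```vba
Sub ProcessInPageChunks()
    Const CHUNK_PAGES As Long = 100   ' assumed chunk size
    Dim totalPages As Long
    Dim firstPage As Long
    Dim startPos As Long
    Dim endPos As Long

    ActiveDocument.Repaginate
    totalPages = ActiveDocument.ComputeStatistics(wdStatisticPages)

    ' Start with the last chunk: its first page is the highest
    ' page number of the form 1, 101, 201, ... within the document.
    firstPage = ((totalPages - 1) \ CHUNK_PAGES) * CHUNK_PAGES + 1
    endPos = ActiveDocument.Content.End

    Do While firstPage >= 1
        ' Character position where this chunk's first page begins
        startPos = ActiveDocument.GoTo(What:=wdGoToPage, _
            Which:=wdGoToAbsolute, Count:=firstPage).Start

        ProcessChunk ActiveDocument.Range(startPos, endPos)

        ' The next (earlier) chunk ends where this one started
        endPos = startPos
        firstPage = firstPage - CHUNK_PAGES
    Loop
End Sub

' Hypothetical placeholder: replace the body with the real per-chunk work.
Sub ProcessChunk(ByVal chunk As Range)
    Debug.Print "Chunk from " & chunk.Start & " to " & chunk.End
End Sub
```

Whether this actually helps depends on where the time goes; if the slowdown comes from the plain-text round trip rather than from the searches, the chunks would have to be saved out as separate documents instead.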
#7
Wizard
A bit late, but you could try using ranges instead of normal searches. They are usually faster.
You might also skip a few steps, like replacing the line breaks with a code. You can extend the selection with MoveUntil combined with Cset. That way you can make sure line breaks and the like are also included in your selection.
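A short sketch of both suggestions, under some assumptions: the Sub names are invented, the search is for bold runs only, and `MoveEndUntil` is used as the variant of `MoveUntil` that extends the end of a range rather than moving the whole range, which may or may not be what was meant above.

```vba
Sub CountBoldRunsWithRange()
    ' Search through a Range object instead of the Selection,
    ' so the cursor and the screen are not touched during the loop.
    Dim rng As Range
    Dim hits As Long

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = ""
        .Font.Bold = True
        .Format = True
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            hits = hits + 1
            ' Guard against looping forever on a match that
            ' reaches the very end of the document
            If rng.End >= ActiveDocument.Content.End Then Exit Do
            rng.Collapse Direction:=wdCollapseEnd
        Loop
    End With
    Debug.Print hits & " bold runs found"
End Sub

Sub ExtendToNextTag()
    ' Extend the end of a range forward until a "|" is met.
    ' Line breaks and paragraph marks on the way are included.
    Dim rng As Range
    Set rng = ActiveDocument.Range(0, 0)
    rng.MoveEndUntil Cset:="|", Count:=wdForward
    Debug.Print rng.Text
End Sub
```

If no `|` exists after the starting point, `MoveEndUntil` leaves the range unchanged, so the second Sub prints an empty string in that case.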