12-30-2010, 11:16 PM | #31 |
Junior Member
Posts: 1
Karma: 10
Join Date: Dec 2010
Device: Nook
|
I wrestled with this for a bit too. This is what worked for me; hope it helps some people. This may not be the easiest way to do it, but it's the one I use.
All this applies to Windows Vista, the newest rev of Calibre, the newest rev of Notepad++, and a book file already converted to epub from a PDF. The way I do it is to install Notepad++, an editing program. It does way more than I know what to do with. Once it is installed:
1. In Calibre, select the book you want to fix.
2. Right-click and select 'tweak epub'.
3. In the box that pops up, click 'explode epub'.
4. In the file browser that pops up, select the html files that start with 'index_split'.
5. Right-click on the selected files and open them in Notepad++.
6. All the files will now be open in separate tabs in Notepad++.
7. Select a file's tab.
8. On the toolbar, click the 'Find' tool button.
9. In the box that pops up, in the find tab, check the 'Mark lines' box.
10. In the 'Find what' field, type "abbyy".
11. Click the button marked 'Find all'.
12. All lines with the abbyy junk in them will now be marked with a blue dot by the line number.
13. Go to the 'Search' menu, go down to 'bookmark', and click 'delete bookmarked lines'. That file should now be clean of the abbyy stuff.
14. Pick the next file's tab and repeat steps 8 through 13 for each tab.
15. When all the open files are fixed, go to 'File'>'Save all', then 'File'>'Close all'.
16. Close Notepad++, close the file browser window, and click 'rebuild epub' in the Calibre pop-up dialog box.
That's it. There may be some more editing to do: some files had an extra character inserted into lines to make it harder to fix, but a simple find and replace fixes that. |
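For anyone who would rather script steps 7 through 15 than click through every tab, here is a rough Python sketch of the same idea: drop every line that contains "abbyy" from the 'index_split' files in the exploded folder. The function names, the UTF-8 encoding, and the case-insensitive match are my assumptions, not something from the post above, so try it on a copy of the folder first.

```python
from pathlib import Path


def drop_abbyy_lines(text: str, needle: str = "abbyy") -> str:
    """Return text without any line that contains needle (case-insensitive)."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if needle.lower() not in line.lower()
    ]
    return "".join(kept)


def clean_exploded_epub(folder: str, pattern: str = "index_split*.html") -> int:
    """Rewrite every matching file in folder; return how many files changed.

    Assumes the files are UTF-8 encoded (an assumption, adjust if needed).
    """
    changed = 0
    for path in sorted(Path(folder).glob(pattern)):
        original = path.read_text(encoding="utf-8")
        cleaned = drop_abbyy_lines(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Call `clean_exploded_epub` with the path of the folder that Calibre opens after 'explode epub', then click 'rebuild epub' as in step 16. Like the Notepad++ method, this deletes whole lines, so any real text sitting on an abbyy line goes with it.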
01-18-2011, 01:48 AM | #32 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2011
Device: kindle 3
|
I tweaked and tweaked and came out with a regex that gets rid of all the abbyy stuff without losing the text they cleverly insert on some of those lines AND trims out some unnecessary new lines. It does have the side effect of removing bolding from chapter headings and possibly moving those headings inline, but in most cases that's easy enough to correct.
Code:
(<a href="http://www.abbyy.com/buy"><b>[a-zA-Z\.0-9 ]{1,3}</b></a><br>)|<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>|<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>|<A.*?</a>|<a href="http://www.abbyy.com/buy">|</a>|</b>|<b>|\.?A ?B ?B ?Y ?Y ?\.c[ o]?m?|(?<=[^\.?!])(<br>)
Code:
|(?<=[^\.?!])(<br>)
Hope this is helpful to someone else. |
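To try the expression outside Calibre, a minimal Python sketch could look like the one below. The pattern is copied as posted, including the trailing `<br>` alternative; the function name and the empty replacement string are my assumptions about how it is meant to be used.

```python
import re

# Pattern copied from the post above, including the trailing <br> trimming part.
ABBYY_PATTERN = re.compile(
    r'(<a href="http://www.abbyy.com/buy"><b>[a-zA-Z\.0-9 ]{1,3}</b></a><br>)'
    r'|<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>'
    r'|<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>'
    r'|<A.*?</a>|<a href="http://www.abbyy.com/buy">|</a>|</b>|<b>'
    r'|\.?A ?B ?B ?Y ?Y ?\.c[ o]?m?'
    r'|(?<=[^\.?!])(<br>)'
)


def strip_abbyy(html: str) -> str:
    """Delete every match of the pattern (replace it with nothing)."""
    return ABBYY_PATTERN.sub("", html)


sample = (
    "It was a dark<br>\n"
    "and stormy night.<br>\n"
    '<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>\n'
)
print(strip_abbyy(sample))
```

The last alternative only removes a `<br>` when the character before it is not `.`, `?` or `!`, which is what joins lines broken mid-sentence. It is also why a heading such as `<b>Chapter 1</b><br>` comes out as plain `Chapter 1` with no break after it.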
02-01-2011, 06:41 PM | #33 |
Junior Member
Posts: 1
Karma: 10
Join Date: Feb 2011
Device: color nook
|
Thanks, Cthrax!!
It's perfect!!!
|
09-09-2011, 12:12 AM | #34 | |
Junior Member
Posts: 4
Karma: 10
Join Date: Sep 2011
Device: none
|
Remove ABBYY and Keep Bold Headings
Quote:
1. Search Expression: <b>(.+)</b><br> Replacement text: @@\1@@
2. Search Expression (from cthrax):
Code:
(<a href="http://www.abbyy.com/buy"><b>[a-zA-Z\.0-9 ]{1,3}</b></a><br>)|<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>|<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>|<A.*?</a>|<a href="http://www.abbyy.com/buy">|</a>|</b>|<b>|\.?A ?B ?B ?Y ?Y ?\.c[ o]?m?
3. Search Expression: @@(.+)@@ Replacement text: <b>\1</b><br> |
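The three passes can be sketched in Python as below. This is only an illustration: the function name is made up, step 2 is assumed to replace matches with nothing, and it assumes the text never contains "@@" on its own. The pattern is the one quoted in step 2, which leaves out the `<br>` trimming part of cthrax's original.

```python
import re

# Step 2 pattern as quoted above (without the <br> trimming alternative).
ABBYY_PATTERN = re.compile(
    r'(<a href="http://www.abbyy.com/buy"><b>[a-zA-Z\.0-9 ]{1,3}</b></a><br>)'
    r'|<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>'
    r'|<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>'
    r'|<A.*?</a>|<a href="http://www.abbyy.com/buy">|</a>|</b>|<b>'
    r'|\.?A ?B ?B ?Y ?Y ?\.c[ o]?m?'
)


def clean_keep_bold(html: str) -> str:
    """Remove the abbyy markup while keeping bold headings intact."""
    # Pass 1: hide bold headings behind @@ markers so pass 2 cannot touch them.
    protected = re.sub(r"<b>(.+)</b><br>", r"@@\1@@", html)
    # Pass 2: delete the abbyy links, stray tags, and watermark text.
    stripped = ABBYY_PATTERN.sub("", protected)
    # Pass 3: turn the markers back into bold headings.
    return re.sub(r"@@(.+)@@", r"<b>\1</b><br>", stripped)


sample = (
    "<b>Chapter 1</b><br>\n"
    '<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>\n'
    "Text here.<br>\n"
)
print(clean_keep_bold(sample))
```

Pass 1 skips the abbyy lines because their `</b>` is followed by `</a>` rather than `<br>`, so only real headings get the markers.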
|
|