Old 12-30-2010, 11:16 PM   #31
Wolfaar
Junior Member
Wolfaar began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Dec 2010
Device: Nook
I wrestled with this for a bit too. This is what worked for me; I hope it helps some people. It may not be the easiest way to do it, but it's the one I use.

All of this applies to Windows Vista, the newest release of Calibre, and the newest release of Notepad++, on a book file already converted to EPUB from a PDF.


The way I do it is to install Notepad++, a text editor. It does way more than I know what to do with. Once it is installed:

1 In Calibre, select the book you want to fix.
2 Right-click and select 'tweak epub'.
3 In the box that pops up, click 'explode epub'.
4 In the file browser that pops up, select the HTML files whose names start with 'index_split'.
5 Right-click the selected files and open them in Notepad++.
6 All the files will now be open in separate tabs in Notepad++.
7 Select a file's tab.
8 On the toolbar, click the 'Find' tool button.
9 In the box that pops up, on the Find tab, check the 'Mark lines' box.
10 In the 'Find what' field, type "abbyy".
11 Click the button marked 'Find all'.
12 Every line with the ABBYY junk in it will now be marked with a blue dot next to the line number.
13 Go to the 'Search' menu, go down to 'bookmark', and click 'delete bookmarked lines'.

That file should now be clean of the ABBYY stuff.

14 Pick the next file's tab and repeat steps 8 through 13. Do this for each tab.
15 When all the open files are fixed, go to 'File' > 'Save all', then 'File' > 'Close all'.
16 Close Notepad++, close the file browser window, and click 'rebuild epub' in the Calibre pop-up dialog box.

That's it.

There may be some more editing to do. Some files had an extra character inserted into the ABBYY lines to make them harder to find, but a simple find and replace fixes that.
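
For anyone who would rather script steps 7 through 15 than click through every tab, here is a minimal Python sketch of the same idea: delete every line that contains "abbyy" from the 'index_split' files in the exploded folder. This is not part of the procedure above, just one way it could be automated. The function names are made up for illustration, and the sketch assumes the files are UTF-8 and that the match should ignore case, as Notepad++ does when 'Match case' is unchecked. Like the manual method, it drops the whole line, so any real text sharing a line with the ABBYY junk is lost too.

```python
from pathlib import Path


def drop_marked_lines(text: str, marker: str = "abbyy") -> str:
    """Return text without any line that contains marker (case-insensitive)."""
    needle = marker.lower()
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if needle not in line.lower()
    ]
    return "".join(kept)


def clean_exploded_epub(folder: str, pattern: str = "index_split*.html") -> int:
    """Rewrite every matching file in folder; return how many files changed.

    Assumption: the files are UTF-8 encoded. Adjust the encoding if yours differ.
    """
    changed = 0
    for path in sorted(Path(folder).glob(pattern)):
        original = path.read_text(encoding="utf-8")
        cleaned = drop_marked_lines(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

To use it, call clean_exploded_epub() with the path of the folder that Calibre opens after 'explode epub', then click 'rebuild epub' as in step 16. Work on a copy of the book the first time.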
Old 01-18-2011, 01:48 AM   #32
cthrax
Junior Member
cthrax began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Jan 2011
Device: kindle 3
I tweaked and tweaked and came up with a regex that gets rid of all the ABBYY stuff without losing the text they cleverly insert on some of those lines, and that also trims out some unnecessary line breaks. It does have the side effect of removing bolding from chapter headings and possibly moving those headings inline, but in most cases that's easy enough to correct.

Code:
(<a href="http://www.abbyy.com/buy"><b>[a-zA-Z\.0-9 ]{1,3}</b></a><br>)|<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>|<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>|<A.*?</a>|<a href="http://www.abbyy.com/buy">|</a>|</b>|<b>|\.?A ?B ?B ?Y ?Y ?\.c[ o]?m?|(?<=[^\.?!])(<br>)
If the line-break part is not useful to you, or is too greedy, you can remove the last block:
Code:
|(?<=[^\.?!])(<br>)

Hope this is helpful to someone else.
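
If you want to try the expression outside Calibre before committing to a conversion, here is a rough Python sketch. Calibre's search and replace uses Python regular expressions, so the pattern can be pasted in unchanged. The names ABBYY_BASE, LINE_BREAK_BLOCK and strip_abbyy are made up for this example, and the sample strings are invented, not taken from a real book. The optional last block is kept as part of the same single expression, since splitting it into a second pass would change what the lookbehind sees.

```python
import re

# The expression from the post, split across lines only for readability.
ABBYY_BASE = (
    r'(<a href="http://www.abbyy.com/buy"><b>[a-zA-Z\.0-9 ]{1,3}</b></a><br>)'
    r'|<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>'
    r'|<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>'
    r'|<A.*?</a>'
    r'|<a href="http://www.abbyy.com/buy">'
    r'|</a>|</b>|<b>'
    r'|\.?A ?B ?B ?Y ?Y ?\.c[ o]?m?'
)

# The optional last block: a <br> that does not follow '.', '?' or '!'.
LINE_BREAK_BLOCK = r'|(?<=[^\.?!])(<br>)'


def strip_abbyy(html: str, trim_line_breaks: bool = True) -> str:
    """Replace every match with nothing, as an empty replacement field would."""
    pattern = ABBYY_BASE + (LINE_BREAK_BLOCK if trim_line_breaks else "")
    return re.sub(pattern, "", html)


# Invented sample line, for illustration only.
sample = (
    '<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>'
    'Some text<br>'
)
print(strip_abbyy(sample))                          # Some text
print(strip_abbyy(sample, trim_line_breaks=False))  # Some text<br>
print(strip_abbyy("<b>Chapter 1</b><br>"))          # Chapter 1
```

The last line shows the side effect mentioned above: the bold tags and the line break after the heading are both removed.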
Old 02-01-2011, 06:41 PM   #33
JavaGrrl
Junior Member
JavaGrrl began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Feb 2011
Device: color nook
Thanks, Cthrax!!

It's perfect!!!
Old 09-09-2011, 12:12 AM   #34
DoctorT
Junior Member
DoctorT began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Sep 2011
Device: none
Remove ABBYY and Keep Bold Headings

Quote:
Originally Posted by cthrax View Post
I tweaked and tweaked and came up with a regex that gets rid of all the ABBYY stuff...
You can retain the bold chapter headers (and other bolded text lines) by using the following three Search & Replace items:

1. Search Expression: <b>(.+)</b><br>
Replacement text: @@\1@@

2. Search Expression (from cthrax):
Code:
(<a href="http://www.abbyy.com/buy"><b>[a-zA-Z\.0-9 ]{1,3}</b></a><br>)|<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>|<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>|<A.*?</a>|<a href="http://www.abbyy.com/buy">|</a>|</b>|<b>|\.?A ?B ?B ?Y ?Y ?\.c[ o]?m?
Replacement text: (leave empty)

3. Search Expression: @@(.+)@@
Replacement text: <b>\1</b><br>
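
The three items can be checked outside Calibre with a short Python sketch like the one below, which applies them in the same order. The names PROTECT, ABBYY, RESTORE and strip_abbyy_keep_bold are hypothetical, and the sample text is invented. The sketch assumes each bold heading sits on its own line and that the book text never contains "@@" itself; a line with two bold runs would be captured as one, because (.+) is greedy.

```python
import re

# Item 1: wrap bold lines in @@ markers so item 2 leaves them alone.
PROTECT = re.compile(r'<b>(.+)</b><br>')

# Item 2: the expression from cthrax, without the trailing line-break block.
ABBYY = re.compile(
    r'(<a href="http://www.abbyy.com/buy"><b>[a-zA-Z\.0-9 ]{1,3}</b></a><br>)'
    r'|<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>'
    r'|<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>'
    r'|<A.*?</a>'
    r'|<a href="http://www.abbyy.com/buy">'
    r'|</a>|</b>|<b>'
    r'|\.?A ?B ?B ?Y ?Y ?\.c[ o]?m?'
)

# Item 3: turn the @@ markers back into bold tags plus a line break.
RESTORE = re.compile(r'@@(.+)@@')


def strip_abbyy_keep_bold(html: str) -> str:
    """Apply the three search-and-replace items in order."""
    html = PROTECT.sub(r'@@\1@@', html)
    html = ABBYY.sub('', html)
    return RESTORE.sub(r'<b>\1</b><br>', html)


# Invented sample, for illustration only.
sample = (
    '<b>Chapter 1</b><br>\n'
    'Text<br>\n'
    '<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>\n'
)
print(strip_abbyy_keep_bold(sample))
```

The ABBYY link line is not caught by item 1 because its </b> is followed by </a> rather than <br>, so only genuine bold lines get the @@ markers.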
All times are GMT -4.