01-18-2011, 01:48 AM   #32
cthrax
Device: kindle 3
I tweaked and tweaked and came up with a regex that gets rid of all the ABBYY stuff without losing the text they cleverly insert on some of those lines, AND trims out some unnecessary newlines. It does have the side effect of removing bolding from chapter headings and possibly moving those headings inline, but in most cases that's easy enough to correct.

Code:
(<a href="http://www.abbyy.com/buy"><b>[a-zA-Z\.0-9 ]{1,3}</b></a><br>)|<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>|<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>|<A.*?</a>|<a href="http://www.abbyy.com/buy">|</a>|</b>|<b>|\.?A ?B ?B ?Y ?Y ?\.c[ o]?m?|(?<=[^\.?!])(<br>)
If the newline trimming isn't useful to you, or is too greedy, you can remove the last block:
Code:
|(?<=[^\.?!])(<br>)
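
For anyone who wants to try the regex outside an editor's search-and-replace box, here is a rough sketch in Python. It assumes every match is replaced with an empty string and that the input is the converted HTML as a plain string. The post doesn't say which tool the regex was written for, so treat this as one possible way to apply it. The names ABBYY_PARTS, BREAK_PART and strip_abbyy are made up for the example, and the sample text is invented.

```python
import re

# The alternatives from the regex above, split up so the last block is optional.
ABBYY_PARTS = [
    r'(<a href="http://www.abbyy.com/buy"><b>[a-zA-Z\.0-9 ]{1,3}</b></a><br>)',
    r'<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>',
    r'<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>',
    r'<A.*?</a>',
    r'<a href="http://www.abbyy.com/buy">',
    r'</a>',
    r'</b>',
    r'<b>',
    r'\.?A ?B ?B ?Y ?Y ?\.c[ o]?m?',
]

# The last block: a <br> that does not directly follow '.', '?' or '!'.
BREAK_PART = r'(?<=[^\.?!])(<br>)'


def strip_abbyy(html, trim_breaks=True):
    """Remove the ABBYY links and tags; optionally trim mid-sentence <br> tags.

    Assumption: matches are replaced with nothing.
    """
    parts = ABBYY_PARTS + ([BREAK_PART] if trim_breaks else [])
    pattern = re.compile("|".join(parts))
    return pattern.sub("", html)


# Invented sample: one watermark line plus a sentence broken across two lines.
sample = (
    '<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>'
    'It was a dark and <br>stormy night.<br>'
)

print(strip_abbyy(sample))
print(strip_abbyy(sample, trim_breaks=False))
```

With trim_breaks=False the mid-sentence <br> is left alone, which is the same as dropping the last block from the regex by hand.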

Hope this is helpful to someone else.