12-25-2011, 04:45 PM
Flammy
Join Date: Dec 2011
Device: Kindle Touch
Could use a bit of help with regular expressions to edit books on conversion

OK, so a whole bunch of my PDFs have this nasty watermark, which gets converted to text every time as exactly this:

Code:
<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>
<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Y</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Y</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Y</b></a><br>
<a href="http://www.abbyy.com/buy"><b>er</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Y</b></a><br>
<a href="http://www.abbyy.com/buy"><b>er</b></a><br>
<a href="http://www.abbyy.com/buy"><b>B</b></a><br>
<a href="http://www.abbyy.com/buy"><b>2</b></a><br>
<a href="http://www.abbyy.com/buy"><b>B</b></a><br>
<a href="http://www.abbyy.com/buy"><b>2</b></a><br>
<a href="http://www.abbyy.com/buy"><b>B</b></a><br>
<a href="http://www.abbyy.com/buy"><b>.0</b></a><br>
<a href="http://www.abbyy.com/buy"><b>B</b></a><br>
<a href="http://www.abbyy.com/buy"><b>.0</b></a><br>
<a href="http://www.abbyy.com/buy"><b>A</b></a><br>
<a href="http://www.abbyy.com/buy"><b>A</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w .</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w</b></a><br>
<a href="http://www.abbyy.com/buy"><b>A B B YY.com</b></a><br>
<a href="http://www.abbyy.com/buy"><b>.A B BYY.com</b></a><br>
Annoying, right?

I know I can remove most of it during a bulk conversion using these two:

<a href="http://www.abbyy.com/buy"><b>[a-zA-Z0-9]</b></a><br>

and

<a href="http://www.abbyy.com/buy"><b>[a-zA-Z0-9][a-zA-Z0-9]</b></a><br>
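To see exactly what those two patterns leave behind, here is a quick check in Python, whose regex syntax is essentially what calibre's Search & Replace uses. The sample is a subset of the watermark lines quoted above; `strip_short_entries` is a hypothetical helper name for illustration. Besides the long entries, anything containing a dot or a space (such as `.0` or `w .`) also survives, because `[a-zA-Z0-9]` only matches letters and digits.

```python
import re

# A subset of the watermark lines quoted above, copied verbatim.
SAMPLE = """<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Y</b></a><br>
<a href="http://www.abbyy.com/buy"><b>er</b></a><br>
<a href="http://www.abbyy.com/buy"><b>.0</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w .</b></a><br>
<a href="http://www.abbyy.com/buy"><b>A B B YY.com</b></a><br>
"""

# The two patterns from the post: one or two letters/digits in bold.
ONE_CHAR = r'<a href="http://www.abbyy.com/buy"><b>[a-zA-Z0-9]</b></a><br>'
TWO_CHARS = r'<a href="http://www.abbyy.com/buy"><b>[a-zA-Z0-9][a-zA-Z0-9]</b></a><br>'


def strip_short_entries(text: str) -> str:
    """Delete watermark entries whose bold text is one or two alphanumerics."""
    for pattern in (ONE_CHAR, TWO_CHARS):
        text = re.sub(pattern, "", text)
    return text


# Keep only the non-empty lines that survived both substitutions.
leftover = [line for line in strip_short_entries(SAMPLE).splitlines() if line]
for line in leftover:
    print(line)
```

The `Y` and `er` entries disappear; `PDF Transform`, `.0`, `w .` and `A B B YY.com` are still printed.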

but the longer ones will obviously remain. I read through "All about using regular expressions in calibre", but it went over my head, so if anyone can help me with a regular expression that removes all of that junk, I'd really appreciate it.
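One possible approach, sketched in Python under the assumption that calibre treats the pattern like a Python regular expression: since every watermark line has the same wrapper, a single pattern can match any bold text between the tags with `[^<]*` ("any run of characters that are not `<`"), and the dots in the URL can be escaped so they only match literal dots. The `\n?` at the end is an assumption about whether a line break follows each entry in the converted HTML; it is harmless if there is none. `WATERMARK`, `remove_watermark` and the paragraph text in the sample are made up for illustration.

```python
import re

# Matches any single watermark entry regardless of its bold text.
# [^<]* cannot run past "</b>", so it never swallows neighbouring tags.
# \n? also removes a trailing line break, if one is present.
WATERMARK = re.compile(
    r'<a href="http://www\.abbyy\.com/buy"><b>[^<]*</b></a><br>\n?'
)


def remove_watermark(html: str) -> str:
    """Return the HTML with every ABBYY watermark entry deleted."""
    return WATERMARK.sub("", html)


# Made-up book text around some of the longer watermark entries from the post.
SAMPLE = (
    "<p>Chapter One</p>\n"
    '<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>\n'
    '<a href="http://www.abbyy.com/buy"><b>.0</b></a><br>\n'
    '<a href="http://www.abbyy.com/buy"><b>w .</b></a><br>\n'
    '<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>\n'
    '<a href="http://www.abbyy.com/buy"><b>.A B BYY.com</b></a><br>\n'
    "<p>Real book text.</p>\n"
)

print(remove_watermark(SAMPLE))
```

In calibre's Search & Replace box you would paste only the pattern itself, `<a href="http://www\.abbyy\.com/buy"><b>[^<]*</b></a><br>`, and leave the replacement empty. Using `[^<]*` rather than a greedy `.*` is deliberate: `.*` could match from the first watermark entry on a line all the way to the last `</b></a><br>`, deleting real text in between.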