OK, so a whole bunch of my PDFs have this nasty watermark, and every time it gets converted to text as exactly this:
Code:
<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>
<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Y</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Y</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Y</b></a><br>
<a href="http://www.abbyy.com/buy"><b>er</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Y</b></a><br>
<a href="http://www.abbyy.com/buy"><b>er</b></a><br>
<a href="http://www.abbyy.com/buy"><b>B</b></a><br>
<a href="http://www.abbyy.com/buy"><b>2</b></a><br>
<a href="http://www.abbyy.com/buy"><b>B</b></a><br>
<a href="http://www.abbyy.com/buy"><b>2</b></a><br>
<a href="http://www.abbyy.com/buy"><b>B</b></a><br>
<a href="http://www.abbyy.com/buy"><b>.0</b></a><br>
<a href="http://www.abbyy.com/buy"><b>B</b></a><br>
<a href="http://www.abbyy.com/buy"><b>.0</b></a><br>
<a href="http://www.abbyy.com/buy"><b>A</b></a><br>
<a href="http://www.abbyy.com/buy"><b>A</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>
<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w .</b></a><br>
<a href="http://www.abbyy.com/buy"><b>w</b></a><br>
<a href="http://www.abbyy.com/buy"><b>A B B YY.com</b></a><br>
<a href="http://www.abbyy.com/buy"><b>.A B BYY.com</b></a><br>
Annoying, right?
I know I can remove most of it during a bulk conversion using these two patterns:
<a href="http://www.abbyy.com/buy"><b>[a-zA-Z0-9]</b></a><br>
and
<a href="http://www.abbyy.com/buy"><b>[a-zA-Z0-9][a-zA-Z0-9]</b></a><br>
but the longer ones will obviously remain. I tried reading "All about using regular expressions in calibre", but it went over my head, so if anyone can help me with a regular expression that removes all of that junk, I'd really appreciate it.
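A possible catch-all, sketched and tested in Python's `re` module. This assumes calibre's Search & Replace accepts the same syntax for a simple pattern like this one, and that the converted HTML contains the watermark lines exactly as pasted above. The variant list is just those lines, and the names `WATERMARK`, `OLD_PATTERNS` and `remove_all` are made up for the test. The idea is to swap the one- or two-character classes for `[^<]*` (any run of text containing no `<`, of any length) and to escape the dots in the URL so they only match a literal `.`:

```python
import re

# Every distinct watermark variant from the post, rebuilt as full lines.
PREFIX = '<a href="http://www.abbyy.com/buy"><b>'
SUFFIX = '</b></a><br>'
VARIANTS = ["PDF Transform", "Y", "er", "B", "2", ".0", "A",
            "Click here to buy", "w", "w .", "A B B YY.com", ".A B BYY.com"]
WATERMARK_LINES = [PREFIX + text + SUFFIX for text in VARIANTS]

# The two patterns from the post: exactly one or two letters/digits in the <b>.
OLD_PATTERNS = [
    re.compile(r'<a href="http://www.abbyy.com/buy"><b>[a-zA-Z0-9]</b></a><br>'),
    re.compile(r'<a href="http://www.abbyy.com/buy"><b>[a-zA-Z0-9][a-zA-Z0-9]</b></a><br>'),
]

# Suggested single pattern: escaped dots match only a literal ".", and [^<]*
# accepts any bold text as long as it contains no "<".
WATERMARK = re.compile(r'<a href="http://www\.abbyy\.com/buy"><b>[^<]*</b></a><br>')


def remove_all(patterns, html):
    """Apply each pattern in turn, deleting every match (replace with nothing)."""
    for pattern in patterns:
        html = pattern.sub('', html)
    return html


if __name__ == "__main__":
    page = "\n".join(WATERMARK_LINES)
    # Non-empty lines left over after each approach.
    leftover_old = [l for l in remove_all(OLD_PATTERNS, page).split("\n") if l]
    leftover_new = [l for l in remove_all([WATERMARK], page).split("\n") if l]
    print(len(leftover_old), "variants survive the two original patterns:")
    for line in leftover_old:
        print("   ", line)
    print(len(leftover_new), "variants survive the single pattern")
```

Run against the twelve variants, the two original patterns leave six behind (including `.0` and `w .`, which contain a non-alphanumeric character), while the single pattern removes all of them. Only the whole watermark line is deleted; any blank line left behind shouldn't matter in the HTML. If it holds up, the calibre version would presumably be just the pattern text `<a href="http://www\.abbyy\.com/buy"><b>[^<]*</b></a><br>` in the search box with an empty replacement.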