Well, once again check the pdftohtml intermediate content. Use the Regex Builder wizard to make sure you match the right stuff.
There will be HTML in there, not just text. pdftohtml is a third-party utility that comes from poppler, and its output should be predictable enough, because calibre performs the search & replace
before stomping all over the markup with its CSS-flattening algorithm.
Normally the regex is applied to the raw contents of the input format, i.e. unzipped EPUB/AZW3 (X)HTML. But PDF is, ah, complicated, so it has to be turned into HTML before you can convert that HTML to something else.
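
To illustrate why matching against markup matters, here is a rough sketch in plain Python. The HTML snippet is made up for the example and is not real pdftohtml output, and the names (intermediate_html, text_only, markup_aware, strip_page_numbers) are hypothetical. It uses Python's standard re module only to show the idea; check your actual intermediate file and test the pattern in the wizard before relying on it.

```python
import re

# Hypothetical stand-in for the intermediate HTML. Real pdftohtml output
# will look different, so inspect the actual file before writing a pattern.
intermediate_html = (
    '<p>The quick brown fox jumps over the lazy<br/>\n'
    'Page 12<br/>\n'
    'dog and keeps running.</p>\n'
)

# Pattern written as if the input were plain text: it expects the page
# number to be alone on its line, so the trailing <br/> makes it fail.
text_only = re.compile(r'^Page \d+$', re.MULTILINE)

# Pattern written against the markup: it consumes the <br/> tag and the
# newline that follow the page number.
markup_aware = re.compile(r'^Page \d+<br\s*/?>\n', re.MULTILINE)


def strip_page_numbers(html, pattern):
    """Remove every match of pattern from html."""
    return pattern.sub('', html)


if __name__ == '__main__':
    print(text_only.search(intermediate_html))   # no match on the markup
    print(strip_page_numbers(intermediate_html, markup_aware))
```

The text-only pattern finds nothing because the line does not end right after the number, while the markup-aware one removes the whole stray line, tag included.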