MobileRead Forums - View Single Post - Getting rid of the footer text and page count

eschwartz · 12-14-2015, 03:28 PM

Well, once again check the pdftohtml intermediate content. Use the Regex Builder wizard to make sure you match the right stuff.

There will be HTML, not just text. pdftohtml is a third-party utility that comes from poppler, and its output should be predictable enough -- calibre performs the search and replace (S&R) before stomping all over the markup with its CSS-flattening algorithm.

Normally the regex is applied to the raw contents of the input format, i.e. unzipped EPUB/AZW3 (X)HTML. But PDF is, ah, complicated, so it has to be turned into HTML before you can convert that HTML to something else.
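
To illustrate why the markup matters, here is a rough sketch in plain Python (calibre's search and replace uses Python-style regular expressions). The sample HTML, the footer wording, and the `strip_footers` helper are all made up for illustration; real pdftohtml output will look different, so check the actual intermediate content in the wizard before copying any pattern.

```python
import re

# Hypothetical stand-in for the intermediate HTML; not real pdftohtml output.
sample = (
    "Some body text that ends a page.<br/>\n"
    "My Book Title | Page 12<br/>\n"
    "<hr/>\n"
    "Body text continues on the next page.<br/>\n"
)

# The pattern has to cover the markup as well as the visible text:
# the footer wording, the page number, and the trailing <br/> and newline.
FOOTER = re.compile(r"^My Book Title \| Page \d+<br/>\n", re.MULTILINE)


def strip_footers(html: str) -> str:
    """Remove every line matching the (assumed) footer pattern."""
    return FOOTER.sub("", html)


cleaned = strip_footers(sample)
print(cleaned)
```

A pattern written against the visible text alone would leave the `<br/>` behind as a stray blank line, which is why matching against the HTML is the safer approach.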

Last edited by eschwartz; 12-14-2015 at 03:31 PM.