MobileRead Forums - View Single Post - PDF to .mobi

ldolse · 09-06-2010, 02:38 AM

Calibre didn't do any OCR. This is a relatively common problem when converting PDF titles: the original PDF had extra spacing between the letters for formatting. These aren't proper full spaces; each letter is basically a separate draw instruction in the PDF, so the letters get converted as separate characters instead of as a word. There is no easy way to get rid of those gaps except by hand editing afterward.

As for the title of the book appearing in the text, that's because the PDF has a header or footer that's being converted as well. You can use the header/footer removal option under Structure Detection to remove it. You need to write a regular expression pattern for this; it's probably something like "\s*<p>\s*Blink\s*</p>", but you'll need to use the test function to tweak the pattern.
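
If you want to try the pattern outside Calibre before pasting it in, here is a rough sketch in plain Python (Calibre's patterns use Python regular expression syntax). The sample markup and the strip_headers helper are made up for illustration and are not Calibre's own code; the real intermediate HTML from your PDF may look different, so treat this only as a way to sanity-check the regex.

```python
import re

# Pattern suggested above; "Blink" is the book title that shows up as a header.
HEADER_PATTERN = re.compile(r"\s*<p>\s*Blink\s*</p>")


def strip_headers(html: str) -> str:
    """Remove every paragraph that consists only of the header text.

    Hypothetical helper for testing the pattern, not part of Calibre.
    """
    return HEADER_PATTERN.sub("", html)


# Invented sample of converted output, not taken from a real conversion.
sample = "<p>First page text.</p>\n<p> Blink </p>\n<p>Second page text.</p>"
cleaned = strip_headers(sample)
print(cleaned)
```

A paragraph that merely starts with the title, such as "<p>Blinking lights</p>", is left alone because the pattern requires the closing tag right after the title and optional whitespace.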
