09-06-2010, 01:38 AM #7
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Calibre didn't do any OCR. This is a fairly common problem when converting PDFs. The original PDF spaced the letters out for formatting, but not with real space characters: each letter is basically a separate draw instruction in the PDF, so the letters get converted as separate characters instead of as one word. There is no easy way to get rid of them except by hand-editing afterward.
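If you do end up hand-editing, a regular expression can at least find the letter-spaced runs for you. The sketch below is a rough heuristic written in Python, not something Calibre does: it assumes a letter-spaced word shows up as three or more single characters separated by single spaces (e.g. "B l i n k"). The function name `collapse_letter_spacing` is hypothetical. It will also wrongly merge genuine runs of one-letter words, and if the gaps between letter-spaced words are single spaces too, neighbouring words will run together, so review every change.

```python
import re

# Three or more consecutive single-character "words" separated by single
# spaces, e.g. "B l i n k". Heuristic only; check the results by hand.
SPACED_RUN = re.compile(r"\b(?:\w ){2,}\w\b")

def collapse_letter_spacing(text: str) -> str:
    """Join each run of single letters separated by single spaces into one word."""
    return SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)

print(collapse_letter_spacing("The B l i n k"))   # The Blink
print(collapse_letter_spacing("This is a test"))  # This is a test (no run of 3+)
```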

As for the book's title showing up throughout the text, that happens because the PDF has a header or footer that is being converted along with the body. You can remove it with the header/footer removal option under Structure Detection. You need to write a regular expression for this. It's probably something like "\s*<p>\s*Blink\s*</p>", but you'll need to use the test function to tweak the pattern.
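To see what a pattern like that matches before trying it in Calibre, here is a sketch using Python's `re` module on a made-up fragment of converted HTML. The sample markup and the `strip_running_header` name are illustrative assumptions. Calibre's real intermediate HTML may put attributes on the `<p>` tags or break the header up differently, in which case this exact pattern won't match, which is why the test function is still the place to tune it.

```python
import re

# The pattern suggested above; "Blink" stands in for the running header text.
HEADER_PATTERN = re.compile(r"\s*<p>\s*Blink\s*</p>")

def strip_running_header(html: str) -> str:
    """Delete every paragraph that contains only the header text."""
    return HEADER_PATTERN.sub("", html)

# Hypothetical converted HTML with the header repeated between pages.
sample = "<p>Page text one.</p>\n<p> Blink </p>\n<p>Page text two.</p>"
print(strip_running_header(sample))
```

The leading `\s*` also swallows the line break before the header paragraph, so the remaining paragraphs stay one per line rather than leaving a blank line where the header was.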