Quote:
Originally Posted by Starson17
Calibre does not use OCR. Your PDF may have OCR text behind images of words. Calibre converts text, not images of text, so if the text is wrong, Calibre's conversion is wrong. The text may have a single l, while the image of the text has a double l. Or you may have ligatures. The engine Calibre uses can't handle all ligatures. Start by selecting the double-l word in your PDF, copy it, and paste it into Notepad to see whether that word pastes in correctly with a double or single l.
OK, thanks. My OCR hypothesis bites the dust, which is progress. Moving on...
Using Acrobat rather than my default PDF-XChange Viewer, I can search for and extract text. Here is the paragraph I used at the start of the "mysterious case of disappearing Ls" thread, pasted from the source PDF. The double Ls all look fine, yet see the EPUB conversion posted below.
Safely on the other side of the stairwell housing, Ruth tilted her head up and let
the cataract wash over her cataracts. She’d been scheduled to have
phacoemulsification the week after martial law was declared. Now she was stuck
with cloudy vision of a cloudy sky. She pulled some matted strands of hair away
from her eyes, her fingers straying up her forehead, which seemed to go all the
way to the back of her head.
I've added the italics here for thread clarity. The above text also looks fine when I paste it into Notepad.
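Notepad can display a ligature or private-use glyph so that it looks like a normal "ll", so a visual check may not be conclusive. Here is a rough Python sketch for listing any such characters in the pasted text. The function name is my own, and it only checks the standard Latin ligature block (U+FB00 to U+FB06) plus private-use code points; a PDF could encode an "ll" ligature some other way that this would miss.

```python
import unicodedata

def suspicious_chars(text):
    """Return (index, char, code point, name) for ligature or private-use characters."""
    found = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        # Latin ligatures (ff, fi, fl, ffi, ffl, long s-t, st) live at U+FB00..U+FB06.
        is_ligature = 0xFB00 <= cp <= 0xFB06
        # Category "Co" marks private-use code points, which fonts may use for custom glyphs.
        is_private = unicodedata.category(ch) == "Co"
        if is_ligature or is_private:
            found.append((i, ch, f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>")))
    return found
```

An empty list for the pasted paragraph would suggest the text layer really does contain two plain "l" characters, which points the finger at the conversion step rather than the PDF text.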
Calibre converts that paragraph to this:
Safely on the other side of the stairwel housing, Ruth tilted her head up and let the cataract wash over her cataracts. She’d been scheduled to have phacoemulsification the week after martial law was declared. Now she was stuck with cloudy vision of a cloudy sky. She pul ed some matted strands of hair away from her eyes, her fingers straying up her forehead, which seemed to go al the way to the back of her head. Maybe it was better she couldn’t see that wel . In her mind she could stil picture herself as she was. Abe, too.
So:
all -> al
pulled -> pul ed
still -> stil
stairwell -> stairwel
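Rather than spotting the damaged words by eye, the two versions can be compared word by word. This is a minimal sketch using Python's standard difflib module; the function name is hypothetical, and it assumes you have the source paragraph and the converted paragraph as two plain strings.

```python
import difflib

def changed_words(source, converted):
    """Return (source_words, converted_words) pairs wherever the two texts differ."""
    a = source.split()      # split on any whitespace, so line wrapping is ignored
    b = converted.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Join each differing run back into a readable string.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

Calling changed_words(pdf_text, epub_text) on the paragraphs above should report pairs like the ones I listed by hand, which makes it easier to check a whole chapter for dropped letters.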