PDF with words split at line ends: how best to convert?
Attempting a PDF to EPUB conversion (& yes, I know it's a dumb thing to do). All goes well except where the original PDF has split a word over two lines, which happens a lot in this document.
For example, if the PDF reads:
line 1: xxxxxxxxxxxxxxxxxxxxxx al-
line 2: so xxxxxxxxxxxxxxxxx
then the EPUB comes out as "al‐ so",
but with the hyphen replaced by a thick, black (bold?) vertical line after the "l" of "also". (NB: it doesn't appear when I copy from the EPUB reader and paste here, but I also see it in the source window when I open the calibre wizard.)
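To see exactly which character sits behind the black bar, one rough check is to list every non-ASCII character in the text along with its Unicode name. The snippet quoted above actually contains U+2010 HYPHEN rather than an ASCII "-", and a reader font with no glyph for that character could plausibly draw it as a bar; that is a guess, not something the post confirms. `describe_non_ascii` is a hypothetical helper name; in practice you would pass it the text of an HTML file taken out of the EPUB.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII
    character in text, in order of first appearance."""
    found = {}
    for ch in text:
        if ord(ch) > 127 and ch not in found:
            found[ch] = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
    return [(ch, cp, name) for ch, (cp, name) in found.items()]

# The snippet from the post, written with the character it actually contains.
for ch, cp, name in describe_non_ascii("al\u2010 so"):
    print(cp, name)  # prints "U+2010 HYPHEN"
```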
A text version of the source (in Notepad) shows:
al-
so
In other words, there's a line break in there.
It must have something to do with how the line-break character in the PDF is being translated.
Is there any way to remove or suppress it?
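If the plain-text version is what gets fed to calibre, one option is to rejoin the split words in that file before converting. This is a minimal sketch, assuming a word is split only where a letter is followed by a hyphen (ASCII "-" or U+2010), optional spaces, a line break, and then a lowercase letter; `join_broken_lines` is a hypothetical name. It also merges the two lines into one, which is usually harmless for text that gets re-wrapped during conversion anyway.

```python
import re

# Letter, hyphen (ASCII '-' or U+2010), optional trailing blanks, a line break
# (LF or CRLF, as Notepad files often use), optional leading blanks, then a
# lowercase ASCII letter. Requiring lowercase leaves "Jean-\nPaul" untouched.
LINE_BREAK_HYPHEN = re.compile(r"([^\W\d_])[-\u2010][ \t]*\r?\n[ \t]*([a-z])")

def join_broken_lines(text):
    """Rejoin words that were hyphenated across a line break."""
    return LINE_BREAK_HYPHEN.sub(r"\1\2", text)

print(join_broken_lines("xxxx al-\nso xxxx"))  # prints "xxxx also xxxx"
```

Words that genuinely contain a hyphen and happen to break at it (e.g. "self-" at a line end followed by "evident") will be joined wrongly as "selfevident"; the text alone cannot distinguish those without a dictionary check.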
Update: I ticked the "Transliterate unicode" box and reconverted the zip to EPUB. That removed the thick black character, so now I just see a broken word, e.g. "al- so".
Is it possible to force an automatic repair of all the broken words somehow? It would be like a global replace of "- " with nothing, but filtering out genuine uses of the "-" character: something like "remove every "- " except where it is preceded by a space"?
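The rule in the last question translates fairly directly into a regular expression with a lookbehind. A sketch in Python, assuming the hyphen may be either ASCII "-" or U+2010; requiring a letter before the hyphen and a lowercase letter after it is an added assumption, meant to leave spaced dashes and ranges like "10- 20" alone. `repair_broken_words` is a hypothetical helper, and the same pattern could probably be used in any regex-based search-and-replace step instead of a script.

```python
import re

# Drop "- " only when the hyphen directly follows a letter (so " - " dashes
# survive) and a lowercase letter follows the space (added safeguard).
BROKEN_WORD = re.compile(r"(?<=[^\W\d_])[-\u2010] (?=[a-z])")

def repair_broken_words(text):
    """Remove the '- ' left inside words that were split across lines."""
    return BROKEN_WORD.sub("", text)

print(repair_broken_words("xxxx al- so xxxx - yyyy"))  # prints "xxxx also xxxx - yyyy"
```

Suspended hyphens such as "pre- and post-war" would still be joined into "preand post-war", so it is worth spot-checking the result.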
Last edited by cybmole; 10-17-2010 at 08:03 AM.