10-17-2010, 07:54 AM   #1
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
PDF with split words at end of line - how best to convert?

I'm attempting a PDF to EPUB conversion (and yes, I know it's a dumb thing to do). All goes well except where the original PDF has split a word over two lines, which happens a lot in this document.

e.g. if PDF goes
line 1: xxxxxxxxxxxxxxxxxxxxxx al-
line 2: so xxxxxxxxxxxxxxxxx

then the EPUB comes out as "al‐ so", but with the hyphen replaced by a thick, black (bold?) vertical line after the "l" of "also". (NB: it doesn't appear when I copy from the EPUB reader and paste here, but I also see it in the source window when I open the calibre wizard.)

A text version of the source (in Notepad) shows:
al-
so

i.e. there's a line break in there.

It must have to do with how the line-break character in the PDF is being translated.

Is there any way to remove or suppress it?

Update: I ticked the "transliterate unicode" box and reconverted the zip to EPUB. That removed the thick black character, so now I just see a broken word, e.g. "al- so".

Is it possible to force an automatic repair of all broken words somehow? It would be like a global replace of "- " with nothing, while filtering out the genuine uses of "-": something like removing every "- " except when it is preceded by a space.
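
To make the rule I have in mind concrete, here is a rough sketch in Python. It is only an illustration of the idea, not anything calibre itself does: the function name join_split_words is made up, and I'm assuming the break always shows up as a hyphen (plain "-", the Unicode hyphen U+2010, or a soft hyphen U+00AD) directly after a letter, followed by a space or line break and then a lowercase letter.

```python
import re

# Assumed pattern for a word broken at a line end:
#   a letter, then a hyphen-like character, then whitespace, then a lowercase letter.
# Hyphen-like characters handled: "-" (ASCII), U+2010 (HYPHEN), U+00AD (SOFT HYPHEN).
_BROKEN_WORD = re.compile(r"(?<=[^\W\d_])[-\u2010\u00ad]\s+(?=[a-z])")


def join_split_words(text: str) -> str:
    """Remove 'hyphen + whitespace' when it sits between two letters.

    A hyphen preceded by a space (e.g. "a - b") is left alone, as is a
    hyphen with no whitespace after it (e.g. "well-known").
    """
    return _BROKEN_WORD.sub("", text)


if __name__ == "__main__":
    print(join_split_words("xxx al- so xxx"))   # xxx also xxx
    print(join_split_words("xxx al-\nso xxx"))  # xxx also xxx
    print(join_split_words("a - b, well-known"))  # unchanged
```

The obvious weakness is that a genuinely hyphenated word which happened to be split at its hyphen (e.g. "well-" at the end of a line and "known" on the next) would come out as "wellknown", and a phrase like "pre- and post-war" would be joined wrongly too, so the result would still need a read-through.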

Last edited by cybmole; 10-17-2010 at 08:03 AM.