05-26-2019, 05:22 PM | #1 |
Enthusiast
Posts: 27
Karma: 10
Join Date: Dec 2016
Location: Groningen, Netherlands
Device: Calibre, Kobo Aura H2O
|
Unwanted spaces between letters of a word
When converting PDF files with Calibre, some words end up with unwanted spaces between their letters. For instance, the word Although becomes A l t h o u g h in the conversion, everything becomes e v e r y t h i n g, BATTLE becomes B A T T L E, formed a triad of becomes f o r m e d a triad of, etcetera. In some books I converted, thousands of words were garbled like this, which makes correcting them by hand futile. Is it possible to prevent Calibre from wrecking words this way?
The problem most probably originates from the fact that the text in the original PDF file is justified. As a result, the letter-spacing on some lines becomes somewhat wider. Nevertheless, the word-spacing always remains wider than the letter-spacing. Is it possible to define a minimum value for word-spacing in Calibre, so that only gaps at least that wide are treated as spaces and a widened letter-spacing no longer splits words? I have not found such an option in Calibre, but I think something like this could solve the problem. Any help will be greatly appreciated.
Last edited by abecedarian; 05-27-2019 at 06:08 AM. |
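For illustration, the minimum-word-spacing idea could look roughly like the Python sketch below. It assumes you already have per-character bounding boxes for one line of text (the `Glyph` type, `rebuild_line`, the `layout` demo helper and the default thresholds are all hypothetical). As noted in the reply below, neither calibre nor pdftohtml is claimed to expose such a setting. A gap becomes a space only if it is both wider than a fixed minimum and clearly wider than the line's typical letter gap, so a justified line with stretched letter-spacing stays intact.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical per-character box: one character and its horizontal extent."""
    char: str
    x0: float  # left edge
    x1: float  # right edge


def rebuild_line(glyphs, min_word_gap=1.0, ratio=1.8):
    """Rebuild one line of text from glyph boxes.

    A gap becomes a space only if it is at least `min_word_gap` wide AND at
    least `ratio` times the line's typical letter gap, so the widened
    letter-spacing of a justified line does not split words.
    """
    if not glyphs:
        return ""
    glyphs = sorted(glyphs, key=lambda g: g.x0)
    gaps = [b.x0 - a.x1 for a, b in zip(glyphs, glyphs[1:])]
    if not gaps:
        return glyphs[0].char
    ordered = sorted(gaps)
    # Median gap (upper median for an even count) approximates the letter
    # spacing, because letter gaps far outnumber word gaps on a normal line.
    # Clamp at 0 so tightly kerned text (negative gaps) is handled.
    letter_gap = max(ordered[len(ordered) // 2], 0.0)
    threshold = max(min_word_gap, letter_gap * ratio)
    out = [glyphs[0].char]
    for gap, glyph in zip(gaps, glyphs[1:]):
        if gap >= threshold:
            out.append(" ")
        out.append(glyph.char)
    return "".join(out)


def layout(words, width=5.0, letter_gap=1.5, word_gap=4.0):
    """Build glyph boxes for a justified-looking line (demonstration only)."""
    glyphs, x = [], 0.0
    for wi, word in enumerate(words):
        if wi:
            x += word_gap
        for ci, ch in enumerate(word):
            if ci:
                x += letter_gap
            glyphs.append(Glyph(ch, x, x + width))
            x += width
    return glyphs


if __name__ == "__main__":
    # Letter gaps of 1.5 stay inside words; word gaps of 4.0 become spaces.
    print(rebuild_line(layout(["Although", "it", "was"])))  # Although it was
```

The relative test (`ratio`) handles lines whose letter-spacing has been stretched, while the absolute floor (`min_word_gap`) is the "minimum word-spacing" asked about here; both values would need tuning per document.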
05-26-2019, 09:43 PM | #2 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The initial conversion of text is performed by pdftohtml from the poppler project, not calibre. calibre converts the HTML output by that tool. As far as I know there is no such knob for pdftohtml.
|
05-27-2019, 04:58 AM | #3 |
Enthusiast
Posts: 27
Karma: 10
Join Date: Dec 2016
Location: Groningen, Netherlands
Device: Calibre, Kobo Aura H2O
|
Thanks for the reply, Kovid. You always seem to be in the front line. The Poppler project gives me the impression of being a rather inaccessible organization, though, but I'll try to get in touch with them about this. To me this problem looks like a major flaw in their conversion tool. I'm not an expert at regex, but I suspect a regex search and replace would have to be confirmed one match at a time instead of replacing all at once, which would make it too time-consuming to be of any practical use. In any case, I could not come up with a regular expression that worked. Besides, it would also require the intelligent use of linguistic rules to decide where a single space between letters is a real word break that must be kept (as with the "a" in "f o r m e d a triad of"). Something like AI.
Last edited by abecedarian; 05-27-2019 at 06:02 AM. |
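As a rough illustration of the regex route, here is a Python sketch meant to run outside calibre, on plain text extracted from the converted book. The names `fix_spaced_letters` and `segment`, and the word list, are hypothetical and not calibre features. It assumes plain text with ASCII letters only: HTML tags, entities and accented letters would need extra handling. The regex collapses every run of three or more single letters. Doing only that turns "f o r m e d a" into "formeda", which is exactly the ambiguity described above. An optional word list plus a "fewest words" split stands in for the linguistic rules.

```python
import re

# A run of three or more single letters, each separated by one space,
# e.g. "A l t h o u g h" or "f o r m e d a". Plain ASCII letters only.
SPACED_RUN = re.compile(r"\b[A-Za-z](?: [A-Za-z]){2,}\b")


def segment(s, words):
    """Split s into dictionary words, preferring the fewest pieces.

    Returns a list of substrings of s (original case kept), or None if
    s cannot be covered entirely by words from the lower-case word set.
    """
    n = len(s)
    best = [None] * (n + 1)  # best[i]: fewest-word split of s[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            piece = s[j:i]
            if best[j] is not None and piece.lower() in words:
                candidate = best[j] + [piece]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n]


def fix_spaced_letters(text, words=None):
    """Collapse letter-spaced runs; optionally re-split them with a word list."""
    def repl(match):
        run = match.group(0)
        joined = run.replace(" ", "")
        if words is None:
            return joined  # naive: "f o r m e d a" -> "formeda"
        parts = segment(joined, words)
        # If the word list cannot explain the run, leave it untouched.
        return " ".join(parts) if parts else run
    return SPACED_RUN.sub(repl, text)


if __name__ == "__main__":
    vocab = {"although", "everything", "battle", "formed", "a", "triad", "of"}
    print(fix_spaced_letters("f o r m e d a triad of"))         # formeda triad of
    print(fix_spaced_letters("f o r m e d a triad of", vocab))  # formed a triad of
```

Because every run is replaced in a single `re.sub` call with a function, nothing has to be confirmed one match at a time. The fewest-words heuristic can still guess wrong, though, so checking a sample of the changes is advisable.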
Tags |
calibre, conversion, letter-spacing, unwanted free spaces, word-spacing |
|
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
How to remove blank spaces in a word with RegEx? | RbnJrg | Sigil | 12 | 12-19-2018 06:58 AM |
Random spaces between letters | pjfarr | Calibre | 3 | 02-15-2015 10:01 AM |
HTML to ePub creating unwanted white spaces | newbie35 | Conversion | 2 | 02-11-2012 03:38 PM |
[Old Thread] PDF to Epub conversion (spaces between letters) | mastroalex | Conversion | 8 | 10-09-2011 10:39 PM |
PRS-650 Setting for highlighting whole word not just letters | diddy | Sony Reader | 9 | 03-04-2011 07:41 AM |