Old 05-26-2019, 05:22 PM   #1
abecedarian
Unwanted spaces between letters of a word

When I convert PDF files with Calibre, some words end up with unwanted spaces between their letters. For instance, Although becomes A l t h o u g h, everything becomes e v e r y t h i n g, BATTLE becomes B A T T L E, and formed a triad of becomes f o r m e d a triad of. In some books I converted, thousands of words are mangled like this, which makes correcting them by hand futile. Is it possible to prevent Calibre from breaking words up this way?

The problem most likely arises because the text in the original PDF is justified, so on some lines the letter-spacing is slightly wider than usual. Even so, the word-spacing always remains wider than the letter-spacing. Is it possible to set a minimum word-spacing value in Calibre, so that a gap narrower than that value is not turned into a space inside a word? I have not found such an option in Calibre, but I think something like it could solve the problem.
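
To make the idea concrete, here is a minimal Python sketch of such a minimum word-spacing threshold. It is only an illustration and not how Calibre or pdftohtml actually work: the glyph layout (character, left edge, right edge), the helper names `layout` and `group_into_words`, and all the numbers are assumptions made up for this example.

```python
from typing import List, Tuple

# Hypothetical glyph record: (character, left x, right x) in page units.
Glyph = Tuple[str, float, float]


def layout(text: str, char_width: float, letter_gap: float, word_gap: float) -> List[Glyph]:
    """Place the characters of `text` on one line, simulating justified spacing."""
    glyphs: List[Glyph] = []
    x = 0.0
    for word in text.split():
        for char in word:
            glyphs.append((char, x, x + char_width))
            x += char_width + letter_gap
        # Widen the gap after the last letter of a word from letter_gap to word_gap.
        x += word_gap - letter_gap
    return glyphs


def group_into_words(glyphs: List[Glyph], min_word_gap: float) -> List[str]:
    """Start a new word only when the gap to the previous glyph reaches min_word_gap."""
    words: List[str] = []
    current = ""
    prev_right = None
    for char, left, right in glyphs:
        if prev_right is not None and left - prev_right >= min_word_gap:
            words.append(current)
            current = ""
        current += char
        prev_right = right
    if current:
        words.append(current)
    return words


# Letters are 1.5 units apart, words 4.0 units apart (made-up values).
line = layout("formed a triad", char_width=5.0, letter_gap=1.5, word_gap=4.0)
print(group_into_words(line, min_word_gap=3.0))  # threshold between the two gaps
print(group_into_words(line, min_word_gap=1.0))  # threshold too low: every letter splits off
```

With a threshold between the letter gap and the word gap, the words stay intact. With a threshold below the letter gap, every letter becomes its own "word", which is the symptom described above.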

Any help will be greatly appreciated.

Last edited by abecedarian; 05-27-2019 at 06:08 AM.
Old 05-26-2019, 09:43 PM   #2
kovidgoyal
creator of calibre
The initial conversion of text is performed by pdftohtml from the poppler project, not calibre. calibre converts the HTML output by that tool. As far as I know there is no such knob for pdftohtml.
Old 05-27-2019, 04:58 AM   #3
abecedarian
Thanks for the reply, Kovid. You always seem to be in the front line. The Poppler project gives me the impression of being a rather inaccessible organization, though, but I'll try to get in touch with them about this. To me this looks like a major flaw in their conversion tool. I'm not an expert at regex, and I could not put together a regular expression that worked. Even with one, I guess the replacements would have to be reviewed one at a time instead of all at once, which would probably be too time-consuming to be of practical use. Besides, a fix would also need linguistic rules to decide which single spaces between letters are genuine word separations that must be kept. Something like AI.
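
For what it's worth, a rough Python sketch of the regex-plus-word-list idea might look like the following. It is an untested-on-real-books illustration, not a Calibre feature: the names `SPACED_RUN`, `split_with_vocabulary` and `repair_spaced_words` are made up here, and the tiny vocabulary stands in for a real dictionary file that you would have to supply. The regex finds runs of three or more single letters separated by single spaces, and the word list decides where genuine word boundaries go when the run is glued back together.

```python
import re

# A run of three or more single letters, each separated by exactly one space.
SPACED_RUN = re.compile(r"\b[A-Za-z](?: [A-Za-z]\b){2,}")


def split_with_vocabulary(letters, vocabulary):
    """Split `letters` into the fewest known words; return None if no split exists."""
    # best[i] holds the best segmentation found for letters[:i], or None.
    best = [None] * (len(letters) + 1)
    best[0] = []
    for end in range(1, len(letters) + 1):
        for start in range(end):
            piece = letters[start:end]
            if best[start] is not None and piece.lower() in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(letters)]


def repair_spaced_words(text, vocabulary):
    """Rejoin spaced-out letters, re-inserting spaces only at dictionary word boundaries."""
    def fix(match):
        letters = match.group(0).replace(" ", "")
        words = split_with_vocabulary(letters, vocabulary)
        # Fall back to simply joining the letters when the word list cannot explain them.
        return " ".join(words) if words else letters
    return SPACED_RUN.sub(fix, text)


# Stand-in vocabulary; a real run would load a full word list (assumption).
vocabulary = {"although", "everything", "battle", "formed", "a"}
print(repair_spaced_words("they f o r m e d a triad of", vocabulary))
print(repair_spaced_words("B A T T L E", vocabulary))
```

The weak point is exactly the linguistic one: a run that can be split into dictionary words in more than one way, or that contains names missing from the word list, will still need a human check.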

Last edited by abecedarian; 05-27-2019 at 06:02 AM.