|  03-10-2023, 10:51 AM | #1 | 
| Enthusiast            Posts: 31 Karma: 25920 Join Date: Oct 2020 Device: Kobo Aura H2O (mark 5) | 
				
				How to fix missing spaces between dictionary words
			  I have this book where many thousands of spaces are missing between words. For instance: hehad, thereis, ina, withme, hecame, theworld, ... In the standard editor (or perhaps in some plug-in), is there a way to use a regex function that can (1) find words that are not in the standard dictionary but that consist of two words that are in the dictionary, and (2) propose a change? I have read the documentation on Function mode for Search & replace in the Editor, but I don't immediately see how this could be done. I'm sure someone must have had this problem before... TIA for any tips. Last edited by DVdm; 03-10-2023 at 10:55 AM. Reason: (grammar) | 
|   |   | 
|  03-10-2023, 11:06 AM | #2 | 
| Well trained by Cats            Posts: 31,249 Karma: 61360164 Join Date: Aug 2009 Location: The Central Coast of California Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A | 
			
			The editor finds misspelled words and proposes changes (tick show only....). One of the proposals is to split the word into 2 words (not 100% reliable, but both are valid words). Note: This does not find <tagged> run-togethers. | 
|   |   | 
|  03-10-2023, 11:34 AM | #3 | |
| Enthusiast            Posts: 31 Karma: 25920 Join Date: Oct 2020 Device: Kobo Aura H2O (mark 5) | Quote: 
 I'd like to make all the changes in one go, and then look for any mistakes that this introduces. | |
|   |   | 
|  03-10-2023, 12:50 PM | #4 | 
| Enthusiast            Posts: 31 Karma: 25920 Join Date: Oct 2020 Device: Kobo Aura H2O (mark 5) | 
			
			Thanks. I replied, but my message doesn't turn up. | 
|   |   | 
|  03-10-2023, 02:03 PM | #5 | 
| Wizard            Posts: 1,687 Karma: 9500498 Join Date: Sep 2021 Location: Australia Device: Kobo Libra 2 | 
			
			I have found this issue in quite a few books. Sometimes it looks like a previous editor has simply deleted all the hyphens in the book. I use the "Check Spelling" function and simply scroll down the list of all misspelled words and fix them. Use Alt-F7 or Tools►Check spelling to access it. It's a lot easier than trying to check every page of the book. | 
|   |   | 
|  03-10-2023, 02:18 PM | #6 | 
| Wizard            Posts: 1,090 Karma: 447222 Join Date: Jan 2009 Location: Valley Forge, PA, USA Device: Kindle Paperwhite | 
			
			There's a RegEx function called 'SplitWords' that tries to divide words using the dictionary: https://www.mobileread.com/forums/sh...ht=split+words (post #9, by the master). I've never used it, so let me know how it goes. Code: 
import regex
from calibre import replace_entities, prepare_string_for_xml
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    def fix_word(m):
        word = m.group()
        if dictionaries.recognized(word):
            return word
        for i in xrange(1, len(word) - 1):
            a, b = word[:i], word[i:]
            if dictionaries.recognized(a) and dictionaries.recognized(b):
                return a + ' ' + b
        return word
    text = replace_entities(match.group(1))
    text = regex.sub(r'\b\w+\b', fix_word, text, flags=regex.VERSION1)
    text = prepare_string_for_xml(text)
    return '>' + text + '<' | 
|   |   | 
|  03-10-2023, 06:04 PM | #7 | |
| Enthusiast            Posts: 31 Karma: 25920 Join Date: Oct 2020 Device: Kobo Aura H2O (mark 5) | Quote: 
 Code: return '>' + text + '<'
 Code: return text
 Code: \b(\w+)\b(?![^<>{}]*[>}])
 Code: NameError: name 'xrange' is not defined
 Last edited by DVdm; 03-10-2023 at 06:08 PM. Reason: added used search string | |
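For readers following along: the search string quoted in this post appears to select whole words that lie outside tags, since its negative lookahead rejects any word that is followed by `>` or `}` before the next opening bracket. A minimal sketch with Python's standard `re` module (not calibre's editor; the sample markup and the name `WORD_OUTSIDE_TAGS` are made up for illustration) shows the behaviour:

```python
import re

# Search string from the post: a whole word that is NOT followed by a
# closing '>' or '}' before any other bracket, i.e. a word outside tags.
WORD_OUTSIDE_TAGS = re.compile(r'\b(\w+)\b(?![^<>{}]*[>}])')

# Hypothetical sample markup, for illustration only.
sample = '<p class="x">hehad a dog</p>'

# Only the words in the text content are collected; the tag name and
# attribute inside <...> are skipped by the lookahead.
words = WORD_OUTSIDE_TAGS.findall(sample)
print(words)
```

On this sample the tag name `p`, the attribute `class` and its value `x` are skipped, and only `hehad`, `a` and `dog` are matched.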
|   |   | 
|  03-10-2023, 08:42 PM | #8 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			change xrange to range in that function
		 | 
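`xrange` exists only in Python 2; Python 3, which current calibre versions run on, has only `range`, hence the NameError. As a rough standalone sketch of the same splitting idea with that change applied, the version below replaces calibre's `dictionaries.recognized` with a plain set lookup. `KNOWN_WORDS`, `recognized` and `split_runtogether` are hypothetical names for illustration, not part of calibre:

```python
# Hypothetical stand-in for calibre's dictionaries.recognized():
# a tiny word list, for illustration only.
KNOWN_WORDS = {"a", "he", "had", "there", "is", "in", "with", "me", "the", "world"}

def recognized(word):
    """Return True if the word is in the illustrative word list."""
    return word.lower() in KNOWN_WORDS

def split_runtogether(word):
    """Return 'a b' if word is unknown but splits into two known words."""
    if recognized(word):
        return word
    # Same loop bounds as the posted function, with range instead of xrange.
    for i in range(1, len(word) - 1):
        a, b = word[:i], word[i:]
        if recognized(a) and recognized(b):
            return a + ' ' + b
    return word

print(split_runtogether("hehad"))
print(split_runtogether("theworld"))
print(split_runtogether("xyzzy"))
```

Note that with these loop bounds the last letter is never split off on its own, so a word such as "ina" is returned unchanged even when both "in" and "a" are known words.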
|   |   | 
|  03-11-2023, 05:45 AM | #9 | 
| Enthusiast            Posts: 31 Karma: 25920 Join Date: Oct 2020 Device: Kobo Aura H2O (mark 5) | |
|   |   | 
|  03-11-2023, 05:26 PM | #10 | 
| Enthusiast            Posts: 31 Karma: 25920 Join Date: Oct 2020 Device: Kobo Aura H2O (mark 5) | 
			
			Using the regex function, it took me a few hours to fix the book, and then I realised something. This was a book that I had found as a PDF, which I had converted to epub with Calibre. In an edit session I noticed that there were a bunch of paragraphs with some kind of hardcoded linefeeds. With a general search and replace, I replaced all linefeeds with nothing, effectively deleting them, and then beautified all files. Stupid. I should have replaced the linefeeds with spaces. So I retrieved the original PDF from my backups, converted it to epub, replaced all linefeeds with a space, and beautified all files. Done. Silly me!   Last edited by DVdm; 03-12-2023 at 05:20 AM. Reason: spelling | 
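The difference between the two replacements can be seen in a small sketch using Python's `re` module on a made-up fragment. This is not the exact search used in the editor, which the post does not give, and the variable names are hypothetical:

```python
import re

# Hypothetical fragment with hard line breaks inside a paragraph.
fragment = "he\nhad come into\nthe world"

# Deleting the linefeeds glues the neighbouring words together.
deleted = re.sub(r'\n', '', fragment)

# Replacing each linefeed with a space keeps the words apart.
spaced = re.sub(r'\n', ' ', fragment)

print(deleted)
print(spaced)
```

The first result contains run-together words such as "hehad" and "intothe"; the second keeps every word separate.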
|   |   | 