#1
Enthusiast
Posts: 31
Karma: 25920
Join Date: Oct 2020
Device: Kobo Aura H2O (mark 5)

How to fix missing spaces between dictionary words

I have this book where many thousands of spaces are missing between words, for instance: hehad, thereis, ina, withme, hecame, theworld, ...

In the standard editor (or perhaps in some plug-in), is there a way to use a regex function that can (1) find words that are not in the standard dictionary but consist of two words that are, and (2) propose a change? I have read the documentation on Function mode for Search & replace in the Editor, but I don't immediately see how this could be done.

I'm sure someone must have had this problem before... TIA for any tips.

Last edited by DVdm; 03-10-2023 at 11:55 AM. Reason: (grammar)

#2
Well trained by Cats
Posts: 31,267
Karma: 61916422
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2, Kobo Aura2v1, K4NT (Fixed: New Bat.), Galaxy Tab A

The editor finds misspelled words and proposes changes. (Tick "show only....")

One of the proposals is to split the word into 2 words (not 100% accurate, but they will be 2 valid words).

Note: This does not find <tagged> run-togethers.

#3
Enthusiast
Posts: 31
Karma: 25920
Join Date: Oct 2020
Device: Kobo Aura H2O (mark 5)

Quote:

I'd like to make all the changes in one go, and then look for any induced mistakes.

#4
Enthusiast
Posts: 31
Karma: 25920
Join Date: Oct 2020
Device: Kobo Aura H2O (mark 5)

Thanks.

I replied, but my message doesn't turn up.

#5
Wizard
Posts: 1,688
Karma: 9500498
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2

I have found this issue in quite a few books. Sometimes it looks like a previous editor has simply deleted all the hyphens in the book.

I use the "Check Spelling" function and simply scroll down the list of all misspelled words and fix them. Use Alt-F7 or Tools►Check spelling to access it. It's a lot easier than trying to check every page of the book.

#6
Wizard
Posts: 1,090
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite

There's a regex function called 'SplitWords' that tries to split run-together words using the dictionary:

https://www.mobileread.com/forums/sh...ht=split+words

See post #9 there, by the master. I've never used it, so let me know how it goes.

Code:

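import regex
from calibre import replace_entities, prepare_string_for_xml

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    def fix_word(m):
        word = m.group()
        # Leave words the spelling dictionary already recognizes.
        if dictionaries.recognized(word):
            return word
        # Try each split point; accept the first that yields two recognized words.
        # Note: xrange exists only in Python 2.
        for i in xrange(1, len(word) - 1):
            a, b = word[:i], word[i:]
            if dictionaries.recognized(a) and dictionaries.recognized(b):
                return a + ' ' + b
        return word
    # Decode entities, fix the words, then re-escape the text for XML.
    text = replace_entities(match.group(1))
    text = regex.sub(r'\b\w+\b', fix_word, text, flags=regex.VERSION1)
    text = prepare_string_for_xml(text)
    # Put the '>' and '<' delimiters back around the captured text.
    return '>' + text + '<'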

#7
Enthusiast
Posts: 31
Karma: 25920
Join Date: Oct 2020
Device: Kobo Aura H2O (mark 5)

Quote:

Code:
	return '>' + text + '<'
Code:
	return text
Code:
	\b(\w+)\b(?![^<>{}]*[>}])
Code:
	NameError: name 'xrange' is not defined

Last edited by DVdm; 03-10-2023 at 07:08 PM. Reason: added used search string
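
For readers wondering what the search string quoted above selects, here is a minimal sketch using Python's standard `re` module rather than calibre's editor. The sample HTML is made up for illustration. The negative lookahead appears intended to reject any word that is followed by a `>` or `}` before the next `<`, `>`, `{` or `}`, that is, a word sitting inside a tag or a CSS block.

```python
import re

# The search expression quoted in the post above.
WORD_OUTSIDE_MARKUP = re.compile(r'\b(\w+)\b(?![^<>{}]*[>}])')

# Made-up sample; tag names and attribute values should be skipped.
sample = '<p class="calibre1">hehad a <b>bigdog</b></p>'

words = WORD_OUTSIDE_MARKUP.findall(sample)
print(words)  # ['hehad', 'a', 'bigdog']
```

Only the words in the visible text are matched; `p`, `class`, `calibre1` and `b` are skipped because each is followed by a closing `>` within the same tag.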

#8
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

Change xrange to range in that function.
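
The splitting logic with that change applied can be tried outside calibre. The following is only a sketch: `KNOWN_WORDS` and `recognized` are hypothetical stand-ins for the `dictionaries.recognized` call that calibre supplies, and the standard `re` module replaces `regex`.

```python
import re

# Hypothetical stand-in for calibre's spell-check dictionaries.
KNOWN_WORDS = {"he", "had", "there", "is", "in", "a", "with", "me", "the", "world"}

def recognized(word):
    """Assumed replacement for dictionaries.recognized(word)."""
    return word.lower() in KNOWN_WORDS

def fix_word(match):
    word = match.group()
    if recognized(word):
        return word
    # Same bounds as the posted function, with range instead of xrange.
    for i in range(1, len(word) - 1):
        a, b = word[:i], word[i:]
        if recognized(a) and recognized(b):
            return a + ' ' + b
    return word

def split_words(text):
    """Split every run-together word in text that has a valid two-word split."""
    return re.sub(r'\b\w+\b', fix_word, text)

print(split_words("hehad seen theworld"))  # he had seen the world
print(split_words("ina"))                  # ina
```

Note that with these loop bounds the second part always has at least two letters, so a word such as "ina" is left unchanged even though "in" and "a" are both recognized.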

#9
Enthusiast
Posts: 31
Karma: 25920
Join Date: Oct 2020
Device: Kobo Aura H2O (mark 5)

#10
Enthusiast
Posts: 31
Karma: 25920
Join Date: Oct 2020
Device: Kobo Aura H2O (mark 5)

Using the regex function, it took me a few hours to fix the book, and then I realised something.

This was a book that I had found as a PDF, which I had converted to EPUB with Calibre. In an edit session I noticed that a bunch of paragraphs contained some kind of hardcoded linefeeds. With a general search and replace, I replaced all linefeeds with nothing, effectively deleting them, and then beautified all files. Stupid. I should have replaced the linefeeds with spaces. So I retrieved the original PDF from my backups, converted it to EPUB, replaced all linefeeds with a space, and beautified all files. Done. Silly me!

Last edited by DVdm; 03-12-2023 at 06:20 AM. Reason: spelling
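
The difference between the two replacements described above can be shown with a short sketch in plain Python. The sample text and the `join_lines` helper are made up for illustration; in the editor the same idea would be a search for the linefeed with a single space as the replacement.

```python
import re

def join_lines(text):
    """Replace each line break, plus any spaces or tabs around it, with one space."""
    return re.sub(r'[ \t]*\r?\n[ \t]*', ' ', text)

broken = "he\nhad seen the\nworld"

print(broken.replace("\n", ""))  # hehad seen theworld
print(join_lines(broken))        # he had seen the world
```

Deleting the linefeeds glues the last word of one line to the first word of the next, which is how the run-together words came about; replacing them with a space keeps the words apart.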