| 
			
			 | 
		#1 | 
| 
			
			
			
			 Member 
			
			Posts: 15 
				Karma: 12678 
				Join Date: Apr 2013 
				
				
				
				Device: none 
				
				
				 | 
	
	
	
		
		
			
			 
				
				convert hyphens to em dashes... possible?
			 
			
			
			I'm reading a book right now that has some formatting issues. All of the dashes are simply hyphens, whether they should be hyphens or em-dashes or en-dashes. They are coded as hyphens. I confirmed this using Calibre's editor. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	Is there any way to convert hyphens to em-dashes automatically? I know I can convert them all with a simple find and replace, but this will destroy any legitimate hyphens, like those in compound words. I'm looking for something analogous to 'smartening' quotes, but for dashes. Is this even possible? I think the algorithm for 'smartening' quotes is fairly straightforward, but does such an algorithm exist for dashes? It seems like a small point, but I really do notice this when I am reading and it distracts me from the book. Proper em-dashes add meaning to a passage. If they appear as hyphens, it takes me a moment to realize these are not compound words.  
		 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#2 | 
| 
			
			
			
			 Ex-Helpdesk Junkie 
			
			Posts: 19,421 
				Karma: 85400180 
				Join Date: Nov 2012 
				Location: The Beaten Path, USA, Roundworld, This Side of Infinity 
				
				
				Device: Kindle Touch fw5.3.7 (Wifi only) 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Smarten punctuation will recognize a double hyphen and convert it to an em dash. It doesn't just operate on quotes. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	You can always do the same with a find and replace. However, if there isn't a double hyphen, there is no substitute for manually verifying whether each hyphen should really be an em dash. Similarly, quote smartening depends on patterns in the sentence, namely whether there is a space before or after the quote mark, with additional rules for some special cases.  | 
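
As a rough illustration of the double-hyphen rule described above, here is a minimal Python sketch. It is not calibre's actual Smarten punctuation code; the function name `smarten_double_hyphens` is made up for this example, and it assumes plain text (run on raw HTML it would also rewrite the `--` inside comment markers such as `<!--`).

```python
import re

EM_DASH = "\u2014"

def smarten_double_hyphens(text: str) -> str:
    """Replace each run of exactly two hyphens with an em dash.

    Hypothetical helper for illustration only. Single hyphens are left
    alone, because nothing in the text says whether they are real hyphens.
    """
    # The lookarounds skip longer runs such as "---", which a real
    # smartening tool may treat differently.
    return re.sub(r"(?<!-)--(?!-)", EM_DASH, text)

print(smarten_double_hyphens("He paused--then a well-known voice spoke."))
```

A book whose dashes are all single hyphens, like the one in the first post, passes through this unchanged, which is why manual checking is still needed there.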
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#3 | 
| 
			
			
			
			 null operator (he/him) 
			
			Posts: 22,018 
				Karma: 30277294 
				Join Date: Mar 2012 
				Location: Sydney Australia 
				
				
				Device: none 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			@jlocicero - I'm wondering if Sigil's Spell Check might be of some use - you could filter by spelling mistakes containing a 'hyphen' 
		
	
		
		
		
		
		
		
		
		
		
		
	
	BR  | 
| 
		
 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#4 | |
| 
			
			
			
			 Wizard 
			
			Posts: 2,306 
				Karma: 13057279 
				Join Date: Jul 2012 
				
				
				
				Device: Kobo Forma, Nook 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 Just add a hyphen in the Filter box, and make sure "Show All Words" is checked. I use this all the time to remove accidental hard hyphens left over from OCR. I typically do a "two pass" check: once with "Show All Words" unchecked, and once with "Show All Words" checked.

 To replace hyphens with en dashes, I use this Regex:

 Search: ([0-9])-([0-9])
 Replace: \1–\2

 This handles all of the years/page numbers that are typically in the book (although I don't recommend using "replace all"; replace on a case-by-case basis even though it will take a while longer).

 If you want to get even more refined.... there is no solid way to do it besides checking every single hyphen manually. Probably better to pull the information from a better source, or re-OCR the thing yourself and do a code comparison.  | 
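
For anyone who would rather script this than use the editor's search box, the same digit-hyphen-digit pattern can be sketched in Python. The helper names `find_candidates` and `replace_all` are invented for this example. The first lists each match with some surrounding context so it can be reviewed case by case, as recommended above; the second is the blunt "replace all" equivalent.

```python
import re

EN_DASH = "\u2013"
# Same pattern as the Search expression above: a digit, a hyphen, a digit.
DIGIT_HYPHEN = re.compile(r"([0-9])-([0-9])")

def find_candidates(text, width=15):
    """Return (offset, context) pairs for every digit-hyphen-digit match.

    Hypothetical helper: the context snippet is meant for manual review.
    """
    results = []
    for match in DIGIT_HYPHEN.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        results.append((match.start(), text[start:end]))
    return results

def replace_all(text):
    """Swap the hyphen for an en dash in every match (no review step)."""
    return DIGIT_HYPHEN.sub(r"\1" + EN_DASH + r"\2", text)

sample = "See pages 12-15 on the war of 1914-1918, a well-known period."
for offset, context in find_candidates(sample):
    print(offset, repr(context))
print(replace_all(sample))
```

Strings such as phone numbers also match this pattern, which is one reason to look at each candidate before replacing it.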
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#5 | |
| 
			
			
			
			 null operator (he/him) 
			
			Posts: 22,018 
				Karma: 30277294 
				Join Date: Mar 2012 
				Location: Sydney Australia 
				
				
				Device: none 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 I really like it. I recall seeing misspellings presented in a list arrangement similar to Sigil's once before, in an add-in for Lotus Notes (loud groans are welcome), and I find it much better than the more commonly used inline highlighting. I hope Kovid adopts a similar presentation. I suggest adding the ability to copy the incorrect spelling to the "Change Selected Word To:" text box, maybe via the word list context menu, so that one could edit it there. That would be useful when proper names have incorrect or inconsistent (pet peeve) spelling. BR  | 
|
| 
		
 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#6 | 
| 
			
			
			
			 Color me gone 
			
			Posts: 2,089 
				Karma: 1445295 
				Join Date: Apr 2008 
				Location: Central Oregon Coast 
				
				
				Device: PRS-300 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Sigil's spell check is very good, based on how I use it. It lets you sweep up many mistakes at once and triage which to fix first based on how often each one appears.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#7 | ||
| 
			
			
			
			 Wizard 
			
			Posts: 2,306 
				Karma: 13057279 
				Join Date: Jul 2012 
				
				
				
				Device: Kobo Forma, Nook 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 Before that, I was using the Index Tool to generate a word list, and using the filter in that to catch hyphenated words. Quite convoluted, but it worked better than anything else I had run across! The Spell Check Word List was something I had in the back of my mind for YEARS as a very useful tool, but never saw it used anywhere in my life. It is also one of those "killer features" of Sigil that makes it indispensable for me. It is the best tool for catching/fixing hyphens, and it is also useful having a "Word Count" of spellings. I can easily see the words and how many times they occur. It is fantastic for catching: 
 It also is extremely helpful that you can sort by Case Sensitive or Case Insensitive. It is also extremely helpful that you can toggle between a list of just Misspelled Words and a list of all words. There was also this EPUB Spell Checker tool that came out back in September 2013: https://www.mobileread.com/forums/sho....php?p=2667112 I recommended a few things in Post #9 + #10. I also have my own ideas for my own custom tools... although I have yet to get around to programming them (always getting delayed by other projects, and converting many more books). Quote: 
	
 Last edited by Tex2002ans; 04-01-2014 at 09:39 PM.  | 
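
The word-list idea described in this post (every hyphenated word with a count of how often it occurs, sortable with or without case sensitivity) can be approximated outside Sigil with a few lines of Python. This is only a sketch of the idea, not how Sigil builds its Spell Check list; the function name `hyphenated_word_counts` is invented here, and the pattern assumes plain text with ASCII letters only.

```python
import re
from collections import Counter

# One or more letters, then at least one "-letters" group, e.g. "well-known".
HYPHENATED = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)+")

def hyphenated_word_counts(text, case_sensitive=True):
    """Count every hyphen-joined word in the text.

    Hypothetical helper: case_sensitive=False folds "Well-known" and
    "well-known" into a single entry.
    """
    words = HYPHENATED.findall(text)
    if not case_sensitive:
        words = [word.lower() for word in words]
    return Counter(words)

sample = "A well-known fact. Well-known, yes-and then no-one spoke."
for word, count in hyphenated_word_counts(sample, case_sensitive=False).most_common():
    print(count, word)
```

Entries that occur only once and do not look like ordinary compounds, such as "yes-and" here, are plausible places where a dash was intended, but that is a heuristic and each one still needs checking in context.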
||
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#8 | 
| 
			
			
			
			 Member 
			
			Posts: 15 
				Karma: 12678 
				Join Date: Apr 2013 
				
				
				
				Device: none 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Wow, thank you all! Since my source has no extra data (-- for en or em dash) I thought it was beyond saving. And searching for naked hyphens to fix in context would have been extremely time-consuming. Sigil's spell check looks really helpful. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	Thanks again! And speaking of hyphens, I guess I should say "en-dash" and "em-dash" instead of "en dash" and "em dash"...  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
    
			 
			Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| --- | --- | --- | --- | --- |
| Creating spaces around hyphens (or dashes). | wallflowerface | Conversion | 4 | 01-04-2014 07:42 PM | 
| txt->epub removes hyphens / dashes / double minus-signs | Rizla | Conversion | 3 | 05-17-2013 01:09 PM | 
| Dashed Dashes -- Befuddled by EN and EM Dashes (Apple Pages to EPUB) | planewryter | Conversion | 1 | 07-22-2012 10:52 PM | 
| Fixing hyphens and dashes with regular expressions | DoctorT | Conversion | 1 | 10-04-2011 11:46 PM | 
| Stripping out dashes on epub convert? | toddos | Calibre | 5 | 08-01-2010 04:29 PM |