#1 |
Member
Posts: 15
Karma: 12678
Join Date: Apr 2013
Device: none
|
convert hyphens to em dashes... possible?
I'm reading a book right now that has some formatting issues. All of the dashes are simply hyphens, whether they should be hyphens or em-dashes or en-dashes. They are coded as hyphens. I confirmed this using Calibre's editor.
Is there any way to convert hyphens to em-dashes automatically? I know I can convert them all with a simple find and replace, but this will destroy any legitimate hyphens, like those in compound words. I'm looking for something analogous to 'smartening' quotes, but for dashes. Is this even possible? I think the algorithm for 'smartening' quotes is fairly straightforward, but does such an algorithm exist for dashes? It seems like a small point, but I really do notice this when I am reading and it distracts me from the book. Proper em-dashes add meaning to a passage. If they appear as hyphens, it takes me a moment to realize these are not compound words. |
#2 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Smarten punctuation will recognize a double hyphen and convert it to an em dash. It doesn't just operate on quotes.
You can always do the same with a find and replace. However, if there isn't a double hyphen, there is no substitute for manually verifying whether a given hyphen should really be an em dash. Similarly, the quote fixing depends on patterns in the sentence, namely whether there is a space before or after the quote mark, with additional rules for some special cases. |
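To make the two cases above concrete, here is a minimal Python sketch. It is not Calibre's actual Smarten punctuation code; the function names and the "whitespace next to a hyphen" heuristic are assumptions made for illustration. Double (or triple) hyphens are converted automatically, while single hyphens are only flagged so a person can decide.

```python
import re

EM_DASH = "\u2014"  # the em dash character, written as an escape


def smarten_dashes(text):
    """Replace an isolated run of two or three hyphens with an em dash.

    Single hyphens are left alone because they may be legitimate.
    """
    return re.sub(r"(?<!-)-{2,3}(?!-)", EM_DASH, text)


def flag_suspect_hyphens(text, width=15):
    """Return excerpts around single hyphens that touch whitespace.

    Hypothetical heuristic: a hyphen with a space on either side is
    probably a dash typed as a hyphen, but it still needs manual review.
    """
    suspects = []
    for m in re.finditer(r"(?<!-)-(?!-)", text):
        before = text[m.start() - 1:m.start()] if m.start() > 0 else ""
        after = text[m.end():m.end() + 1]
        if before.isspace() or after.isspace():
            suspects.append(text[max(0, m.start() - width):m.end() + width])
    return suspects


if __name__ == "__main__":
    print(smarten_dashes("Wait--what was that?"))
    for excerpt in flag_suspect_hyphens("He paused - then left the well-known inn."):
        print(repr(excerpt))
```

Unspaced single hyphens between words (as in the original poster's book) are not caught by either function, which is exactly the case that needs checking by hand.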
![]() |
#3 |
null operator (he/him)
Posts: 21,718
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@jlocicero - I'm wondering if Sigil's Spell Check might be of some use - you could filter by spelling mistakes containing a 'hyphen'
BR |
#4 | |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Just add a hyphen in the Filter box, and make sure "Show All Words" is checked. I use this all the time to remove accidental hard hyphens left over from OCR. I typically do a "two pass" check: once with "Show All Words" unchecked, and once with "Show All Words" checked.
To replace hyphens with en dashes, I use this Regex:
Search: ([0-9])-([0-9])
Replace: \1–\2
This handles all of the years/page numbers that are typically in the book (although I don't recommend using "replace all"; replace on a case-by-case basis even though it will take a while longer).
If you want to get even more refined... there is no solid way to do it besides checking every single hyphen manually. Probably better to pull the information from a better source, or re-OCR the thing yourself and do a code comparison. |
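For anyone who wants to try the digit-hyphen-digit regex outside of Sigil, here is a rough Python equivalent. The pattern is the one given in the post; the function names and the review helper are made up for this sketch. The helper lists each hit with some context first, in keeping with the advice to check matches case by case rather than replace everything blindly.

```python
import re

EN_DASH = "\u2013"  # the en dash character, written as an escape
DIGIT_HYPHEN_DIGIT = re.compile(r"([0-9])-([0-9])")


def list_numeric_hyphens(text, width=10):
    """Return (offset, excerpt) pairs for every digit-hyphen-digit hit."""
    hits = []
    for m in DIGIT_HYPHEN_DIGIT.finditer(text):
        start = max(0, m.start() - width)
        hits.append((m.start(), text[start:m.end() + width]))
    return hits


def en_dash_ranges(text):
    """Replace a hyphen between two digits with an en dash."""
    return DIGIT_HYPHEN_DIGIT.sub(rf"\1{EN_DASH}\2", text)


if __name__ == "__main__":
    sample = "See pp. 12-15 on the war of 1914-1918, a well-known period."
    for offset, excerpt in list_numeric_hyphens(sample):
        print(offset, repr(excerpt))
    print(en_dash_ranges(sample))
```

Two things to watch for: the pattern also matches phone numbers, ISBNs and similar digit groups, and because matches cannot overlap, a chain like 1-2-3 only gets its first hyphen replaced in a single pass.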
|
#5 | |
null operator (he/him)
Posts: 21,718
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
I really like it. I recall seeing misspellings presented in a list like Sigil's once before, in an add-in for Lotus Notes (loud groans are welcome). I find it much better than the more commonly used inline highlighting. I hope Kovid adopts a similar presentation.
I suggest adding the ability to copy the incorrect spelling to the "Change Selected Word To:" text box, maybe via the word list context menu, so that one could edit it there. This would be useful when proper names have incorrect or inconsistent (pet peeve) spelling.
BR |
|
![]() |
#6 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
Sigil's spell check is very good, based on how I use it. It lets you sweep up many mistakes at once and triage which to fix first based on how often each one appears.
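The same "word list with counts" idea can be approximated in a few lines of Python for plain text. This is only a sketch and not how Sigil builds its list; the word pattern and the function name are assumptions. It counts every hyphen-containing "word" so the list can be triaged by frequency.

```python
import re
from collections import Counter

# Assumed definition of a word: letters, optionally joined by hyphens or apostrophes.
WORD = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def hyphenated_word_counts(text, case_sensitive=True):
    """Count each hyphen-containing word found in the text."""
    words = WORD.findall(text)
    if not case_sensitive:
        words = [w.lower() for w in words]
    return Counter(w for w in words if "-" in w)


if __name__ == "__main__":
    sample = (
        "The well-known inn was quiet. He made a half-hearted try-the last "
        "one of the night. It was a well-known trick."
    )
    for word, count in hyphenated_word_counts(sample).most_common():
        print(count, word)
```

In a listing like this, entries that occur only once and join two unrelated words (such as "try-the" in the sample) are good candidates for a dash typed as a hyphen, while frequent entries are more likely genuine compounds. That is a rule of thumb, so each candidate still deserves a look in context.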
|
#7 | ||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Before that, I was using the Index Tool to generate a word list, and using the filter in that to catch hyphenated words. Quite convoluted, but it worked better than anything else I had run across!
The Spell Check Word List was something I had in the back of my mind for YEARS as a very useful tool, but I had never seen it used anywhere. It is also one of those "killer features" of Sigil that makes it indispensable for me. It is the best tool for catching/fixing hyphens, and it is also useful to have a "Word Count" of spellings: I can easily see the words and how many times they occur. It is fantastic for catching:
It is also extremely helpful that you can sort by Case Sensitive or Case Insensitive, and that you can toggle between a list of just Misspelled Words and a list of all words.
There was also this EPUB Spell Checker tool that came out back in September 2013: https://www.mobileread.com/forums/sho....php?p=2667112
I recommended a few things in Post #9 + #10. I also have my own ideas for my own custom tools... although I have yet to get around to programming them (always getting delayed by other projects, and converting many more books).
Quote:
Last edited by Tex2002ans; 04-01-2014 at 08:39 PM. |
||
#8 |
Member
Posts: 15
Karma: 12678
Join Date: Apr 2013
Device: none
|
Wow, thank you all! Since my source has no extra data (-- for en or em dash), I thought it was beyond saving. And searching for naked hyphens to fix in context would have been extremely time-consuming. Sigil's spell check looks really helpful.
Thanks again! And speaking of hyphens, I guess I should say "en-dash" and "em-dash" instead of "en dash" and "em dash"... |