Old 03-31-2014, 04:01 PM   #1
jlocicero
Member
Posts: 15
Karma: 12678
Join Date: Apr 2013
Device: none
convert hyphens to em dashes... possible?

I'm reading a book right now that has some formatting issues. All of the dashes are simply hyphens, whether they should be hyphens or em-dashes or en-dashes. They are coded as hyphens. I confirmed this using Calibre's editor.

Is there any way to convert hyphens to em-dashes automatically? I know I can convert them all with a simple find and replace, but this will destroy any legitimate hyphens, like in compound words. I'm looking for something analogous to 'smartening' quotes, but for dashes.

Is this even possible? I think the algorithm for 'smartening' quotes is fairly straightforward, but does such an algorithm exist for dashes?

It seems like a small point, but I really do notice this when I am reading and it distracts me from the book. Proper em-dashes add meaning to a passage. If they appear as hyphens, it takes me a moment to realize these are not compound words.
Old 03-31-2014, 10:50 PM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Smarten punctuation will recognize a double hyphen and convert it to an emdash. It doesn't just operate on quotes.

You can always do the same with a find and replace. If there isn't a double hyphen, however, there is no substitute for manually verifying whether each hyphen is truly an emdash.

Similarly, the quote fixing depends on patterns in the sentence, namely whether there is a space before or after the quote mark, with additional rules for some special cases.
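
The two rules described above can be sketched in a few lines of Python. This is only an illustration of the idea, not calibre's actual Smarten punctuation code: the function names are made up for the example, and the "special cases" for quotes are left out entirely.

```python
import re

EM_DASH = "\u2014"
OPEN_QUOTE = "\u201c"
CLOSE_QUOTE = "\u201d"


def smarten_double_hyphens(text: str) -> str:
    """Turn a run of exactly two hyphens into an em dash.

    Single hyphens are left alone: nothing in the text says whether
    they are real hyphens or dashes, so they need a manual check.
    """
    return re.sub(r"(?<!-)--(?!-)", EM_DASH, text)


def smarten_double_quotes(text: str) -> str:
    """Pick opening/closing quotes from the character before the mark."""
    # A straight quote at the start of the text, or right after
    # whitespace or an opening bracket, is treated as an opening quote.
    text = re.sub(r'(^|[\s(\[])"', lambda m: m.group(1) + OPEN_QUOTE, text)
    # Every straight quote that is left is treated as a closing quote.
    return text.replace('"', CLOSE_QUOTE)


sample = 'She paused--then said "well-known" twice.'
print(smarten_double_quotes(smarten_double_hyphens(sample)))
```

Note that the hyphen in "well-known" survives untouched, which is exactly why a book with only single hyphens gives an automatic tool nothing to work with.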
Old 03-31-2014, 11:56 PM   #3
BetterRed
null operator (he/him)
Posts: 21,718
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@jlocicero - I'm wondering if Sigil's Spell Check might be of some use - you could filter by spelling mistakes containing a 'hyphen'

BR
Old 04-01-2014, 03:11 AM   #4
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by BetterRed View Post
@jlocicero - I'm wondering if Sigil's Spell Check might be of some use - you could filter by spelling mistakes containing a 'hyphen'

BR
I second this suggestion. Sigil Spellcheck can point out every single instance of a hyphenated word:

[Attached image: SigilHyphenationSpellcheck.png]

Just type a hyphen into the Filter box, and make sure "Show All Words" is checked.

I use this all the time to remove accidental hard hyphens left over from OCR.

I typically do a "two pass" check: once with "Show All Words" unchecked, and once with it checked.
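
For anyone without Sigil at hand, the same kind of filtered word list can be roughly approximated in Python. This is a minimal sketch that assumes the book's text has already been extracted to a plain string; `hyphenated_word_counts` is a made-up helper, not part of Sigil or calibre. A dash that was typed as a hyphen with no spaces around it tends to show up here as an odd "word" (something like "paused-then"), which makes it easy to spot.

```python
import re
from collections import Counter


def hyphenated_word_counts(text: str) -> Counter:
    """Count every word that contains an internal hyphen."""
    # Letters, then one or more "-letters" groups, e.g. "step-father".
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)+", text)
    return Counter(word.lower() for word in words)


text = "The step-father spoke - softly - of his step-father's well-worn coat."
for word, count in hyphenated_word_counts(text).most_common():
    print(count, word)
```

Hyphens with spaces on both sides are not part of any word, so they are skipped here and would need a separate search.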

To replace hyphens with en dashes, I use this Regex:

Search: ([0-9])-([0-9])
Replace: \1–\2

This handles all of the years/page numbers that are typically in a book. I don't recommend using "replace all", though; replace on a case-by-case basis, even though it will take a while longer.
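
The same search can be tried outside the editor. The sketch below is a variation, not the exact regex above: it uses lookarounds so that the digits are not consumed, and it lists each match with some context first, in the spirit of the case-by-case advice. The helper names and the sample line are invented for the example.

```python
import re

EN_DASH = "\u2013"
# Same idea as ([0-9])-([0-9]), but with lookarounds so the digits are
# not consumed and chains such as "1-2-3" are matched at every hyphen.
DIGIT_HYPHEN = re.compile(r"(?<=[0-9])-(?=[0-9])")


def en_dash_candidates(text: str, width: int = 10) -> list[str]:
    """Return a short context snippet for each hyphen between digits."""
    return [text[max(0, m.start() - width): m.end() + width]
            for m in DIGIT_HYPHEN.finditer(text)]


def replace_digit_hyphens(text: str) -> str:
    """Replace every hyphen between two digits with an en dash."""
    return DIGIT_HYPHEN.sub(EN_DASH, text)


line = "See pp. 12-15 (1914-1918), ISBN 0-306-40615-2."
for snippet in en_dash_candidates(line):
    print(repr(snippet))
print(replace_digit_hyphens(line))
```

The ISBN in the sample line shows why a blind "replace all" is risky: its hyphens match the pattern just as well as the page and year ranges do.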

If you want to get even more refined... there is no solid way to do it besides checking every single hyphen manually. It is probably better to pull the information from a better source, or re-OCR the thing yourself and do a code comparison.
Old 04-01-2014, 05:44 PM   #5
BetterRed
null operator (he/him)
Quote:
Originally Posted by Tex2002ans View Post
I second this suggestion. Sigil Spellcheck can point out every single instance of a hyphenated word
@Tex2002ans - I'd never looked at Sigil's spell checker until I thought of it as something that might help with OP's I-want-ems-not-hyphens problem.

I really like it. I recall seeing misspellings presented in a list arrangement like Sigil's once before, in an add-in for Lotus Notes (loud groans are welcome), and I find it much better than the more commonly used inline highlighting.

I hope Kovid adopts a similar presentation. I'd also suggest adding the ability to copy the incorrect spelling into the "Change Selected Word To:" text box, maybe via the word list context menu, so that one could edit it there. That would be useful when proper names have incorrect or inconsistent (pet peeve) spelling.

BR
Old 04-01-2014, 06:46 PM   #6
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Sigil's spell check is very good, based on how I use it. It lets you sweep up many mistakes at once and triage which to fix first based on how often each appears.
Old 04-01-2014, 08:36 PM   #7
Tex2002ans
Wizard
Quote:
Originally Posted by BetterRed View Post
@Tex2002ans - I'd never looked at Sigil's spell checker until I thought of it as something that might help with OP's I-want-ems-not-hyphens problem.

I really like it, I recall seeing the presentation of misspellings in a similar list arrangement like Sigil's once before, in an add-in for Lotus Notes—loud groans are welcome—I find it much better than the more often used in line highlighting.
I was the one that recommended it be added into Sigil!! And if I recall correctly, it was put in ASAP. It is/was incredible.

Before that, I was using the Index Tool to generate a word list, and using the filter in that to catch hyphenated words. Quite convoluted, but it worked better than anything else I had run across!

The Spell Check Word List was something I had in the back of my mind for YEARS as a very useful tool, but never saw it used anywhere in my life. It is also one of those "killer features" of Sigil that makes it indispensable for me.

It is the best tool for catching/fixing hyphens, and it is also useful having a "Word Count" of spellings. I can easily see the words and how many times they occur. It is fantastic for catching:
  • Accented Words
    • If the book flip-flops between accented/unaccented versions, it is most likely a typo.
      • 6 "regime" and 1 "régime"
  • Misspelled Names
    • An error in a name most likely occurs fewer than a handful of times (< 4). So you can just sort by frequency, and take a close look at all the words that occur fewer than 4 times.
      • "Abbé" and "Abbe"
  • Hyphenated vs. unhyphenated words
    • A hyphenated + unhyphenated version of a word doesn't occur too often in the same book. You either go with one or the other consistently throughout the work.
      • "step-father" + "stepfather"
      • "mis-information" + "misinformation"
      • "business-man" + "businessman"
      • "life-like" + "lifelike"
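
The consistency checks in the list above can also be approximated with a short script. This is a rough sketch under the assumption that the book text is available as a plain string; `spelling_variants` and `strip_accents` are hypothetical helpers, not Sigil features. It groups words that differ only by accents or hyphens and reports how often each form occurs.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def strip_accents(word: str) -> str:
    """Remove combining marks, so "régime" becomes "regime"."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def spelling_variants(text: str) -> dict[str, Counter]:
    """Group words that differ only by accents or hyphens (case is ignored)."""
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text)
    groups: dict[str, Counter] = defaultdict(Counter)
    for word in words:
        # The key drops accents, case and hyphens; the counted form keeps
        # accents and hyphens so the different spellings stay visible.
        key = strip_accents(word).lower().replace("-", "")
        groups[key][word.lower()] += 1
    # Keep only the keys that were spelled in more than one way.
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


text = ("The old regime fell. The régime was gone. "
        "His step-father, her stepfather, their stepfather.")
for key, forms in spelling_variants(text).items():
    print(key, dict(forms))
```

Expect some false positives: a pair such as "re-cover" and "recover" can both be legitimate, so the output is a list of things to look at, not a list of errors.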

It is also extremely helpful that you can sort case-sensitively or case-insensitively, and that you can toggle between a list of only Misspelled Words and a list of all words.

There was also this EPUB Spell Checker tool that came out back in September 2013:

https://www.mobileread.com/forums/sho....php?p=2667112

I recommended a few things in Post #9 + #10.

I also have my own ideas for my own custom tools... Although I have yet to get around to programming them (always getting delayed by other projects, and converting many more books).

Last edited by Tex2002ans; 04-01-2014 at 08:39 PM.
Old 04-03-2014, 03:57 PM   #8
jlocicero
Member
Wow, thank you all! Since my source has no extra data (such as -- for an en or em dash), I thought it was beyond saving. And searching for naked hyphens to fix in context would have been extremely time-consuming. Sigil's spell check looks really helpful.

Thanks again!

And speaking of hyphens, I guess I should say "en-dash" and "em-dash" instead of "en dash" and "em dash"...