Old 10-27-2023, 06:02 PM   #17
Tex2002ans
Quote:
Originally Posted by JSWolf
How do you use Diap's Editing Toolbag for calibre to deal with dashes?
I personally don't. I always have it on the setting:
  • Do not educate dashes

I only use Diap's Editing Toolbag for smartening quotation marks.

Quote:
Originally Posted by JSWolf
I want to replace all en-dashes with and without spaces to en-dashes without spaces. Also, I want all em-dashes with spaces to be em-dashes without spaces.
Then use Regular Expressions + Saved Searches:

Example Regex: SPACE + EN DASH + SPACE -> No-space EM DASH
  • Find: [ ]–[ ]
  • Replace: —

or various mixes of \s or whatever types of spaces you're trying to find/fix.
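If you'd rather prototype this outside calibre first, here's a rough Python sketch of the "strip whatever spaces surround the dash" idea. The SPACES list and the tighten_dashes name are my own assumptions for illustration, not anything from the Toolbag or calibre. I listed the spaces explicitly instead of using \s, because \s also matches line breaks and would glue lines together.

```python
import re

# Assumption: these are the space characters worth stripping. Extend as needed.
# Regular space, no-break space, thin space, hair space.
SPACES = " \u00a0\u2009\u200a"

def tighten_dashes(text: str) -> str:
    """Remove the listed spaces on either side of en and em dashes.

    The dash itself is kept as-is (en stays en, em stays em).
    """
    pattern = rf"[{SPACES}]*([\u2013\u2014])[{SPACES}]*"
    return re.sub(pattern, r"\1", text)
```

This only answers the "no spaces" half of the question. It does not decide whether a given dash should be an en or an em, which still needs a human look.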

- - -

Personally, if I were adjusting those dashes, I'd:
  • Use Diap's Editing Toolbag once...
  • Then take a very close look at all the diffs, making sure it got everything correct.

I don't believe it's a very smart idea to mass change "spaced dashes" like this without verification (or deciding on a case-by-case basis), because you don't know what sort of madness might be inside the book.

I've seen too many cases of:
  • THIN SPACES (U+2009)
    • This is a small – very small – example of text.
    • This is a small — very small — example of text.
  • HAIR SPACES (U+200A)
    • This is a small – very small – example of text.
    • This is a small — very small — example of text.

or all sorts of weird spacing mixed around the dashes too.
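Since thin and hair spaces look almost identical to normal ones on screen, it can help to take an inventory before changing anything. A minimal Python sketch, assuming you've already pulled the book text into a string (the dash_neighbours and describe names are made up for this example):

```python
import re
import unicodedata
from collections import Counter

def describe(ch: str) -> str:
    """Return the Unicode name of a character, or a placeholder."""
    if ch == "":
        return "(edge of text)"
    # Control characters such as newlines have no Unicode name.
    return unicodedata.name(ch, f"U+{ord(ch):04X}")

def dash_neighbours(text: str) -> Counter:
    """Count which characters sit directly before/after each en or em dash."""
    counts = Counter()
    for m in re.finditer("[\u2013\u2014]", text):
        i = m.start()
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        counts[(describe(before), describe(text[i]), describe(after))] += 1
    return counts
```

Printing dash_neighbours(text).most_common() gives a quick picture of what "madness" is in the book, so you know which spaces your regex actually has to handle.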

Again, see the 2022 topic: "False paragraph breaks & RegEx" where I went into all the edge-case details. (Like em dashes signifying "cut off" dialogue.)

- - -

Side Note: I've even seen the (definitely wrong):
  • single HYPHENs
    • This is a small - very small - example of text.
    • This is a small-very small-example of text.

This is why it all has to be on a book-by-book, case-by-case basis. Trying to mass change this type of stuff isn't smart without looking through the text first.

Luckily, every book I work with uses and enforces the proper EM DASH with no spaces around it. And I don't have to worry about the quotation dashes or anything like that. Makes it much easier on my end.

- - -

Side Note #2: Personally, this is what I use. 3 sets of Regex:

Regex #1: Remove spaces from EM DASH
  • Search: [ ]*—[ ]*
  • Replace: —

Regex #2: Inserting EN DASH
  • Search: ([0-9])-([0-9])
  • Replace: \1–\2

Regex #3: Converting to EN DASH (Accidental EM DASH)
  • Search: ([0-9])—([0-9])
  • Replace: \1–\2

I run:
  • Regex #1 once.
  • Regex #2 and #3 one-at-a-time, and go through the book on a case-by-case basis.
    • You have to be really careful, because URLs especially have lots of numbers+hyphens inside.
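For the case-by-case pass, something like the following Python sketch can list every digit+dash+digit hit with a bit of context and flag the ones that look like they're inside a URL. The candidates function and the URL check are assumptions of mine. The check is deliberately crude (it only looks for "://" or a leading "www." in the same whitespace-delimited chunk), so treat the flag as a hint, not a verdict.

```python
import re

# Same idea as Regex #2 and #3 combined: digit, then HYPHEN or EM DASH, then digit.
DIGIT_DASH = re.compile(r"([0-9])[-\u2014]([0-9])")

def candidates(text: str, width: int = 20):
    """Return (match, context, looks_like_url) for each hit, for manual review."""
    rows = []
    for tok in re.finditer(r"\S+", text):
        chunk = tok.group()
        looks_like_url = "://" in chunk or chunk.startswith("www.")
        for m in DIGIT_DASH.finditer(chunk):
            pos = tok.start() + m.start()
            context = text[max(0, pos - width): pos + len(m.group()) + width]
            rows.append((m.group(), context, looks_like_url))
    return rows
```

Nothing is replaced here. The point is only to see the list first, then decide which hits deserve the EN DASH.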

Before:

Code:
<p>This is a small— very small — example of text.</p>
<p>The 2000-2010 period was the root cause.</p>
<p>See pp. 5—9.</p>

After:

Code:
<p>This is a small—very small—example of text.</p>
<p>The 2000–2010 period was the root cause.</p>
<p>See pp. 5–9.</p>

That takes care of the bulk of dash mistakes/inconsistencies/"OCR errors" I see.
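If you want to sanity-check the three regexes against the Before/After sample, here's a small Python sketch. The fix_dashes name is hypothetical, and running all three blindly in one go is only for testing on a snippet. On a real book I'd still step through #2 and #3 by hand as described.

```python
import re

EN = "\u2013"  # EN DASH
EM = "\u2014"  # EM DASH

def fix_dashes(text: str) -> str:
    """Apply the three regexes in order to a snippet of text."""
    # Regex #1: remove regular spaces around EM DASH
    text = re.sub(rf"[ ]*{EM}[ ]*", EM, text)
    # Regex #2: HYPHEN between digits -> EN DASH
    text = re.sub(r"([0-9])-([0-9])", rf"\1{EN}\2", text)
    # Regex #3: accidental EM DASH between digits -> EN DASH
    text = re.sub(rf"([0-9]){EM}([0-9])", rf"\1{EN}\2", text)
    return text
```

Note that #1 only strips regular spaces, same as the [ ]* in the original search. Thin or hair spaces would need to be added to that character class.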

Last edited by Tex2002ans; 10-28-2023 at 03:39 AM.