Quote:
Originally Posted by JSWolf
How do you use Diap's Editing Toolbag for calibre to deal with dashes?
I personally don't. I always have it on the setting:
I only use Diap's Editing Toolbag for smartening quotation marks.
Quote:
Originally Posted by JSWolf
I want to replace all en-dashes with and without spaces to en-dashes without spaces. Also, I want all em-dashes with spaces to be em-dashes without spaces.
Then use Regular Expressions + Saved Searches:
Regex #3: SPACE + EN DASH + SPACE -> No-space EM DASH
- Find: [ ]–[ ]
- Replace: —
or various mixes of \s or whatever types of spaces you're trying to find/fix.
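For anyone who wants to try the "various mixes of spaces" idea outside the editor, here is a minimal Python sketch of the spaced EN DASH to no-space EM DASH replacement. The list of space characters and the function name `spaced_en_to_em` are my own assumptions for illustration, not anything built into calibre or the Toolbag, so adjust them to whatever actually turns up in your book.

```python
import re

# Assumption: these are the space types worth catching; extend as needed.
# U+0020 SPACE, U+00A0 NO-BREAK SPACE, U+2009 THIN SPACE, U+200A HAIR SPACE
SPACES = "\u0020\u00a0\u2009\u200a"

# One or more of those spaces, an EN DASH (U+2013), then one or more spaces.
SPACED_EN_DASH = re.compile(f"[{SPACES}]+\u2013[{SPACES}]+")


def spaced_en_to_em(text: str) -> str:
    """Replace a spaced EN DASH with an unspaced EM DASH (U+2014)."""
    return SPACED_EN_DASH.sub("\u2014", text)


print(spaced_en_to_em("This is a small \u2013 very small \u2013 example of text."))
```

An unspaced EN DASH (a number range like 5–9) is left alone, because the pattern requires at least one space on each side.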
- - -
Personally, if I was adjusting those dashes, I'd:
- Use Diap's Editing Toolbag once...
- Then take a very close look at all the diffs, making sure it got all correct.
I don't believe it's a very smart idea to mass change "spaced dashes" like this without verification (or deciding on a case-by-case basis), because you don't know what sort of madness might be inside the book.
I've seen too many cases of:
- THIN SPACES
- This is a small – very small – example of text.
- This is a small — very small — example of text.
- HAIR SPACES
- This is a small – very small – example of text.
- This is a small — very small — example of text.
or all sorts of weird spacing mixed around the dashes too.
Again, see the 2022 topic: "False paragraph breaks & RegEx" where I went into all the edge-case details. (Like em dashes signifying "cut off" dialogue.)
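One way to see what sort of spacing is hiding around the dashes before changing anything is to count the combinations first. This is only a rough Python sketch: the names `audit_dashes` and `describe` are made up for this example, and it assumes you have the chapter text available as a plain string.

```python
import re
import unicodedata
from collections import Counter

# EN DASH (U+2013) or EM DASH (U+2014) with optional whitespace on each side.
# In Python, \s also matches THIN SPACE, HAIR SPACE and NO-BREAK SPACE.
DASH_CONTEXT = re.compile(r"(\s*)([\u2013\u2014])(\s*)")


def describe(chars: str) -> str:
    """Name each character, e.g. 'THIN SPACE', or 'none' for an empty run."""
    if not chars:
        return "none"
    return "+".join(unicodedata.name(c, f"U+{ord(c):04X}") for c in chars)


def audit_dashes(text: str) -> Counter:
    """Count every (before, dash, after) combination found in the text."""
    counts = Counter()
    for before, dash, after in DASH_CONTEXT.findall(text):
        counts[(describe(before), unicodedata.name(dash), describe(after))] += 1
    return counts


sample = "a\u2009\u2013\u2009b and c\u2014d and e \u2014 f"
for combo, n in audit_dashes(sample).items():
    print(n, combo)
```

If the counts show more than one spacing style in the same book, that is a good sign the dashes need a case-by-case look rather than a mass replace.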
- - -
Side Note: I've even seen the (definitely wrong):
- single HYPHENs
- This is a small - very small - example of text.
- This is a small-very small-example of text.
This is why it all has to be on a book-by-book, case-by-case basis. Trying to mass change this type of stuff isn't smart without looking through the text first.
Luckily, every book I work with uses and enforces the proper EM DASH with no spaces around it.
And I don't have to worry about the quotation dashes or anything like that. Makes it much easier on my end.
- - -
Side Note #2: Personally, this is what I use. 3 sets of Regex:
Regex #1: Remove spaces from EM DASH
- Search: [ ]*—[ ]*
- Replace: —
Regex #2: Inserting EN DASH
- Search: ([0-9])-([0-9])
- Replace: \1–\2
Regex #3: Converting to EN DASH (Accidental EM DASH)
- Search: ([0-9])—([0-9])
- Replace: \1–\2
I run:
- Regex #1 once.
- Regex #2 and #3 one-at-a-time, and go through the book on a case-by-case basis.
- You have to be really careful, because URLs especially have lots of numbers+hyphens inside.
Before:
Code:
<p>This is a small— very small — example of text.</p>
<p>The 2000-2010 period was the root cause.</p>
<p>See pp. 5—9.</p>
After:
Code:
<p>This is a small—very small—example of text.</p>
<p>The 2000–2010 period was the root cause.</p>
<p>See pp. 5–9.</p>
That takes care of the bulk of dash mistakes/inconsistencies/"OCR errors" I see.
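If you want to sanity-check the three searches outside calibre, here is a small Python sketch that applies them to the Before sample. The function names `fix_dashes` and `preview` are hypothetical helpers for this post, and `fix_dashes` applies everything blindly, so it is only a test harness. In a real book I'd still step through Regex #2 and #3 one match at a time.

```python
import re

EM, EN = "\u2014", "\u2013"  # EM DASH, EN DASH


def fix_dashes(html: str) -> str:
    """Apply Regex #1, #2 and #3 in order (no case-by-case review)."""
    # Regex #1: remove plain spaces around an EM DASH.
    html = re.sub(f"[ ]*{EM}[ ]*", EM, html)
    # Regex #2: HYPHEN between two digits becomes an EN DASH.
    html = re.sub(r"([0-9])-([0-9])", rf"\1{EN}\2", html)
    # Regex #3: EM DASH between two digits becomes an EN DASH.
    html = re.sub(f"([0-9]){EM}([0-9])", rf"\1{EN}\2", html)
    return html


def preview(pattern: str, html: str, width: int = 20):
    """Yield each match with some surrounding context, for manual review."""
    for m in re.finditer(pattern, html):
        start = max(m.start() - width, 0)
        yield html[start:m.end() + width]


before = [
    f"<p>This is a small{EM} very small {EM} example of text.</p>",
    "<p>The 2000-2010 period was the root cause.</p>",
    f"<p>See pp. 5{EM}9.</p>",
]
for line in before:
    print(fix_dashes(line))
```

Running `preview(r"[0-9]-[0-9]", text)` first lists every digit-hyphen-digit hit with its context, which makes URLs and similar false positives easy to spot before anything is replaced.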