Old 01-09-2015, 03:25 PM   #5
Tex2002ans
Quote:
Originally Posted by 1v4n0
Hmm what? What's the difference?
As theducks said. Especially in Regex, all three symbols mean VERY different things:
  • ( = parenthesis
    • This is used to capture things. You can then use \1, \2, \3 in the Replace field to refer back to the captured parts.
    • For example:
      • Search: (a)(b)
      • Replace: \2\1
      • This would switch "ab" to "ba"
  • [ = bracket
    • This is used to specify a set (or range) of characters. It matches ONE character from that set.
    • For example:
      • Search: [0-5]
      • This would search for every digit that is 0 THROUGH 5: 0, 1, 2, 3, 4, 5
      • Search: [b-e]
      • This would search for every letter that is b THROUGH e: b, c, d, e
  • { = braces or curly bracket or "squiggly bracket"
    • This is used to specify amounts. "How many are we looking for?"
    • For example:
      • Search: [0-9]{4}
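      • This would search for exactly 4 digits in a row. (It will also match the first 4 digits of a longer number; use \b[0-9]{4}\b if you want ONLY standalone 4-digit numbers.)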
      • This one is EXTREMELY helpful for spotting things like years.
      • Search: [0-9]{5,}
      • This would search for 5 OR MORE digits in a row.
      • I use this one all the time to catch OCR mistakes with years, when the hyphen didn't OCR correctly: "19421945" -> "1942–1945"
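
If you want to try the three constructs outside of the editor, here is a minimal sketch using Python's re module. For these simple patterns it should behave the same way, but the sample strings and variable names are made up for illustration, and the assumption is that your editor's regex flavor treats \1, [ ] and { } like Python does.

```python
import re

EN_DASH = "\u2013"  # the dash normally used between two years

# ( ) captures: swap the two captured groups, "ab" becomes "ba".
swapped = re.sub(r"(a)(b)", r"\2\1", "ab")

# [ ] character class: each match is ONE character from the set.
digits_0_to_5 = re.findall(r"[0-5]", "0123456789")
letters_b_to_e = re.findall(r"[b-e]", "abcdefg")

# { } quantifier: exactly 4 digits; \b keeps longer numbers out.
years = re.findall(r"\b[0-9]{4}\b", "Born in 1942, died in 2001, ID 123456.")

# {5,} means 5 or more digits in a row.
long_runs = re.findall(r"[0-9]{5,}", "The war (19421945) ended; see page 12345.")

# Combining all three: split a fused 8-digit run into two years.
# This blindly assumes every standalone 8-digit run is two years,
# so check each hit by hand before replacing.
fixed = re.sub(r"\b([0-9]{4})([0-9]{4})\b",
               r"\1" + EN_DASH + r"\2",
               "the 19421945 war")

print(swapped, digits_0_to_5, letters_b_to_e, years, long_runs, fixed)
```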

Quote:
Originally Posted by 1v4n0
[...] anyway I need it mostly with letters - for the asides - because sometimes OCR erases the space between the dash and the following word.
I personally tend to favor using the Spell Check tool as an alternate way to catch these hyphenation issues.

[Attached image: SpellcheckHyphen.png]

In the "Filter" box, I just stick a hyphen. This will show you every single word with a hyphen in it. Then you can quickly scan the list and see if you spot any oddities. For example, things like "-the" or "-and" will almost NEVER be correct. So when you have a sentence like this:

Quote:
This is a sample sentence-this is an aside-and this is the continuation of the sentence.
In the spellcheck, you will see "sentence-this" and "aside-and". You can then investigate much more closely.

Depending on how many hyphens you have in your book, that could be another way to quickly go through and fix those types of errors.
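
The same idea can be roughly sketched in Python if you have the book as plain text. This is not how the Spell Check tool works internally, just an imitation of the "filter by hyphen, then eyeball the list" step. The SUSPECT_TAILS list and the function names are my own assumptions, so adjust them to your book.

```python
import re
from collections import Counter

# Hypothetical list of short words that almost never follow a real hyphen.
SUSPECT_TAILS = {"the", "and", "this", "but", "he", "she", "it"}

def hyphenated_words(text):
    """Count every word that has a hyphen between letters."""
    return Counter(re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)+", text))

def suspicious(words):
    """Keep the hyphenated words whose last part is a suspect short word."""
    return sorted(w for w in words
                  if w.rsplit("-", 1)[1].lower() in SUSPECT_TAILS)

sample = ("This is a sample sentence-this is an aside-and this is "
          "the continuation of the sentence. A well-known example.")

found = hyphenated_words(sample)   # every hyphenated word, with counts
flagged = suspicious(found)        # only the ones worth a closer look

print(sorted(found))
print(flagged)
```

Words like "well-known" still show up in the full list, but only "sentence-this" and "aside-and" get flagged for a closer look.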

Edit: Oh wait, I think I see what you mean now. You are talking about "open spacing" around dashes. See: https://en.wikipedia.org/wiki/Dash

Heh, I'm at work, and all the books I work on use "closed spacing" (no spaces around the dashes). Saves me a bunch of headaches!
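
If open spacing is what you are after, a capture group like the ones in the list above can put the missing space back. Here is a small Python sketch, assuming the OCR kept the space BEFORE the dash and only dropped the one after it. The function name is made up, and the pattern only looks at hyphens, en dashes and em dashes followed directly by a letter.

```python
import re

def restore_space_after_dash(text):
    """Turn ' -word' into ' - word' (same for en and em dashes)."""
    # Group 1 is the dash, group 2 is the first letter of the next word.
    # The leading space in the pattern leaves words like "well-known" alone.
    return re.sub(r" ([-\u2013\u2014])([A-Za-z])", r" \1 \2", text)

before = "This is an aside -and a well-known one."
after = restore_space_after_dash(before)
print(after)
```

Run it on a copy first and skim the changes, since things like command-line options (" -v") would also get a space added.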

Last edited by Tex2002ans; 01-09-2015 at 03:32 PM.