Thread: search question
Old 04-03-2013, 03:43 PM   #11
Tex2002ans
Quote:
Originally Posted by Gregg Bell View Post
A couple of questions though. I'm in the final stages of proofreading a novel. I was about one third through it. Then, having fallen totally in love with your \btext\b Regex search tool I started experimenting with it and looking for various things, trying to really get a feel for how it might help me. One of the things I did was put in various punctuation marks, including a straight quotation mark ("), which I often inadvertently put in when editing.
Well now, this is where you can't just go using any Regex under the sun without UNDERSTANDING what it is actually doing first. Regex is extremely powerful.

\btext\b should only be used if you want to find a SPECIFIC WORD. That regex tutorial I linked above uses this example sentence:

Code:
This island is beautiful
If you did a typical search for "is" in your book, you would get 3 matches: the "is" inside "This", the "is" at the start of "island", and the standalone word "is".

If you use the regex "\bis\b", you ONLY get the last of those (the EXACT WORD "is").
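
If you want to see the difference for yourself, here is a minimal sketch in Python. Python's "re" module is my assumption here (Sigil uses its own regex engine), but "\b" behaves the same way for a plain English sentence like this one:

```python
import re

sentence = "This island is beautiful"

# Plain search: "is" matches anywhere, including inside longer words.
plain = [m.start() for m in re.finditer(r"is", sentence)]

# Whole-word search: \b on both sides keeps only the standalone word.
whole = [m.start() for m in re.finditer(r"\bis\b", sentence)]

print(plain)  # [2, 5, 12]
print(whole)  # [12]
```

The numbers are character positions in the sentence. The plain search hits "This", "island" and "is"; the "\b" version hits only the word "is".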

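In English, the "\b" in a regex means "word boundary":

At this location, there is a word character (letter, digit, or underscore) on one side, and on the other side there is a space OR punctuation mark (!?,."'<> ......) OR pretty much any NON-WORD character OR the very start/end of the text. The "\b" does not match a character itself, it only checks the position.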

Case 1:

Code:
\bis\b
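In English, this says: first check that there is NO word character immediately before, then look for a lowercase 'i', then look for a lowercase 's', then check that there is NO word character immediately after.

(This is the match described above: only the standalone word "is".)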

Case 2:

Code:
\bis
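In English, this says: first check that there is NO word character immediately before, then look for a lowercase 'i', then look for a lowercase 's'. Anything may follow. In the sentence below, it matches the "is" at the start of "island" and the standalone word "is", but NOT the "is" inside "This".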

Code:
This island is beautiful
Case 3:

Code:
is\b
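In English, this says: first look for a lowercase 'i', then look for a lowercase 's', then check that there is NO word character immediately after. Anything may come before. In the sentence below, it matches the "is" at the end of "This" and the standalone word "is", but NOT the "is" inside "island".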

Code:
This island is beautiful
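
The three cases above can be checked side by side. This is only a sketch, again assuming Python's "re" module rather than Sigil itself, and "words_hit" is a little helper name I made up for the illustration:

```python
import re

sentence = "This island is beautiful"

def words_hit(pattern, text):
    """Return the words of `text` that contain a match of `pattern`."""
    return [word for word in text.split() if re.search(pattern, word)]

case1 = words_hit(r"\bis\b", sentence)  # exact word only
case2 = words_hit(r"\bis", sentence)    # words that BEGIN with "is"
case3 = words_hit(r"is\b", sentence)    # words that END with "is"

print(case1)  # ['is']
print(case2)  # ['island', 'is']
print(case3)  # ['This', 'is']
```
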
If you wanted to look for STRAIGHT QUOTES, it gets a little uglier. Straight quotes are used all over the place in the actual code of the book (class names, for example, are surrounded by straight quotes), so a regex that finds only the ones in your text is more complicated. What would probably be easiest is searching the actual Word document (or whatever you typed the book in) for the straight quotes, and then going over to Sigil to fix them.

I can go through explaining a straight quote regex for you if you want. But I don't want you running around ruining your book!
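
To give a rough idea of what such a regex could look like, here is one possible sketch (not a tested, book-safe recipe). It assumes Python's "re" module, a made-up line of XHTML, and that your text never contains a literal "<" or ">" outside of tags:

```python
import re

# Hypothetical XHTML line: straight quotes appear in the markup AND in the text.
html = '<p class="calibre1">He said "hello" to me.</p>'

# Match a straight quote only if it is NOT followed by "...>" with no "<"
# or ">" in between, i.e. only if the quote is not sitting inside a tag.
quote_in_text = re.compile(r'"(?![^<>]*>)')

positions = [m.start() for m in quote_in_text.finditer(html)]
print(len(positions))  # 2
```

The two quotes around class="calibre1" are skipped, and only the two quotes around "hello" are found. Note that this only FINDS the quotes; deciding which ones become opening or closing curly quotes is a separate job.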

Quote:
Originally Posted by Gregg Bell View Post
In that I was doing a final proofread this threw me. I of course wondered if the regex searching had added any other little things. I know you warned about regex deleting things, but can it add things as well?
Yep, if you are not careful, a Search/Replace can add things as well as delete them. It all depends on what you put in the Replace field (especially if you don't know what you are doing and start using more complex Searches).

Punctuation in Regex gets much uglier (you have to be very careful because many punctuation marks MEAN something in regex). Example of the most common ones:

. = Any single character
+ = One or more of whatever comes right before it
* = Zero or more of whatever comes right before it
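
Here is a small sketch of why this matters, assuming Python's "re" module and a made-up sample string. A backslash in front of a punctuation mark tells the regex to treat it as the plain character:

```python
import re

text = "Wait... what? Really?"

# Unescaped: "." means "any character", so it matches every character.
unescaped = re.findall(r".", text)

# Escaped: "\." means a literal full stop.
escaped = re.findall(r"\.", text)

# re.escape() adds the backslashes for you.
question_marks = re.findall(re.escape("?"), text)

print(len(unescaped) == len(text))  # True
print(len(escaped))                 # 3
print(len(question_marks))          # 2

# "+" needs at least one of the preceding item, "*" accepts none at all.
print(re.fullmatch(r"a+", "") is None)      # True
print(re.fullmatch(r"a*", "") is not None)  # True
```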

What most likely happened is that the punctuation mark you inserted completely changed the meaning of the regex, which then began messing some things up.

You had better be saving lots of backups before running these regexes. I don't want you accidentally deleting sections and not being able to get them back. ALWAYS save an alternate copy before messing with things.

Quote:
Originally Posted by Gregg Bell View Post
And a follow-up question: Perhaps (if indeed Regex can add things) it would be wise to only use Regex in the beginning phases of cleaning a document up?
I would not recommend Regex unless you know what you are doing, or are EXTREMELY careful (and do very thorough testing). And NEVER "Replace All" unless it is a very time-tested Regex and you know EXACTLY what it does.
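
One way to do that kind of testing is to look at every change BEFORE making it. Here is a minimal sketch, assuming Python's "re" module; "preview" is a hypothetical helper I wrote for the illustration, not a Sigil feature:

```python
import re

def preview(pattern, replacement, text, context=10):
    """Return (before, after) snippets for each match; `text` is not changed."""
    changes = []
    for m in re.finditer(pattern, text):
        lo = max(0, m.start() - context)
        hi = min(len(text), m.end() + context)
        before = text[lo:hi]
        after = text[lo:m.start()] + m.expand(replacement) + text[m.end():hi]
        changes.append((before, after))
    return changes

sample = "This island is beautiful"

# Careless pattern: would also change "This" and "island".
careless = preview(r"is", "was", sample)
# Careful pattern: only the standalone word.
careful = preview(r"\bis\b", "was", sample)

print(len(careless), len(careful))  # 3 1
print(careful[0][1])                # is island was beautiful
```

If the list of changes contains anything you did not expect, the regex is not ready for "Replace All".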

It is sort of like when you copy/paste commands that you find online and run them on the command line. You should really KNOW EXACTLY what the command is telling your computer to do BEFORE you run it. The command CAN be powerful enough to erase every single directory, but since you don't understand it at all, you just copy/paste and run it!!!

Going back to the three cases above: in Case 1, I ONLY get the exact word "is"; in Case 2, I get every single word that begins with "is"; in Case 3, I get every single word that ends with "is".

The Regexes look almost exactly the same, but they are wildly different.

Quote:
Originally Posted by Gregg Bell View Post
And I'm also a little concerned about how and what it might delete. (Yes, it seems great but scary! And remember I'm just doing my own books--and really they're pretty clean to begin with. Maybe I should leave Regex to pros like you?)
Good ol' normal Search and Replace, and the other easy tools already available (Spellcheck, that "Characters" Report I mentioned, ...), will probably help you a lot more. You are already working with a very clean document, so I don't believe there would be too many mistakes in there. It is not like you are working from an OCR scan, which introduces many errors that need fixing.

If you need someone else to take a look at your book for you (I might be able to catch a few mistakes), feel free to send your book my way.

Feel free to email me at (my username) @gmail.com