Quote:
Originally Posted by DiapDealer
Do you want it to exclude any bullet that occurs anywhere in any string that is enclosed by those "p class='calibre1'" tags? That could prove pretty tough, if so. I know I can't get my head around the expression to accomplish that (not that THAT renders it impossible by any means).
I'd say your best bet is to isolate/alter the scene-breaks first and then catch any possible OCR glitches in a subsequent search.
|
I agree. I spent the past 40 minutes trying to come up with a single regular expression that handles all the situations at once, and kept getting stuck on the Scene Breaks.
I would follow the advice of mmat1: temporarily replace all the Scene Break '•'s with something else, then fix all the remaining • with a normal search/replace. If you want to be lazy, I narrowed it down to two regexes:
Search for the first one and replace with \1\2:
Code:
(<p class="calibre1">[^•]*)•([^<]+</p>)
*** Not very efficient, but gets the job done ***
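First group, (<p class="calibre1">[^•]*): in a p with class calibre1, grab 0 or more characters that are NOT '•'.
- This handles cases where the • is the first character in the paragraph, or somewhere in the middle.
- This is the highly inefficient part. It will grab EVERYTHING, tags and line breaks included, until it hits a •.
Middle part: finds the •.
Second group, ([^<]+</p>): requires at least one character after the •, then grabs the rest up to the </p>. That requirement is why a Scene Break (a paragraph containing only •) fails to match.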
The second regex grabs the paragraphs in which • is the last character, while still failing on your scene breaks.
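If you want to see what the first regex does before running it on a book, here is a rough Python sketch. The sample paragraphs and the function name strip_inner_bullets are only made up for illustration, and Python's re module is assumed to behave like your editor's regex engine for a pattern this simple. It covers the first regex only.

```python
import re

# The first regex from this post: a • followed by at least one more
# character before the closing </p>.
PATTERN = re.compile(r'(<p class="calibre1">[^•]*)•([^<]+</p>)')

# Hypothetical sample input, one paragraph per line.
SAMPLE = "\n".join([
    '<p class="calibre1">•</p>',                    # scene break
    '<p class="calibre1">•A</p>',                   # bullet first
    '<p class="calibre1">A•</p>',                   # bullet last
    '<p class="calibre1">This has none.</p>',
    '<p class="calibre1">• This is a test sentence.</p>',
    '<p class="calibre1">This is two test sentences. • This is two test sentences.</p>',
])


def strip_inner_bullets(html: str) -> str:
    """Remove each matched • by keeping only groups 1 and 2."""
    return PATTERN.sub(r"\1\2", html)


if __name__ == "__main__":
    print(strip_inner_bullets(SAMPLE))
```

The scene break and the paragraph ending in • come through untouched, because nothing sits between the • and the </p>. Note that the spaces around a removed • are left in place, so a second pass for doubled or leading spaces may be needed.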
By the way, here are the example sentences I came up with and was working on:
Code:
<p class="calibre1">•</p>
<p class="calibre1">•A</p>
<p class="calibre1">A•</p>
<p class="calibre1">This has none.</p>
<p class="calibre1">This has none.</p>
<p class="calibre1">• This is a test sentence.</p>
<p class="calibre1">•</p>
<p class="calibre1">This is a test sentence. •</p>
<p class="calibre1">This is two test sentences. • This is two test sentences.</p>
Also, whenever you would like help with regular expressions, it helps to provide lots of test cases. It took me a while to work out exactly what you were asking.
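For the non-lazy route (park the scene breaks, remove the other bullets, put the scene breaks back), something like the following Python sketch should do. The placeholder text and the function name are hypothetical, and it assumes every scene break is written exactly as <p class="calibre1">•</p> with no extra whitespace or attributes. It also removes every other • in the file, not just the ones inside calibre1 paragraphs.

```python
SCENE_BREAK = '<p class="calibre1">•</p>'
# Hypothetical marker; any string that cannot occur in the book will do.
PLACEHOLDER = '<p class="calibre1">@@SCENE_BREAK@@</p>'


def remove_stray_bullets(html: str) -> str:
    """Delete every • except the ones that form a scene break."""
    # 1. Park the scene breaks under a placeholder.
    parked = html.replace(SCENE_BREAK, PLACEHOLDER)
    # 2. Every • still present is treated as an OCR glitch and removed.
    cleaned = parked.replace("•", "")
    # 3. Restore the scene breaks.
    return cleaned.replace(PLACEHOLDER, SCENE_BREAK)


if __name__ == "__main__":
    demo = "\n".join([
        '<p class="calibre1">•</p>',
        '<p class="calibre1">A•</p>',
        '<p class="calibre1">• B</p>',
    ])
    print(remove_stray_bullets(demo))
```

The same three steps work as three plain search/replace passes in an editor, no regex needed.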