MobileRead Forums - View Single Post

Tex2002ans · 07-17-2012, 09:30 PM

Quote:

Originally Posted by DiapDealer

Do you want it to exclude any bullet that occurs anywhere in any string that is enclosed by those "p class='calibre1'" tags? That could prove pretty tough, if so. I know I can't get my head around the expression to accomplish that (not that THAT renders it impossible by any means).

I'd say your best bet is to isolate/alter the scene-breaks first and then catch any possible OCR glitches in a subsequent search.

I agree. I spent the past 40 minutes trying to come up with a single regular expression that handles all the situations at once, and kept getting stuck on the Scene Breaks.

I would follow the advice of mmat1: temporarily replace all the Scene Break '•'s, then fix all the remaining • with a normal search/replace, and finally restore the scene breaks. If you want to be lazy, I narrowed it down to two regexes.

For both of them, Search with the expression and Replace with \1\2:

Code:

(<p class="calibre1">[^•]*)•([^<]+</p>)

*** Not very efficient, but gets the job done ***

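First group, (<p class="calibre1">[^•]*): "In a p with class calibre1" grab 0 or more characters that are NOT '•'.

This handles cases where the • is the first character in the paragraph, or somewhere in the middle.
It is also the highly inefficient part. It will grab EVERYTHING, closing tags included, until it hits a •.

Middle part, •: Finds the •.

Second group, ([^<]+</p>): Grabs the rest until it hits a </p>. It needs at least one character after the •, so it skips your scene breaks and any paragraph that ends in •.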

Code:

([^>])•(</p>)

The second regex will grab all the ones in which • is the last character of the paragraph, while failing on your scene breaks: [^>] needs a character before the • that is not the '>' closing the opening tag.
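If you want to try the two expressions outside your editor, here is a rough Python sketch. The regexes are the ones above, unchanged; the names INNER, TRAILING and strip_stray_bullets are just ones I made up for this example. It assumes the file is read as UTF-8 text and that Python's re module treats these patterns the same way your editor does.

```python
import re

# The two expressions from this post, unchanged.
INNER = re.compile(r'(<p class="calibre1">[^•]*)•([^<]+</p>)')
TRAILING = re.compile(r'([^>])•(</p>)')


def strip_stray_bullets(html):
    """Remove stray bullets but keep paragraphs that contain only a bullet."""
    previous = None
    # One pass of INNER removes only the first bullet it finds in a
    # paragraph, so repeat until a full pass changes nothing.
    while previous != html:
        previous = html
        html = INNER.sub(r'\1\2', html)     # bullet at the start or in the middle
        html = TRAILING.sub(r'\1\2', html)  # bullet as the last character
    return html


if __name__ == "__main__":
    samples = [
        '<p class="calibre1">•</p>',
        '<p class="calibre1">•A</p>',
        '<p class="calibre1">A•</p>',
        '<p class="calibre1">This is two test sentences. • This is two test sentences.</p>',
    ]
    for line in samples:
        print(line, '->', strip_stray_bullets(line))
```

The loop is there because a paragraph with two or more stray bullets needs more than one pass; in an editor you would just hit Replace All again until it reports no matches. Note that the space next to a removed bullet is left behind.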

By the way, here are the example sentences I came up with and was working on:

Code:

<p class="calibre1">•</p>

<p class="calibre1">•A</p>

<p class="calibre1">A•</p>

<p class="calibre1">This has none.</p>

<p class="calibre1">This has none.</p>

<p class="calibre1">• This is a test sentence.</p>

<p class="calibre1">•</p>

<p class="calibre1">This is a test sentence. •</p>

<p class="calibre1">This is two test sentences. • This is two test sentences.</p>
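For comparison, here is a rough sketch of the placeholder route (protect the scene breaks, remove every remaining •, restore the scene breaks) run over test lines like the ones above. The marker text @@SCENE_BREAK@@ and the function name are my own inventions for this example; any string that does not already occur in the book would do. Unlike the two regexes, this removes a • from any tag, not just p class="calibre1".

```python
SCENE_BREAK = '<p class="calibre1">•</p>'
MARKER = '@@SCENE_BREAK@@'  # hypothetical marker, must not occur in the book
PLACEHOLDER = '<p class="calibre1">' + MARKER + '</p>'


def strip_bullets_via_placeholder(html):
    """Protect scene breaks, delete every other bullet, restore scene breaks."""
    if MARKER in html:
        raise ValueError('marker already occurs in the text, pick another one')
    html = html.replace(SCENE_BREAK, PLACEHOLDER)   # step 1: hide scene breaks
    html = html.replace('•', '')                    # step 2: plain search/replace
    return html.replace(PLACEHOLDER, SCENE_BREAK)   # step 3: put them back


if __name__ == "__main__":
    test_lines = '\n'.join([
        '<p class="calibre1">•</p>',
        '<p class="calibre1">• This is a test sentence.</p>',
        '<p class="calibre1">This is a test sentence. •</p>',
        '<p class="calibre1">This is two test sentences. • This is two test sentences.</p>',
    ])
    print(strip_bullets_via_placeholder(test_lines))
```

This only recognises a scene break when the paragraph is exactly the bullet with nothing else in it, whitespace included, so check how the scene breaks actually look in your file first.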

Also, whenever you would like help with regular expressions, it helps to post lots of test cases. It took me a while to wrap my head around exactly what you were asking.