MobileRead Forums - View Single Post - Match a string while ignoring some character in that string?

Serpentine · 12-01-2011, 11:05 PM

Quote:

Originally Posted by ElMiko

1) could you explain the code change?

([a-z])
Capturing group: matches a single character from a-z and stores it as a group. That character is part of the overall match (the 't' in your case), so it gets replaced along with everything else.
(?=[a-z])
Lookahead, (?=...)
The pattern inside must be found directly ahead, but it is not actually part of the match, i.e. the regex matches everything up to that point, then asks, 'is the next character from a-z?'. Since that character is not part of the match, it is left untouched and the replacement does what you want.
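To see the difference in practice, here is a minimal sketch in Python's re module. The sample text and the hyphen-removal pattern are made up for illustration (they are not your actual search), but the behaviour of the group versus the lookahead is the same:

```python
import re

# Hypothetical sample: a word broken by a hyphen and a space.
text = "impor- tant"

# Capturing group: the letter after the hyphen is part of the match,
# so replacing the match with nothing also deletes that letter.
consuming = re.sub(r"-\s*([a-z])", "", text)

# Lookahead: the letter must be there, but it is not part of the match,
# so only the hyphen and whitespace are removed.
lookahead = re.sub(r"-\s*(?=[a-z])", "", text)

# Alternative: keep the capturing group and put the letter back with \1.
backref = re.sub(r"-\s*([a-z])", r"\1", text)

print(consuming)  # imporant
print(lookahead)  # important
print(backref)    # important
```

Both the lookahead and the \1 backreference give the same result here; the lookahead just saves you from having to put the character back in the replacement.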

Quote:

Originally Posted by ElMiko

2) converting from pdf, how would i go about following your advice?

Hmmm, I generally filter out empty paragraphs (like <p>(\s*|&nbsp;)</p>) first. If you have recurring things like that badly formatted chapter heading, change them to something easy to see/match, e.g. <p>REMOVE ME</p>. It's often useful not to remove them completely; in this case they are useful for joining broken paragraphs.
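
A rough sketch of that cleanup order in Python, in case it helps. The sample HTML and the heading pattern are invented for the example (your real chapter heading will need its own pattern), and the final join step is just one possible way to use the marker:

```python
import re

# Hypothetical converted output: a paragraph split in two by empty
# paragraphs and a recurring, badly formatted chapter heading.
html = (
    "<p>First half of a para</p>"
    "<p> </p>"
    "<p>&nbsp;</p>"
    "<p>Chapter 1 - Some Book Title</p>"
    "<p>graph that was broken.</p>"
)

# Step 1: drop paragraphs that hold only whitespace or a single &nbsp;.
EMPTY_PARAGRAPH = re.compile(r"<p>(\s*|&nbsp;)</p>")
without_empty = EMPTY_PARAGRAPH.sub("", html)

# Step 2: turn the recurring heading into an easy-to-spot marker.
# This heading pattern is an assumption made up for the sample above.
RECURRING_HEADING = re.compile(r"<p>Chapter \d+ - Some Book Title</p>")
marked = RECURRING_HEADING.sub("<p>REMOVE ME</p>", without_empty)

# Step 3 (assumed): use the marker to join the paragraph it interrupted.
joined = re.sub(r"</p><p>REMOVE ME</p><p>", "", marked)

print(marked)
print(joined)
```

Doing the marker step before any joining means you can eyeball every <p>REMOVE ME</p> first and only join where the paragraph really was broken.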