Quote:
Originally Posted by ElMiko
1) could you explain the code change?
([a-z])
Matches a single character from a-z and stores it as a captured group. Because that character ('t' in your case) is then part of the match, it gets replaced along with everything else that matched.
(?=[a-z])
Lookahead, (?=...)
The pattern inside must be found ahead, but it is not actually part of the match, i.e. the regex matches everything up to that point, then asks, 'is the next character from a-z?'. Since that character is not part of the match, it is left alone and the replacement does what you want.
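To see the difference side by side, here is a small Python sketch. The sample text and the "- " prefix are made up for illustration and are not the expression from your book; only the ([a-z]) versus (?=[a-z]) part is the point.

```python
import re

# Hypothetical sample: a word broken by a hyphen and a space.
text = "a bro- ken line"

# Capturing group: the letter after "- " is part of the match,
# so it is deleted together with the hyphen and the space.
consumed = re.sub(r"- ([a-z])", "", text)

# Lookahead: a letter must follow, but it is not part of the match,
# so only the hyphen and the space are deleted.
kept = re.sub(r"- (?=[a-z])", "", text)

print(consumed)  # a broen line
print(kept)      # a broken line
```

With the capturing group you could also keep the letter by putting the group back in the replacement (\1 in Python), but the lookahead avoids needing to.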
Quote:
Originally Posted by ElMiko
2) converting from pdf, how would i go about following your advice?
Hmmm, I generally filter out empty paragraphs (like <p>(\s*|&nbsp;)</p>) first. If you have recurring things like that badly formatted chapter heading, change them to something easy to see and match, e.g. <p>REMOVE ME</p>. It's often useful not to remove them completely: in this case they are useful for joining broken paragraphs.
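Roughly, that workflow could look like the Python sketch below. The sample HTML, the heading text, and the joining rule (previous paragraph ends in a lowercase letter, next one starts with one) are all assumptions for illustration; your converted file will need its own patterns.

```python
import re

# Hypothetical converted output: a sentence split across a page break,
# with empty paragraphs and a stray heading in between.
html = (
    "<p>The sentence runs</p>\n"
    "<p>&nbsp;</p>\n"
    "<p>Badly Formatted Heading</p>\n"
    "<p> </p>\n"
    "<p>over a page break.</p>"
)

# 1) Drop paragraphs that contain only whitespace or &nbsp;.
cleaned = re.sub(r"<p>(\s*|&nbsp;)</p>\n?", "", html)

# 2) Turn the recurring heading into an easy-to-spot marker.
marked = cleaned.replace("<p>Badly Formatted Heading</p>", "<p>REMOVE ME</p>")

# 3) Use the marker to join the two halves: keep the last letter (group 1),
#    and only check the next letter with a lookahead so it is not consumed.
joined = re.sub(r"([a-z])</p>\s*<p>REMOVE ME</p>\s*<p>(?=[a-z])", r"\1 ", marked)

print(joined)  # <p>The sentence runs over a page break.</p>
```

Any markers that are left after the join step were not sitting in the middle of a broken paragraph, so they can be removed in a final pass.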