12-01-2011, 10:05 PM   #13
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Quote:
Originally Posted by ElMiko
1) Could you explain the code change?
([a-z])
Matches a single character from a-z and stores it as a capture group. Because that character (the 't' in your case) is then part of the match, it gets replaced along with everything else.
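To make that concrete, here is a minimal Python sketch. The full pattern isn't quoted in this post, so the input string and the `</p>\s*<p>` part are hypothetical, assuming a paragraph that was broken mid-word during conversion; the point is only what `([a-z])` does to the replacement.

```python
import re

# Hypothetical input: one sentence split across two <p> elements mid-word.
html = "<p>the cat sa</p>\n<p>t on the mat</p>"

# ([a-z]) consumes the lowercase letter, so replacing the whole match
# with "" deletes the 't' along with the tags.
broken = re.sub(r"</p>\s*<p>([a-z])", "", html)

# Referencing the captured group with \1 puts the letter back.
fixed = re.sub(r"</p>\s*<p>([a-z])", r"\1", html)

print(broken)  # <p>the cat sa on the mat</p>
print(fixed)   # <p>the cat sat on the mat</p>
```

Putting the group back with `\1` is one workaround; the lookahead avoids consuming the letter in the first place.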
(?=[a-z])
Lookahead: (?=...)
The pattern inside must be found at that point, but it is not part of the match itself. In other words, the regex matches everything up to that point and then asks, 'is the next character from a-z?'. Because the letter is only checked and never consumed, the replacement leaves it alone, which is what you want.
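A minimal Python sketch of the lookahead version, using a hypothetical input (a paragraph split before a lowercase letter) since the actual pattern from the thread isn't shown here:

```python
import re

# Hypothetical input: one sentence split across two <p> elements mid-word.
html = "<p>the cat sa</p>\n<p>t on the mat</p>"

# (?=[a-z]) only checks that a lowercase letter follows; the letter is
# not part of the match, so it survives the replacement.
joined = re.sub(r"</p>\s*<p>(?=[a-z])", "", html)
print(joined)  # <p>the cat sat on the mat</p>

# A paragraph that starts with an uppercase letter does not match,
# so genuine paragraph breaks are left untouched.
kept = re.sub(r"</p>\s*<p>(?=[a-z])", "", "<p>End.</p>\n<p>Next one.</p>")
print(repr(kept))  # '<p>End.</p>\n<p>Next one.</p>'
```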

Quote:
Originally Posted by ElMiko
2) Converting from PDF, how would I go about following your advice?
Hmmm, I generally filter out empty paragraphs first (e.g. <p>(\s*|&nbsp;)</p>). If you have recurring junk like that badly formatted chapter heading, change it to something easy to see and match, e.g. <p>REMOVE ME</p>. It's often useful not to remove such markers completely; in this case, for example, they are useful for joining broken paragraphs.
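A rough Python sketch of that workflow, assuming made-up input: an HTML fragment with two empty paragraphs and a hypothetical running header, `CHAPTER ONE page 12`, splitting a sentence. The header pattern and the joining step are illustrations of the idea, not a recipe that will fit every book.

```python
import re

# Hypothetical PDF-converted HTML: empty paragraphs plus a recurring
# header line that landed in the middle of a sentence.
html = (
    "<p>It was a dark and</p>\n"
    "<p> </p>\n"
    "<p>&nbsp;</p>\n"
    "<p>CHAPTER    ONE  page 12</p>\n"
    "<p>stormy night.</p>"
)

# Step 1: drop empty paragraphs (whitespace-only or a single &nbsp;).
html = re.sub(r"<p>(\s*|&nbsp;)</p>\s*", "", html)

# Step 2: turn the recurring junk into an easy-to-spot marker instead of
# deleting it (the header pattern here is hypothetical).
html = re.sub(r"<p>CHAPTER\s+ONE\s+page\s+\d+</p>", "<p>REMOVE ME</p>", html)

# Step 3: the marker sits between the two halves of a broken paragraph,
# so one pattern can join across it when a lowercase letter follows.
html = re.sub(r"</p>\s*<p>REMOVE ME</p>\s*<p>(?=[a-z])", " ", html)

# Step 4: delete any markers that did not sit inside a broken paragraph.
html = re.sub(r"<p>REMOVE ME</p>\s*", "", html)

print(html)  # <p>It was a dark and stormy night.</p>
```

Keeping the marker until step 3 is what lets the join pattern tell a header-induced break apart from an ordinary paragraph break.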