MobileRead Forums - View Single Post

kiwidude · 12-13-2010, 04:16 PM

A translation if needed for your future use and tweaking. I've added a space here and there below just to stop the post displaying smilies without a code section, so you'll see : ) instead of a smiley.

The ( ) brackets around parts in the Find expression mean grab the bits that match inside the brackets into a "group" for use in the Replace expression. The ( ) pairs are numbered in order from left to right, and each one can be retrieved in the replace using \1 \2 \3 etc.
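To see groups and \1 \2 in action outside the editor, here is a rough sketch using Python's re module rather than the Find/Replace box itself. The sample text and the pattern are made up for the example and are not from the original thread.

```python
import re

# Hypothetical sample text, made up for illustration.
text = "Smith, John"

# Two ( ) groups: group 1 is the surname, group 2 is the first name.
pattern = r"(\w+), (\w+)"

# \2 and \1 pull the groups back out in the replacement, here in swapped order.
swapped = re.sub(pattern, r"\2 \1", text)
print(swapped)  # John Smith
```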

So the (Section \d+: ) means find anything that matches "Section " followed by digits (the \d) of which there are one or more (the +), then a ":" colon and a space.
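A quick way to check what (Section \d+: ) does and does not match is to try it on a few lines. This is only a sketch in Python's re module, and the sample lines are invented for the example.

```python
import re

# The first group from the post: "Section ", one or more digits, ":" and a space.
section = re.compile(r"(Section \d+: )")

# Hypothetical sample lines, made up for illustration.
samples = [
    "Section 1: Introduction",
    "Section 27: Appendix",
    "Section : Missing number",  # no digits, so \d+ cannot match
]

matches = [bool(section.search(s)) for s in samples]
print(matches)  # [True, True, False]
```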

Then in the replace the <p>\1</p> means grab the first ( ) group contents, which will be the result of matching (Section \d+: ) in this case, and put them inside <p></p> tags.
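The same replace step can be sketched in Python's re module. The input line here is hypothetical, and only the first group is used, so the rest of the line is left untouched.

```python
import re

# Hypothetical input line, made up for illustration.
line = "Section 3: The Journey Begins"

# Wrap whatever group 1 matched in <p></p> tags.
result = re.sub(r"(Section \d+: )", r"<p>\1</p>", line)
print(result)  # <p>Section 3: </p>The Journey Begins
```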

The second part of the find is ([^<]+). As above this tells you the match will be put into a group, that you can access in replace using \2.

The [ ]+ says grab one or more of the characters specified within the square brackets. If you know you are after certain characters you could do [a-z]+ for instance to get only lowercase letters. However your text likely will contain letters, numbers, spaces, commas, quotes, all sorts of stuff. So rather than actually listing all of the possible values, in this case I have said match [^<]+ which means one or more characters that are not (the ^) a < character. Effectively it means any characters until it reaches the < of the closing </p> tag.
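Putting the two groups together, a small Python sketch shows where [^<]+ stops. The HTML line is a made-up example, and the pattern only combines the two groups described above, so it may not be the complete Find expression from the thread.

```python
import re

# Hypothetical line of book HTML, made up for illustration.
html = "<p>Section 3: The Journey Begins, at last!</p>"

# Group 1: the section label. Group 2: everything up to (not including) the next "<".
m = re.search(r"(Section \d+: )([^<]+)", html)

label = m.group(1)
title = m.group(2)
print(repr(label))  # 'Section 3: '
print(repr(title))  # 'The Journey Begins, at last!'
```

Note that group 2 happily swallows the spaces, comma and exclamation mark without any of them being listed, and stops at the < of the closing </p>.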

Hope some of that makes sense... Oh yeah - make sure you tick the "Minimal matching" checkbox in the Options too.
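On the "Minimal matching" option: the sketch below assumes the checkbox behaves like non-greedy (lazy) quantifiers, which is how such options usually work. Python has no checkbox for this, so the equivalent is a ? after the quantifier. The sample HTML is invented.

```python
import re

# Hypothetical HTML with two paragraphs on one line.
html = "<p>one</p><p>two</p>"

# Greedy: .+ grabs as much as possible, running through to the LAST </p>.
greedy = re.search(r"<p>(.+)</p>", html).group(1)

# Minimal (lazy): .+? grabs as little as possible, stopping at the FIRST </p>.
minimal = re.search(r"<p>(.+?)</p>", html).group(1)

print(greedy)   # one</p><p>two
print(minimal)  # one
```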

Last edited by kiwidude; 12-13-2010 at 04:18 PM.