For TOC, you have a couple of options: using htmldoc for PDF, or writing your own output plugin that pre-massages the HTML. I may add this as a feature later.
For content extraction, in the regular expression pattern you need to group the various parts you want in parentheses; you then use {0}, {1}, {2}, etc in the formatter to represent the matched blocks. So you might use a pattern like:
<!-- start article rail -->(.*)<foo>.*<bar>(.*)<!-- end article body -->
assuming <foo> started the tag section you wanted to skip and <bar> ended it (".*" represent any sequence of characters, in case you don't know that already) .
|