Thread: Rss2Book
Old 07-02-2007, 08:44 PM   #190
geekraver
For a TOC, you have a couple of options: use htmldoc for PDF output, or write your own output plugin that pre-massages the HTML. I may add this as a feature later.

For content extraction, you need to put parentheses around each part of the regular expression pattern that you want to keep; you then use {0}, {1}, {2}, etc. in the formatter to refer to the matched blocks. So you might use a pattern like:

<!-- start article rail -->(.*)<foo>.*<bar>(.*)<!-- end article body -->

assuming <foo> starts the section you want to skip and <bar> ends it (".*" matches any sequence of characters, in case you don't know that already).
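
To see how the groups and the formatter fit together, here is a rough sketch in Python rather than in Rss2Book itself. The sample HTML and the formatter string are made up for illustration, and I'm assuming {0} maps to the first parenthesized group and that "." is allowed to match newlines (re.DOTALL here); check how Rss2Book actually numbers the groups before relying on this.

```python
import re

# Hypothetical page fragment; <foo> and <bar> are the placeholder
# markers from the pattern above, not real tags.
html = (
    "<!-- start article rail -->Intro text."
    "<foo>skip this part<bar>"
    "Body text.<!-- end article body -->"
)

# Same pattern as above. re.DOTALL lets "." match newlines too, which
# matters for real pages where the article spans many lines.
pattern = re.compile(
    r"<!-- start article rail -->(.*)<foo>.*<bar>(.*)<!-- end article body -->",
    re.DOTALL,
)

match = pattern.search(html)
if match is None:
    raise ValueError("pattern did not match the page")

# The two parenthesized groups, in order.
parts = match.groups()

# Hypothetical formatter: {0} is the first group, {1} the second
# (an assumption about the numbering, see the note above).
formatter = "<p>{0}</p><p>{1}</p>"
result = formatter.format(*parts)
print(result)
```

Note that ".*" is greedy, so if <foo> or <bar> shows up more than once in the page, each ".*" will stretch to the last occurrence that still lets the whole pattern match. Using ".*?" instead makes it stop at the first one.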