MobileRead Forums - View Single Post

geekraver · 07-02-2007, 08:44 PM

For a TOC, you have a couple of options: using htmldoc for PDF, or writing your own output plugin that pre-massages the HTML. I may add this as a feature later.

For content extraction, you need to group the parts you want to keep in parentheses in the regular expression pattern; you then use {0}, {1}, {2}, etc. in the formatter to represent the matched blocks. So you might use a pattern like:

(.*)<foo>.*<bar>(.*)

assuming <foo> started the tag section you wanted to skip and <bar> ended it (".*" matches any sequence of characters, in case you don't know that already).
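
To make the grouping idea concrete, here is a rough sketch in Python rather than in the tool itself. The sample page, the `extract` helper and the `re.DOTALL` flag are all my own assumptions for illustration; the tool's regex engine and formatter may behave differently (for example in whether "." matches newlines).

```python
import re

# Hypothetical sample page: <foo> and <bar> stand in for whatever markers
# bracket the section you want to skip, as in the pattern above.
html = "Intro text<foo>navigation junk<bar>Article body"

# Same pattern as above. re.DOTALL is an assumption, added so that "."
# also matches newlines in multi-line pages.
pattern = re.compile(r"(.*)<foo>.*<bar>(.*)", re.DOTALL)


def extract(page, formatter="{0}{1}"):
    """Keep the parenthesized groups and drop everything between the markers."""
    match = pattern.search(page)
    if match is None:
        return page  # markers not found: leave the page unchanged
    # {0} is the first group, {1} the second, and so on.
    return formatter.format(*match.groups())


result = extract(html)
print(result)  # Intro textArticle body
```

Note that ".*" is greedy, so if a page has more than one <foo> or <bar> the groups may capture more or less than you expect; it is worth trying the pattern on a real page first.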

07-02-2007, 08:44 PM	#190