12-22-2009, 10:48 AM | #1 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
BasicNewsRecipe tag handling
I'm trying to enhance the Globe and Mail news recipe to omit a lot of link fluff at the end of each article, but I'm not getting the results I expect from remove_tags, remove_tags_after, remove_tags_before and keep_only_tags. I suspect it has something to do with their semantics and the processing order. For example, does remove_tags_after mean every sibling tag after the specified tag, or every tag after it in the current page? When is keep_only_tags processed, and how does it interact with remove_tags? If I keep tag X, which contains tag Y specified in remove_tags, will Y be removed? Conversely, if I remove tag Y, which contains tag X that is to be kept, what is the result?
I've been looking at the code, but tracing this back is turning into an arduous task. Can anyone enlighten me?
12-22-2009, 11:07 AM | #2 |
creator of calibre
Posts: 44,356
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
keep_only_tags is applied first. It means keep every matched tag and all its descendants. remove_tags_after means remove all sibling tags that occur after the matched tags in document order. remove_tags means remove the matched tags and their descendants.
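To make that order concrete, here is a toy model in plain Python. It is not calibre's code: it uses xml.etree.ElementTree instead of the BeautifulSoup tree a recipe works on, the helper names (`matches`, `any_match`, `apply_rules`) are hypothetical, the attribute matching is simplified, and running remove_tags_after before remove_tags is an assumption. Under these rules it also answers the two interaction questions from the first post: a tag listed in remove_tags is still removed when it sits inside a kept tag, while a kept tag survives even if an enclosing tag is listed in remove_tags, because the enclosing tag was already discarded by keep_only_tags.

```python
import copy
import xml.etree.ElementTree as ET


def matches(el, spec):
    """Hypothetical matcher: spec maps attribute -> value or list of values.

    Loosely modelled on the {'id': [...]} style used in recipes; an element
    matches when every listed attribute has one of the allowed values.
    """
    for attr, wanted in spec.items():
        values = wanted if isinstance(wanted, list) else [wanted]
        if el.get(attr) not in values:
            return False
    return True


def any_match(el, specs):
    return any(matches(el, spec) for spec in specs)


def apply_rules(root, keep_only=(), remove_after=(), remove=()):
    """Toy model of the processing order; returns a new tree."""
    # Step 1: keep_only_tags first. Keep each matched element with all its
    # descendants, skipping matches nested inside an already-kept element.
    if keep_only:
        parent = {child: p for p in root.iter() for child in p}
        kept = []
        for el in root.iter():
            if el is root or not any_match(el, keep_only):
                continue
            node, nested = el, False
            while node in parent:
                node = parent[node]
                if any(node is k for k in kept):
                    nested = True
                    break
            if not nested:
                kept.append(el)
        new_root = ET.Element(root.tag)  # fresh root with the same tag
        new_root.extend([copy.deepcopy(el) for el in kept])
    else:
        new_root = copy.deepcopy(root)

    # Step 2 (order assumed): remove_tags_after drops the siblings that
    # follow each matched element under the same parent.
    parent = {child: p for p in new_root.iter() for child in p}
    for el in [e for e in new_root.iter() if any_match(e, remove_after)]:
        p = parent.get(el)
        if p is None:
            continue
        siblings = list(p)
        idx = next((i for i, s in enumerate(siblings) if s is el), None)
        if idx is None:
            continue  # already dropped by an earlier match
        for sib in siblings[idx + 1:]:
            p.remove(sib)

    # Step 3: remove_tags drops each matched element and its descendants.
    parent = {child: p for p in new_root.iter() for child in p}
    for el in [e for e in new_root.iter()
               if e is not new_root and any_match(e, remove)]:
        p = parent[el]
        if any(s is el for s in p):
            p.remove(el)
    return new_root


html = (
    '<body>'
    '<div id="header">Site nav</div>'
    '<div id="article"><p>Story text</p><div id="share">Share links</div>'
    '<p id="end">The end</p><p>Related links</p></div>'
    '<div id="footer">Footer</div>'
    '</body>'
)
result = apply_rules(
    ET.fromstring(html),
    keep_only=[{'id': 'article'}],
    remove_after=[{'id': 'end'}],
    remove=[{'id': ['share', 'header']}],
)
print(ET.tostring(result, encoding='unicode'))
# <body><div id="article"><p>Story text</p><p id="end">The end</p></div></body>
```

In this run the share block inside the kept article is removed, the paragraph after `id="end"` is dropped, and the header never reaches the remove step because keep_only_tags already discarded it.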
12-22-2009, 12:37 PM | #3 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
Thanks for the reply. Perhaps it's a bug, but as an example, the standard Globe and Mail recipe has remove_tags = [ {'id':[ ... 'header' ...]} ...], yet the content of <div id="header" ...> is still included.
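One quick sanity check before digging into the recipe code is to confirm that the downloaded page really contains a tag whose id exactly matches an entry in the remove_tags list. The sketch below uses only Python's standard html.parser; the `IdFinder` class and the sample markup are hypothetical stand-ins for a saved copy of an article page, and the check only inspects the markup, not anything calibre does with it.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Hypothetical helper: record every start tag whose id is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.hits = []  # (tag, id, line number) tuples

    def handle_starttag(self, tag, attrs):
        el_id = dict(attrs).get('id')
        if el_id in self.wanted:
            line, _offset = self.getpos()
            self.hits.append((tag, el_id, line))


# Stand-in for a saved copy of an article page (hypothetical markup).
page = """<html><body>
<div id="header">Site navigation</div>
<div id="content"><p>Story text</p></div>
</body></html>"""

finder = IdFinder(['header', 'footer'])
finder.feed(page)
finder.close()
print(finder.hits)
```

If the real page shows a hit for `header` and the content still ends up in the output, that points at the recipe processing rather than at a mismatched id.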
12-22-2009, 02:19 PM | #4 |
creator of calibre
Posts: 44,356
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Strange; it's probably a bug.
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
PDF Handling on New Kindle | Sheikspeare | Amazon Kindle | 21 | 08-09-2010 04:34 AM |
Handling imported styles | MacEachaidh | Sigil | 3 | 07-25-2010 07:06 AM |
Metadata Handling in 0.7.+ | tonyc46 | Calibre | 2 | 06-23-2010 05:35 AM |
BasicNewsRecipe interactive? | BrianG | Calibre | 5 | 01-11-2010 05:40 PM |
Handling several wordlists. | Gianfranco | Bookeen | 9 | 08-20-2008 09:29 AM |