MobileRead Forums - View Single Post

squeezebag · 07-02-2007, 01:31 AM

Regarding the New Yorker feeds,

Thanks a ton. I'm now able to pick up the full articles from the print links (including the pictures and captions). I used the following settings:

URL: http://feeds.newyorker.com/services/...everything.xml
Link Element: Link
Apply extractor to linked content: (checked)
Link Reformatter: {0}?printable=true
Content Extraction pattern: <!-- start article rail -->(.*) <!-- end article body -->
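
For anyone trying to picture how these settings fit together, here is a rough Python sketch. It is not the tool's actual code: the function names, the sample link, and the sample page are made up for illustration. It also assumes the pattern is treated as a regular expression in which `.` may span line breaks, with the article markers `<!-- start article rail -->` and `<!-- end article body -->` on either side of the captured group.

```python
import re

# Settings from the post; how the tool applies them internally is an assumption.
LINK_REFORMATTER = "{0}?printable=true"
EXTRACTION_PATTERN = r"<!-- start article rail -->(.*) <!-- end article body -->"


def reformat_link(link: str) -> str:
    """Substitute the feed item's link for the {0} placeholder."""
    return LINK_REFORMATTER.format(link)


def extract_content(html: str) -> str:
    """Return the captured group, or the whole page if the pattern does not match."""
    match = re.search(EXTRACTION_PATTERN, html, re.DOTALL)
    return match.group(1) if match else html


# Hypothetical link and page, for illustration only.
print_url = reformat_link("http://example.com/article")
page = (
    "<html>nav<!-- start article rail --><p>Body</p>"
    " <!-- end article body -->footer</html>"
)
article = extract_content(page)
```

Here `print_url` ends in `?printable=true`, and `article` holds only the markup between the two marker comments.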

Converts to LRF perfectly. I have two remaining questions.

- I've been able to filter out most of the garbage with the Content Extraction Pattern, but I'm still picking up a "keywords" section that I'd like to exclude. Does the Content Extraction Pattern allow me to extract from A to B, and then from C to D? In other words, there is stuff at the beginning and at the end that I'd like to exclude, and there is also a block in the middle that I'd like to filter out. What's the format for this?
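
In plain regular-expression terms, "A to B, then C to D" can be written as two capture groups with the unwanted middle matched but not captured. Whether the Content Extraction Pattern field joins more than one group is not something I can confirm, so treat the following Python as a sketch of the idea only. The keywords markers are hypothetical placeholders, not the site's real markup.

```python
import re

# A..B is kept, B..C is skipped, C..D is kept.
# The "keywords" marker comments are hypothetical placeholders.
PATTERN = re.compile(
    r"<!-- start article rail -->(.*?)"                  # group 1: A to B
    r"<!-- start keywords -->.*?<!-- end keywords -->"   # middle block, not captured
    r"(.*?)<!-- end article body -->",                   # group 2: C to D
    re.DOTALL,
)


def extract_without_middle(html: str) -> str:
    """Join the two captured spans; return the input unchanged if nothing matches."""
    match = PATTERN.search(html)
    if match is None:
        return html
    return match.group(1) + match.group(2)


# Made-up page, for illustration only.
sample = (
    "header<!-- start article rail -->First part."
    "<!-- start keywords -->tags, tags<!-- end keywords -->"
    " Second part.<!-- end article body -->footer"
)
result = extract_without_middle(sample)
```

The non-greedy `.*?` keeps each span from running past the next marker, which matters once there is more than one marker pair on the page.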

- Also, is there any way to build a table of contents? I can pick up the section summaries from http://feeds.newyorker.com/services/...everything.xml, but is there any way to prepend that file to the full extraction? In a perfect world I could link from the TOC to the full articles, but I'll live with whatever I can get.
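
To make the TOC idea concrete, here is a minimal Python sketch of what I have in mind, done outside the tool: a list of titles and summaries placed ahead of the extracted articles, with each entry linking to an anchor on its article. The function name, the tuple layout, and the anchor naming are all made up, and I don't know whether in-document links like these survive the LRF conversion.

```python
from html import escape


def build_toc(articles):
    """Build one HTML string: a linked table of contents, then the articles.

    articles: list of (title, summary, body_html) tuples (hypothetical layout).
    """
    toc_lines = ["<h1>Contents</h1>", "<ul>"]
    body_parts = []
    for index, (title, summary, body_html) in enumerate(articles, start=1):
        anchor = f"article-{index}"
        # TOC entry: linked title followed by the feed summary.
        toc_lines.append(
            f'<li><a href="#{anchor}">{escape(title)}</a>: {escape(summary)}</li>'
        )
        # Article heading carries the id that the TOC link points at.
        body_parts.append(f'<h2 id="{anchor}">{escape(title)}</h2>\n{body_html}')
    toc_lines.append("</ul>")
    return "\n".join(toc_lines + body_parts)


# Made-up entry, for illustration only.
document = build_toc([("Talk of the Town", "A short summary", "<p>Full text</p>")])
```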

Thanks again for your help.

Also, the subscribe function works flawlessly now!

#189 · squeezebag · Junior Member · Posts: 7 · Karma: 10 · Join Date: Jun 2007 · Device: Sony Reader