Thread: Rss2Book
07-02-2007, 12:31 AM   #189
squeezebag
Junior Member
Posts: 7
Karma: 10
Join Date: Jun 2007
Device: Sony Reader
Regarding the NewYorker feeds,

Thanks a ton. I'm now able to pick up the full articles from the print links (including the pictures and captions). I used the following settings:

URL: http://feeds.newyorker.com/services/...everything.xml
Link Element: Link
Apply extractor to linked content: (checked)
Link Reformatter: {0}?printable=true
Content Extraction pattern: <!-- start article rail -->(.*) <!-- end article body -->
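
For anyone trying to picture what those settings do, here is a rough Python sketch. It assumes that the Link Reformatter substitutes the article link for {0} and that the extractor keeps whatever the first capture group matches, with "." allowed to span line breaks. The function names and the sample URL are made up for illustration; this is not Rss2Book's actual code.

```python
import re
from typing import Optional

# Settings copied from the post above.
LINK_REFORMATTER = "{0}?printable=true"
EXTRACTION_PATTERN = r"<!-- start article rail -->(.*) <!-- end article body -->"


def reformat_link(link: str) -> str:
    """Hypothetical helper: build the print-view URL from the feed item's link."""
    return LINK_REFORMATTER.format(link)


def extract_content(html: str) -> Optional[str]:
    """Hypothetical helper: return the text of capture group 1, or None if no match.

    re.DOTALL is an assumption so that "." also matches newlines.
    """
    match = re.search(EXTRACTION_PATTERN, html, re.DOTALL)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(reformat_link("http://example.com/article"))
    page = "nav<!-- start article rail --><p>Body</p> <!-- end article body -->footer"
    print(extract_content(page))
```

Note that the pattern has a literal space before the closing marker, so the page must contain that space for the match to succeed.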

Converts to LRF perfectly. I have two remaining questions.

-I've been able to filter out most of the garbage with the Content Extraction Pattern, but I'm still picking up a "keywords" section that I'd like to exclude. Does the Content Extraction pattern allow me to extract from A to B, and then from C to D? In other words, there is stuff at the beginning and stuff at the end that I'd like to exclude, and there is also a block of stuff in the middle that I'd like to filter out. What's the format for this?
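
To make the "A to B, then C to D" idea concrete, here is a minimal Python sketch of the behaviour being asked about: one regular expression with two capture groups whose matches are joined, so the block in the middle is dropped. Whether Rss2Book joins multiple groups like this is exactly the open question, so treat it as an assumption. The keyword markers are hypothetical; the real page would need to be inspected for whatever actually surrounds the keywords section.

```python
import re
from typing import Optional

START = "<!-- start article rail -->"   # from the extraction pattern in the post
END = "<!-- end article body -->"       # from the extraction pattern in the post
KW_START = "<!-- start keywords -->"    # hypothetical marker, for illustration only
KW_END = "<!-- end keywords -->"        # hypothetical marker, for illustration only

# Group 1 captures A..B, the keywords block is matched but not captured,
# and group 2 captures C..D.
TWO_SPAN = re.compile(
    re.escape(START) + r"(.*?)"
    + re.escape(KW_START) + r".*?" + re.escape(KW_END)
    + r"(.*?)" + re.escape(END),
    re.DOTALL,
)


def extract_without_middle(html: str) -> Optional[str]:
    """Return the two captured spans joined together, or None if no match."""
    match = TWO_SPAN.search(html)
    if match is None:
        return None
    return "".join(match.groups())


if __name__ == "__main__":
    page = (
        "header" + START + "first part " + KW_START + "tags"
        + KW_END + "second part" + END + "footer"
    )
    print(extract_without_middle(page))
```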

-Also, is there any way to build a table of contents? I can pick up the section summaries from: http://feeds.newyorker.com/services/...everything.xml but is there any way that I can prepend this file to the full extraction? In a perfect world I could link from the TOC to the full articles, but I'll live with whatever I can get.
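
As a sketch of the kind of table of contents meant here, the Python below reads the title and description of each item in a standard RSS 2.0 feed and builds an HTML list that could be placed ahead of the extracted articles. The function name and the "article-N" anchor ids are hypothetical, and whether the LRF conversion would honour such links is not something this sketch can confirm.

```python
import xml.etree.ElementTree as ET
from html import escape


def build_toc(rss_xml: str) -> str:
    """Hypothetical helper: turn RSS items into an HTML list of linked summaries."""
    root = ET.fromstring(rss_xml)
    rows = []
    for number, item in enumerate(root.iter("item"), start=1):
        title = item.findtext("title", default="(untitled)")
        summary = item.findtext("description", default="")
        # Each entry points at an anchor that the matching article
        # would need to carry, e.g. <a name="article-1">.
        rows.append(
            f'<li><a href="#article-{number}">{escape(title)}</a>: {escape(summary)}</li>'
        )
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


if __name__ == "__main__":
    sample = (
        "<rss><channel>"
        "<item><title>First piece</title><description>Summary one</description></item>"
        "<item><title>Second piece</title><description>Summary two</description></item>"
        "</channel></rss>"
    )
    print(build_toc(sample))
```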

Thanks again for your help.

Also, the subscribe function works flawlessly now!