MobileRead Forums - View Single Post

Deputy-Dawg · 01-26-2008, 11:41 PM

From DefaultProfile

timefmt = ' [%a %d %b %Y]' # The format of the date shown on the first page
url_search_order = ['guid', 'link'] # The order of elements to search for a URL when parsing the RSS feed
pubdate_fmt = None # The format string used to parse the

This would imply that only the 'guid' and 'link' elements are searched for the article URL. This is borne out by the fact that when you process the feed from the Denver Post with

use_pubdate = False

you get the error message

Skipping article as it does not have a link url
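
A minimal sketch of the lookup that url_search_order implies, written in present-day Python 3. It is an assumption about the behaviour, not the actual libprs500 code: the function name find_article_url and the two sample items are made up for illustration.

```python
import xml.etree.ElementTree as ET

url_search_order = ['guid', 'link']  # value quoted from DefaultProfile


def find_article_url(item, search_order=url_search_order):
    """Return the text of the first non-empty child named in search_order, else None.

    Hypothetical helper: it only mimics the search order described in the post.
    """
    for name in search_order:
        text = item.findtext(name)  # None when the child element is absent
        if text and text.strip():
            return text.strip()
    return None


# Hypothetical items: one with a <link>, one with neither <guid> nor <link>.
with_link = ET.fromstring(
    '<item><title>A</title><link>http://example.com/a</link></item>')
without_url = ET.fromstring(
    '<item><title>B</title><description>no url here</description></item>')

for item in (with_link, without_url):
    url = find_article_url(item)
    if url is None:
        print('Skipping article as it does not have a link url')
    else:
        print(url)
```

The second item falls through both names in the search order, which is the situation the error message describes.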

In the source of the feed, the following markup appears for each article:

<li class="regularitem" xmlns:dc="http://purl.org/dc/elements/1.1/">
<h4 class="itemtitle">
<a href="http://www.denverpost.com/ci_8088727">
Man hit in crosswalk, killed
</a>
</h4>
<h5 class="itemposttime">
<span>Posted: </span>
Sat, 26 Jan 2008 20:09:37 -0700
</h5>
<div class="itemcontent" name="decodeable">
A 22-year-old Denver resident was killed in Aurora Saturday when a 71-year-old man driving a pickup ran a red light on South Parker Road, then veered into a crosswalk.
</div>
</li>

The URL for the article is contained only in the element with class itemtitle.
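
A minimal sketch of how that link could be pulled out of the markup shown above, using only the Python 3 standard library. The class ItemTitleLinkParser is hypothetical and is not part of web2lrf; it assumes the itemtitle element does not contain another element with the same tag name.

```python
from html.parser import HTMLParser


class ItemTitleLinkParser(HTMLParser):
    """Collect the href of every <a> found inside an element with class "itemtitle"."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._inside = None  # tag name of the open itemtitle element, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._inside is None:
            # Enter "collecting" mode when an element carries the itemtitle class.
            if 'itemtitle' in (attrs.get('class') or '').split():
                self._inside = tag
        elif tag == 'a' and attrs.get('href'):
            self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        # Leave "collecting" mode when the itemtitle element closes.
        if tag == self._inside:
            self._inside = None


SAMPLE = '''
<li class="regularitem">
<h4 class="itemtitle">
<a href="http://www.denverpost.com/ci_8088727">
Man hit in crosswalk, killed
</a>
</h4>
<h5 class="itemposttime">
<span>Posted: </span>
Sat, 26 Jan 2008 20:09:37 -0700
</h5>
</li>
'''

parser = ItemTitleLinkParser()
parser.feed(SAMPLE)
print(parser.links)  # ['http://www.denverpost.com/ci_8088727']
```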

Similarly, in the feeds from izvestia the URL is contained only in the elements with classes

mainnewstime and mainnewsnotice

and even then only the variable part of the link appears, in the form:

/world/asia/20080127/97803220.html

This has to be concatenated with http://www.rian.ru to obtain the fully qualified address.
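
The concatenation itself is a one-liner in present-day Python 3. This is only a sketch of the joining step, with a hypothetical helper name (absolute_link); it says nothing about where web2lrf would need to call it.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.rian.ru'  # site root named in the post


def absolute_link(relative_path, base=BASE_URL):
    """Resolve a site-relative path against the base URL."""
    return urljoin(base, relative_path)


print(absolute_link('/world/asia/20080127/97803220.html'))
# http://www.rian.ru/world/asia/20080127/97803220.html
```

urljoin leaves an already absolute URL unchanged, so the same helper could be applied to every link without checking first.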

Is it possible to handle either of these cases in web2lrf?

BTW, a profile runs much faster in the Terminal than when embedded in libprs500. I have also found that if I attempt to run more than about 3 profiles sequentially, libprs500 crashes. I can get around the problem by quitting and restarting; there is no need to remove the previously captured feeds.
