From DefaultProfile:
timefmt = ' [%a %d %b %Y]' # The format of the date shown on the first page
url_search_order = ['guid', 'link'] # The order of elements to search for a URL when parsing the RSS feed
pubdate_fmt = None # The format string used to parse the
This would imply that only the 'guid' and 'link' elements are searched for the article URL. This is borne out by the fact that when I process the feed from the Denver Post with
use_pubdate = False
I get the error message:
Skipping article as it does not have a link url
In the source of the feed, the following markup appears for each article:
<li class="regularitem" xmlns:dc="http://purl.org/dc/elements/1.1/">
<h4 class="itemtitle">
<a href="http://www.denverpost.com/ci_8088727">
Man hit in crosswalk, killed
</a>
</h4>
<h5 class="itemposttime">
<span>Posted: </span>
Sat, 26 Jan 2008 20:09:37 -0700
</h5>
<div class="itemcontent" name="decodeable">
A 22-year-old Denver resident was killed in Aurora Saturday when a 71-year-old man driving a pickup ran a red light on South Parker Road, then veered into a crosswalk.
</div>
</li>
The URL for the article is contained only in the element with class itemtitle.
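To illustrate what I mean, here is a rough sketch of pulling the link out of the itemtitle element. It uses only Python's standard html.parser module and is not something web2lrf provides as far as I know; the names ItemTitleLinkParser and extract_itemtitle_links are made up for this example.

```python
from html.parser import HTMLParser


class ItemTitleLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <h4 class="itemtitle">.

    Hypothetical helper for illustration; not part of web2lrf.
    """

    def __init__(self):
        super().__init__()
        self.in_itemtitle = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'h4' and 'itemtitle' in classes:
            self.in_itemtitle = True
        elif tag == 'a' and self.in_itemtitle and attrs.get('href'):
            self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'h4':
            self.in_itemtitle = False


def extract_itemtitle_links(html):
    """Return every article URL found under an itemtitle heading."""
    parser = ItemTitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Markup copied from the Denver Post feed item quoted in this post.
sample = '''
<li class="regularitem" xmlns:dc="http://purl.org/dc/elements/1.1/">
<h4 class="itemtitle">
<a href="http://www.denverpost.com/ci_8088727">
Man hit in crosswalk, killed
</a>
</h4>
</li>
'''

print(extract_itemtitle_links(sample))
```

Run against the item above, this should pick up the single ci_8088727 link.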
Similarly, in the feeds from Izvestia the URL is contained only in the classes
mainnewstime and mainnewsnotice,
and even then only as the variable part of the link, in the form:
/world/asia/20080127/97803220.html
which has to be concatenated with
http://www.rian.ru to obtain the fully qualified address.
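The concatenation itself is simple; a minimal sketch using urljoin from Python's standard urllib.parse is shown here. BASE_URL and absolute_url are names I chose for the example, and I am not claiming web2lrf has a hook for this.

```python
from urllib.parse import urljoin

# Site root taken from this post; the constant name is hypothetical.
BASE_URL = 'http://www.rian.ru'


def absolute_url(path, base=BASE_URL):
    """Join the variable part of a link onto the site root."""
    return urljoin(base, path)


print(absolute_url('/world/asia/20080127/97803220.html'))
# → http://www.rian.ru/world/asia/20080127/97803220.html
```

urljoin leaves a link untouched if it is already fully qualified, so the same call could be applied to every link in the feed.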
Is it possible to handle either of these cases in web2lrf?
BTW, a profile runs much faster in the Terminal than when embedded in libprs500. Also, I have found that if I attempt to run more than about 3 profiles sequentially, libprs500 crashes. I can get around the problem by quitting and restarting; there is no need to remove the previously captured feeds.