MobileRead Forums - View Single Post

alexxxm · 01-30-2008, 05:50 AM

Thanks, secretsubscribe,
I'm beginning to see the light...
Now I can download a couple of MB of The Atlantic, but I still have one problem:
The text of each article is split into several parts, and at the end of each part there is the usual line reading: "Pages: 1 2 3 next>".
The URLs those numbers point to are relative, e.g.:

<span class="hankpym">
<span class="safaritime">1</span>
<a href="/doc/200801/miller-education/2">2</a>
<a href="/doc/200801/miller-education/3">3</a>
</span>

<a href="/doc/200801/miller-education/2">next></a>

so I'd like to replace those, but if I add this:
preprocess_regexps = [
    (re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in
    [
        (r'<a href="/', lambda match: match.group().replace(match.group(1), '<a href="http://www.theatlantic.com')),
        # ....
    ]
]

in addition to your (modified) def parse_feeds, it isn't able to find any links anymore.
So, how can I convert the relative links in the individual articles to absolute ones?

any hint appreciated...

Alessandro
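
A note on the snippet above, offered as a sketch rather than a confirmed diagnosis: the pattern r'<a href="/' contains no capture group, so calling match.group(1) inside the lambda raises IndexError ("no such group") in plain Python. That may be why no links are found once the rule is added, though how the recipe framework handles the exception is not something this post shows. The version below captures the parts it reuses. The names BASE_URL, RELATIVE_HREF and absolutize are hypothetical, chosen only for illustration, and the example uses nothing but the standard re module.

```python
import re

# Assumed site root, taken from the URL used in the post.
BASE_URL = 'http://www.theatlantic.com'

# Group 1 keeps the opening of the attribute, group 2 the leading slash.
# The (?!/) lookahead skips protocol-relative links such as href="//host/...".
RELATIVE_HREF = re.compile(r'(<a\s+href=")(/)(?!/)', re.IGNORECASE)


def absolutize(html, base=BASE_URL):
    """Prefix root-relative <a href="/..."> links with the site root."""
    return RELATIVE_HREF.sub(lambda m: m.group(1) + base + m.group(2), html)


# Same shape as the list in the post: (compiled pattern, callable) pairs.
preprocess_regexps = [
    (RELATIVE_HREF, lambda m: m.group(1) + BASE_URL + m.group(2)),
]

sample = (
    '<span class="safaritime">1</span> '
    '<a href="/doc/200801/miller-education/2">2</a> '
    '<a href="/doc/200801/miller-education/3">3</a>'
)
result = absolutize(sample)
print(result)
```

Links that are already absolute (href="http://...") do not match the pattern and pass through unchanged, so the rule can safely be applied to every page.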
