MobileRead Forums - View Single Post

Necator · 05-02-2008, 03:26 AM

Hi, although I'm a newbie, I jumped into Python to read my local newspaper, and as expected I need some advice.

1. I couldn't get print_version in libprs500 to switch to the print version URL, so the content still comes from the article URL.

Article URL: http://www.radikal.com.tr/haber.php?haberno=253962
Print version URL: http://www.radikal.com.tr/yazici.php?haberno=253962

I tried this, which failed:
def print_version(self, url):
    # Swap the article page for its printer-friendly counterpart
    return url.replace('http://www.radikal.com.tr/haber.php?haberno=',
                       'http://www.radikal.com.tr/yazici.php?haberno=')
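The replace call above only changes a link when it contains that exact prefix, character for character. A standalone sketch of the same mapping that swaps only the script name (haber.php to yazici.php) and keeps the query string might look like this. It assumes, without confirmation from the post, that the feed links point at the haber.php pages shown above; the function mirrors the recipe hook's name but is written as a plain function rather than a method on the recipe class.

```python
from urllib.parse import urlsplit, urlunsplit


def print_version(url):
    """Map a radikal.com.tr article URL to its printer-friendly URL.

    Standalone sketch: inside a recipe this would be the method
    print_version(self, url), indented within the recipe class.
    """
    parts = urlsplit(url)
    # Only the script name differs: haber.php (article) -> yazici.php (print).
    if parts.path.endswith("/haber.php"):
        new_path = parts.path[: -len("haber.php")] + "yazici.php"
        parts = parts._replace(path=new_path)
    # The query string (haberno=...) is kept unchanged; other URLs pass through.
    return urlunsplit(parts)
```

For example, print_version('http://www.radikal.com.tr/haber.php?haberno=253962') returns 'http://www.radikal.com.tr/yazici.php?haberno=253962', while URLs that are not haber.php pages come back unchanged.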

2. So I fetch the articles from the feed, and to extract the main news body from the HTML I removed the tables. But now I cannot cut the news body out of the rest of the page. I copied the recipe from the manual (The New York Times), which again ended in failure:
html_description = True
html2lrf_options = ['--ignore-tables']
remove_tags_before = dict(name='img', attrs='src')
remove_tags_after = dict(id='footer')
remove_tags = [dict(attrs={'class': ['articleTools', 'post-tools', 'side_tool']}),
               dict(id=['footer', 'table', 'navigation', 'archive', 'side_search',
                        'blog_sidebar', 'side_tool', 'side_index']),
               dict(name=['script', 'noscript'])]
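To illustrate what remove_tags entries like these are meant to do, here is a rough standard-library sketch (not libprs500's own code) that deletes every element, together with its contents, whose tag name, id, or class is listed. The names strip_tags and TagStripper are hypothetical, and the sketch assumes well-formed markup. Note that in this sketch an entry in the id list only matches an element's id attribute, so 'table' there would remove an element with id="table" but not ordinary table tags; those would have to go in the name list.

```python
import html
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagStripper(HTMLParser):
    """Copy HTML through, dropping elements that match a name, id or class."""

    def __init__(self, names=(), ids=(), classes=()):
        super().__init__(convert_charrefs=True)
        self.names, self.ids, self.classes = set(names), set(ids), set(classes)
        self.skip_depth = 0  # > 0 while inside a removed element
        self.out = []

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        # class="a b" holds several space-separated classes; any one may match.
        css_classes = set((attrs.get("class") or "").split())
        return (tag in self.names
                or attrs.get("id") in self.ids
                or bool(css_classes & self.classes))

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or self._matches(tag, attrs):
            if tag not in VOID_TAGS:
                self.skip_depth += 1  # skip everything up to the matching end tag
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> have no contents to skip.
        if not (self.skip_depth or self._matches(tag, attrs)):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def strip_tags(markup, names=(), ids=(), classes=()):
    """Return markup with every matching element (and its contents) removed."""
    parser = TagStripper(names, ids, classes)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

For a page made of a navigation div, an articleTools div, a news paragraph, a script and a footer div, calling strip_tags with the name, id and class lists from the recipe above leaves only the paragraph.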

What am I doing wrong? Please point me in the right direction. Thanks anyway.
