05-02-2008, 03:06 AM   #74
Necator
Device: PRS-505
Hi, I am having some difficulty with:
1. making libprs500 see the printable-version URL correctly
2. removing the tables.
I would appreciate it if you could point me in the right direction.

1.
Article URL : http://www.radikal.com.tr/haber.php?haberno=XXXXX
Printable URL: http://www.radikal.com.tr/yazici.php?haberno=XXXXX

I tried using this:
def print_version(self, url):
    return url.replace('http://www.radikal.com.tr/haber.php?haberno=', 'http://www.radikal.com.tr/yazici.php?haberno=')

However, it still downloads the content from the article URL.
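
To rule out the string replacement itself, here is a standalone check in plain Python, with no libprs500 involved. The function name to_print_url and the two constants are only made up for this test; they are not part of libprs500. The replace does what I expect here, so my guess (and it is only a guess) is that the problem lies in how the method is picked up, not in the replace.

```python
# Standalone check of the URL rewrite, outside libprs500.
# ARTICLE_PREFIX, PRINT_PREFIX and to_print_url are hypothetical names for this test only.
ARTICLE_PREFIX = 'http://www.radikal.com.tr/haber.php?haberno='
PRINT_PREFIX = 'http://www.radikal.com.tr/yazici.php?haberno='

def to_print_url(url):
    # Same replace as in print_version, just without the surrounding class.
    return url.replace(ARTICLE_PREFIX, PRINT_PREFIX)

print(to_print_url('http://www.radikal.com.tr/haber.php?haberno=253962'))
# prints http://www.radikal.com.tr/yazici.php?haberno=253962
```

Note that a URL that does not start with exactly that prefix (for example one without "www.") comes back unchanged, which would also look like "still downloading the article URL".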

2. The article page is laid out as three tables stacked one above the other, and I only want the one in the middle.
Here is an example of an article: http://www.radikal.com.tr/haber.php?haberno=253962
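
To make concrete what I mean by "the one in the middle", here is a rough sketch using only Python's standard html.parser module (current Python 3, not libprs500). It assumes the page really is three top-level tables, treats nested tables as part of their parent, and uses a made-up sample string instead of the real page. The names TopLevelTables and middle_table_text are my own.

```python
from html.parser import HTMLParser

class TopLevelTables(HTMLParser):
    """Collect the text of each top-level <table>; nested tables count as part of their parent."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current <table> nesting depth
        self.tables = []    # text of each finished top-level table
        self._buf = []      # text pieces of the table being read

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            if self.depth == 0:
                self._buf = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'table' and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.tables.append(' '.join(self._buf))

    def handle_data(self, data):
        # Keep only non-blank text that sits inside some table.
        if self.depth > 0 and data.strip():
            self._buf.append(data.strip())

def middle_table_text(html):
    parser = TopLevelTables()
    parser.feed(html)
    parser.close()
    # With three tables, index 1 is the middle one.
    return parser.tables[len(parser.tables) // 2] if parser.tables else ''

# Made-up sample standing in for the real article page.
sample = ('<table><tr><td>header</td></tr></table>'
          '<table><tr><td>article <table><tr><td>inner</td></tr></table> body</td></tr></table>'
          '<table><tr><td>footer</td></tr></table>')
print(middle_table_text(sample))
```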

I copied some lines from The New York Times and added --ignore-tables, but unfortunately it did no good:
html_description = True
html2lrf_options = ['--ignore-tables']
remove_tags_before = dict(name='img', attrs='src')
remove_tags_after = dict(id='footer')
remove_tags = [dict(attrs={'class': ['articleTools', 'post-tools', 'side_tool']}),
               dict(id=['footer', 'table', 'navigation', 'archive', 'side_search', 'blog_sidebar', 'side_tool', 'side_index']),
               dict(name=['script', 'noscript'])]

What am I doing wrong? Thanks.