#61 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Kovid,
In the attached .zip file is the user profile for one of my local newspapers. It used to work, but now all it gets is the TOC, no articles. What is strange is that the print-file addresses are still the same, and the error messages when I run it in a terminal do not contain anything that resembles the URL of the print files. I have enclosed a copy of one such run. My question is: has the newspaper changed something, or has something changed in libprs500?
#62 |
creator of calibre
Posts: 45,146
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You need to fix the print_version function; the way the feed links to articles seems to have changed.
|
#63 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
That's what I thought had happened, but the link to the print version of
http://www.nwaonline.net/articles/20...datefiling.txt is http://www.nwaonline.net/articles/20...datefiling.prt, which is what I would expect the function as written to return. The only difference I can see (if it is different at all, since I am a bit hazy on how it behaved before) is that the print version opens in a new window. I don't think that's an issue, inasmuch as I have seen other sites where the print version opened in a new window. Darned if I can put my hands on one now, though.
#64 |
creator of calibre
Posts: 45,146
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The format of the feed itself has changed. Use
Code:
url_search_order = ['link', 'guid']
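A rough, hypothetical illustration of why the order matters (plain Python 3 with the standard library, not the web2lrf source): assume the profile looks at the tags of each feed item in the listed order and takes the first one that has text. If a feed starts putting an opaque ID in `<guid>` and the real article address in `<link>`, listing `'link'` first is what hands the right URL to print_version. The helper `pick_article_url` and the sample item are made up for the example.

```python
import xml.etree.ElementTree as ET


def pick_article_url(item, search_order=('link', 'guid')):
    """Hypothetical helper: return the text of the first tag in
    search_order that is present and non-empty in an RSS <item>,
    or None if none of them is usable."""
    for tag in search_order:
        node = item.find(tag)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None


# Made-up feed item: <guid> holds an ID, <link> holds the article address
SAMPLE_ITEM = """
<item>
  <title>Example story</title>
  <guid isPermaLink="false">story-12345</guid>
  <link>http://www.example.com/articles/story-12345.txt</link>
</item>
""".strip()

item = ET.fromstring(SAMPLE_ITEM)
print(pick_article_url(item))                    # -> http://www.example.com/articles/story-12345.txt
print(pick_article_url(item, ('guid', 'link')))  # -> story-12345
```

With the order reversed, the opaque `guid` value is what would reach print_version, which is the kind of silent breakage described above.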
#65 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Thanks again! That fixed it. But what sort of landmarks should I have been looking for in the source file, in case a similar problem occurs again? I guess what I am asking for is a more general solution.
|
#66 |
creator of calibre
Posts: 45,146
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Well, the log has a bunch of error messages about not being able to fetch .prt URLs. That's your clue: it means either that the print_version function no longer works, or that the feed format has changed, causing the URL being fed to print_version to be wrong. You can check which by sticking a
Code:
print(url)
statement at the start of print_version.
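If you want to see the incoming and outgoing URL side by side, a small decorator can do the printing for you. This is only a Python 3 sketch under stated assumptions: `trace_urls` and `MyProfile` are hypothetical names, and the .txt-to-.prt rule is inferred from the nwaonline example earlier in this thread rather than copied from the actual profile.

```python
import functools


def trace_urls(func):
    """Hypothetical debugging decorator: print the URL passed to a
    print_version-style method and the URL it returns."""
    @functools.wraps(func)
    def wrapper(self, url):
        result = func(self, url)
        print('print_version: %s -> %s' % (url, result))
        return result
    return wrapper


class MyProfile(object):
    """Hypothetical stand-in for a newspaper profile class."""

    @trace_urls
    def print_version(self, url):
        # Assumed rule: article.txt -> article.prt; other URLs pass through
        if url.endswith('.txt'):
            return url[:-len('.txt')] + '.prt'
        return url


MyProfile().print_version('http://www.example.com/articles/story.txt')
```

Reading the output follows the reasoning above: if the URL on the left is not an article link from the feed, the feed format has changed; if the left side is right but the right side is wrong, print_version itself needs fixing.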
#67 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Great minds in the same gutter, well, almost. What I did was to put
Code:
return url
Last edited by Deputy-Dawg; 03-10-2008 at 11:26 PM.
#68 |
creator of calibre
Posts: 45,146
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You should probably hold off for a bit. I'm in the process of re-writing web2lrf to make it much more powerful.
|
#69 |
Ugly alien
Posts: 144
Karma: 225
Join Date: Sep 2007
Location: Québec, QC
Device: tricorder
|
|
#70 |
creator of calibre
Posts: 45,146
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It will handle current profiles, but in any case the old web2lrf code will remain for a long time, so no need to worry.
It will be multithreaded, handle many different feed formats, and have a much more powerful and easy-to-use preprocessing engine, so you don't have to use regexps unless you want to. Eventually, it should be smart enough that if you give it just the URL of a feed, it will go and fetch a reasonably sanitized version of the articles. EDIT: Oh, and I forgot that it will have links at the end of each article back to the table of contents.
Last edited by kovidgoyal; 03-11-2008 at 11:40 AM.
#71 | |
Ugly alien
Posts: 144
Karma: 225
Join Date: Sep 2007
Location: Québec, QC
Device: tricorder
|
#72 | |
creator of calibre
Posts: 45,146
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
#73 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
|
#74 |
Junior Member
Posts: 5
Karma: 10
Join Date: Apr 2008
Device: PRS-505
|
Hi, I am having difficulty with two things:
1. making libprs500 use the printable-version URL correctly;
2. removing the tables.
I would appreciate it if you could point me in the right direction.

1. Article URL: http://www.radikal.com.tr/haber.php?haberno=XXXXX
Printable URL: http://www.radikal.com.tr/yazici.php?haberno=XXXXX
I tried using this:
Code:
def print_version(self, url):
    return url.replace('http://www.radikal.com.tr/haber.php?haberno=', 'http://www.radikal.com.tr/yazici.php?haberno=')
However, it still downloads the content from the article URL.

2. The article page has three rows of tables and I want the one in the middle. Here is an example article: http://www.radikal.com.tr/haber.php?haberno=253962
I copied some lines from the New York Times profile and added --ignore-tables, but unfortunately it did no good:
Code:
html_description = True
html2lrf_options = ['--ignore-tables']
remove_tags_before = dict(name='img', attrs='src')
remove_tags_after = dict(id='footer')
remove_tags = [dict(attrs={'class':['articleTools', 'post-tools', 'side_tool']}),
               dict(id=['footer', 'table', 'navigation', 'archive', 'side_search', 'blog_sidebar', 'side_tool', 'side_index']),
               dict(name=['script', 'noscript'])]
What am I doing wrong? Thanks
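On the first problem: `str.replace` only changes the URL if it contains that exact prefix, so one possibility (an assumption; nothing in the thread confirms it) is that the links arriving from the feed are spelled slightly differently from the ones shown above, or are not the article links at all. Printing `url` inside print_version, as suggested earlier in the thread, will tell you which. Below is a sketch of a less brittle rewrite in Python 3 with the standard library only: it swaps the `haber.php` script name for `yazici.php` and keeps whatever host and query string the feed supplies. It is written as a plain function so it is easy to test; inside a profile class it would also take `self`.

```python
from urllib.parse import urlsplit, urlunsplit


def print_version(url):
    """Map .../haber.php?haberno=N to .../yazici.php?haberno=N.

    Works on the parsed path, so it does not depend on the exact host
    spelling (with or without www.) or on extra query parameters.
    """
    parts = urlsplit(url)
    if parts.path.endswith('/haber.php'):
        path = parts.path[:-len('haber.php')] + 'yazici.php'
        return urlunsplit((parts.scheme, parts.netloc, path,
                           parts.query, parts.fragment))
    # Not an article URL we recognise: leave it untouched
    return url


print(print_version('http://www.radikal.com.tr/haber.php?haberno=253962'))
```

If the printed URLs coming into the function do not contain `haber.php` at all, the problem is which feed field is used as the article link, not the rewrite itself.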
#75 |
Junior Member
Posts: 5
Karma: 10
Join Date: Apr 2008
Device: PRS-505
|
Hi, although I am a newbie, I have jumped into the Python language to read my local newspaper, and as expected I need some advice.
1. I failed to get libprs500 to use the print_version URL, so the content comes from the article URL.
Article URL: http://www.radikal.com.tr/haber.php?haberno=253962
Print version URL: http://www.radikal.com.tr/yazici.php?haberno=253962
I tried this, which failed:
Code:
def print_version(self, url):
    return url.replace('http://www.radikal.com.tr/haber.php?haberno=', 'http://www.radikal.com.tr/yazici.php?haberno=')
2. So the content comes from the article page, and to get the main news body out of the HTML I removed the tables. But now I cannot separate the news body from the rest of the page. I copied the recipe from the manual (the New York Times example), which again ended in failure:
Code:
html_description = True
html2lrf_options = ['--ignore-tables']
remove_tags_before = dict(name='img', attrs='src')
remove_tags_after = dict(id='footer')
remove_tags = [dict(attrs={'class':['articleTools', 'post-tools', 'side_tool']}),
               dict(id=['footer', 'table', 'navigation', 'archive', 'side_search', 'blog_sidebar', 'side_tool', 'side_index']),
               dict(name=['script', 'noscript'])]
What am I doing wrong? Please point me in the right direction. Thanks anyway.
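The profile options above are the intended way to trim a page, and the sketch below does not try to reproduce how they match. One thing worth checking in the posted remove_tags: `'table'` sits in the `id=[...]` list, which (going by how the same list uses `name=[...]` for the script tags) presumably matches an element whose id is "table" rather than `<table>` elements. As an independent illustration of "keep only the middle table", here is a standard-library Python 3 sketch. `TopLevelTableGrabber` and `middle_table` are hypothetical names, and it assumes the three tables described in the post are top-level (not nested inside one another) with the story in the second.

```python
from html.parser import HTMLParser


class TopLevelTableGrabber(HTMLParser):
    """Collect the raw HTML of every top-level <table> on a page.

    Tables nested inside another table stay part of their parent.
    HTML comments inside a table are dropped.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.tables = []   # raw HTML of each top-level table, in page order
        self._depth = 0    # how many <table> elements we are currently inside
        self._buf = []     # pieces of the top-level table being collected

    def _keep(self, text):
        # Only record markup and text while inside a table
        if self._depth:
            self._buf.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self._depth += 1
        self._keep(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._keep(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._keep('</%s>' % tag)
        if tag == 'table' and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Finished a top-level table: store it and start afresh
                self.tables.append(''.join(self._buf))
                self._buf = []

    def handle_data(self, data):
        self._keep(data)

    def handle_entityref(self, name):
        self._keep('&%s;' % name)

    def handle_charref(self, name):
        self._keep('&#%s;' % name)


def middle_table(html):
    """Return the second top-level table of the page, or None if absent."""
    grabber = TopLevelTableGrabber()
    grabber.feed(html)
    grabber.close()
    return grabber.tables[1] if len(grabber.tables) > 1 else None


page = ('<html><body>'
        '<table><tr><td>site header</td></tr></table>'
        '<table><tr><td>story text</td></tr></table>'
        '<table><tr><td>site footer</td></tr></table>'
        '</body></html>')
print(middle_table(page))
```

Counting nesting depth rather than matching `<table>...</table>` with a regular expression is what keeps a table nested inside the story from cutting the result short.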
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| RSS Feed | timezone | Feedback | 8 | 01-02-2010 06:55 PM |
| RSS Feed questions | rambling | Calibre | 2 | 11-20-2008 05:35 AM |
| Working User Profile for Wired.com RSS feeds for libprs500 | DaveNB | Calibre | 6 | 11-30-2007 07:00 AM |
| RSS Feed Updates | Alexander Turcic | Announcements | 0 | 06-11-2004 04:11 PM |