web2lrf - Page 12

ddavtian · 02-01-2008, 12:49 AM

Quote:

Originally Posted by JTravers

If you don't mind, I would be curious to know what the speed is like for you using the built-in WSJ profile vs. the one I attached to my previous message. The built-in profile seems to take much longer on my system, and I was wondering if the same applies to you. Maybe it's just a GUI vs. command line thing, though.

Thanks!

I just tested: 36 minutes using the GUI, 5 minutes using web2lrf and your attached profile.

kovidgoyal · 02-01-2008, 01:05 AM

Probably a difference between the two profiles. I just tested Newsweek: the command line and the GUI took 113s and 116s respectively.

I should probably update the WSJ profile.

EDIT:
oldest_article is 3 vs. 7, which probably explains it. Also JTravers, is that the correct print URL mapping?

JTravers · 02-01-2008, 03:05 AM

Quote:

Originally Posted by kovidgoyal

Probably a difference between the two profiles. I just tested Newsweek: the command line and the GUI took 113s and 116s respectively.

I should probably update the WSJ profile.

EDIT:
oldest_article is 3 vs. 7, which probably explains it. Also JTravers, is that the correct print URL mapping?

Yes, that might explain it. Still, 36 minutes seems very long.

That print url mapping has always worked for me. You could probably clean up the end of the url too, but I've never found that to be necessary.

JTravers · 02-01-2008, 03:11 AM

Quote:

Originally Posted by kovidgoyal

Oops left in a statement for debugging, I've re-uploaded the windows installer. Re-install and you should be fine.

I reinstalled and still get the same error.
Do I need to uninstall first?
Probably user error on my part. I will try again.

randcoop · 02-05-2008, 07:12 PM

I've downloaded thenation.py and run web2lrf with it. It sort of works, but I can't quite get it right. The first problem is that I'm not sure about the dates that need to be inserted (one short and one long). The second (and bigger) problem is that I can't figure out where to put my login and password.

Without that, I receive notices about needing to subscribe to download some content. And most of the articles seem to come from web postings, not the actual issue.

Any help would be appreciated.

Valloric · 02-11-2008, 12:31 PM

I posted user profiles for Jutarni.hr (the online version of Croatia's most popular newspaper) and USATODAY to the ticket system. I apologize if the ticket system was not the correct way of informing you about them, but it just seemed like it was the right way to do it.

I saw that ticket with all those different requests for news feeds, and if I have the time, I'll try to work through the list. I'm currently working on The New Yorker. Will add it when it's done.

If I mess up a profile, please tell me about it and I'll try to fix it.

kovidgoyal · 02-11-2008, 01:48 PM

Cool, I'll add them in the next release.

Valloric · 02-11-2008, 03:45 PM

Kovid, you have a terrible little bug in web2lrf... maybe not so much a bug as a design oversight...

For the last 5 hours I have been attempting to create a The New Yorker user profile, and no matter what I did, the code only retrieved TWO articles from the site... I tried everything... and then I realized what was the problem.

Your code that checks the oldest_article variable... It starts at the top of the feed and continues down, checking each article's date. When it finds an article older than the number in oldest_article, it stops checking subsequent articles. WELL! The RSS feeds on TNY website are not sorted by date, but by some quasi-alphabetical sort, so when this code finds an old article at the very top of the feed (very very likely), it doesn't grab the newer ones which are lower in the listing.

Please fix this so it checks each and every article in the list.

I have uploaded the The New Yorker profile with its oldest_article variable set to 90; it was the only way I could get the newer articles. When you fix the bug, fix the profile accordingly. Everything else about it works fine.
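To make the difference concrete, here is a minimal sketch of the two behaviours described above. It is not the actual web2lrf code: the function names, the dictionary layout, and the sample titles and dates are all made up for illustration. The only thing taken from the thread is the idea of an oldest_article cutoff measured in days.

```python
from datetime import datetime, timedelta


def select_recent_early_exit(articles, oldest_article, now):
    """Stop at the first article that is too old (the behaviour described above)."""
    cutoff = now - timedelta(days=oldest_article)
    recent = []
    for article in articles:
        if article["date"] < cutoff:
            break  # everything after this entry is skipped, even if it is newer
        recent.append(article)
    return recent


def select_recent(articles, oldest_article, now):
    """Check every article, so the order of the feed does not matter."""
    cutoff = now - timedelta(days=oldest_article)
    return [article for article in articles if article["date"] >= cutoff]


# Hypothetical feed that is NOT sorted by date: an old entry sits in the middle.
NOW = datetime(2008, 2, 11)
FEED = [
    {"title": "Comment", "date": datetime(2008, 2, 10)},
    {"title": "Books", "date": datetime(2008, 2, 9)},
    {"title": "Fiction", "date": datetime(2007, 11, 20)},
    {"title": "The Talk of the Town", "date": datetime(2008, 2, 8)},
]

if __name__ == "__main__":
    print([a["title"] for a in select_recent_early_exit(FEED, 7, NOW)])
    print([a["title"] for a in select_recent(FEED, 7, NOW)])
```

With this sample feed the early-exit version keeps only the two entries above the old one, while the full scan also picks up the newer entry listed below it.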

Platapie · 02-17-2008, 01:16 PM

Kovid, I've said this before, but with the Economist profile I feel the need to say it again. This program is phenomenal, particularly given its OS independence and the .deb packages and ebuilds. I'm a subscriber to the Economist and I imagine I will, more often than not, use your service rather than reading the paper edition.

Thanks again.

kovidgoyal · 02-17-2008, 02:36 PM

Interesting, I know about the ebuilds, but are the deb packages being maintained as well?

JSWolf · 02-17-2008, 07:22 PM

Quote:

Originally Posted by Valloric

Kovid, you have a terrible little bug in web2lrf... maybe not so much a bug as a design oversight...

For the last 5 hours I have been attempting to create a The New Yorker user profile, and no matter what I did, the code only retrieved TWO articles from the site... I tried everything... and then I realized what was the problem.

Your code that checks the oldest_article variable... It starts at the top of the feed and continues down, checking each article's date. When it finds an article older than the number in oldest_article, it stops checking subsequent articles. WELL! The RSS feeds on TNY website are not sorted by date, but by some quasi-alphabetical sort, so when this code finds an old article at the very top of the feed (very very likely), it doesn't grab the newer ones which are lower in the listing.

Please fix this so it checks each and every article in the list.

I have uploaded the The New Yorker profile with its oldest_article variable set to 90; it was the only way I could get the newer articles. When you fix the bug, fix the profile accordingly. Everything else about it works fine.

Please create a ticket so it can be fixed.

ddavtian · 02-21-2008, 07:09 PM

Hi guys.

I'm using the WSJ profile and it works very well (thanks to JTravers for the profile).

I have a quick question: is it possible to get all the articles from a page, not from a feed? The RSS feed for "Today's Newspaper" has only 5 articles from the front page plus a few more from other sections. I'd like to get as many articles from the printed edition ("http://online.wsj.com/page/2_0133.html") as possible.

I replaced an existing link with this one, but got a blank page:
def get_feeds(self):
    return [
        (' Today\'s Newspaper - All', 'http://online.wsj.com/page/2_0133.html'),
        ## (' Today\'s Newspaper - Page One', 'http://online.wsj.com/xml/rss/3_7205.xml'),
    ]

Any advice? I want all the links from the "http://online.wsj.com/page/2_0133.html" page that have "article" in their address. I don't think I need to change the clean-up part; the current profile does all the work.

This must be a simple question for Kovid, JTravers and others who have created their profiles.

Thanks in advance,
David

kovidgoyal · 02-21-2008, 07:13 PM

It's certainly doable, but in order to do it you have to parse the HTML from that page. See, for example, the feed for The Atlantic.
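As a rough illustration of what "parse the HTML" could look like for the request above (every link on the page with "article" in its address), here is a standalone sketch using only the Python standard library. It is not the Atlantic profile and does not use any web2lrf API: the class and function names and the sample HTML are assumptions made up for the example, and hooking the resulting links into a profile is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose absolute URL contains `needle`."""

    def __init__(self, base_url, needle="article"):
        super().__init__()
        self.base_url = base_url
        self.needle = needle
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if self.needle in url and url not in self.links:
            self.links.append(url)  # keep page order, drop duplicates


def extract_article_links(html, base_url, needle="article"):
    parser = ArticleLinkParser(base_url, needle)
    parser.feed(html)
    return parser.links


# Made-up page fragment standing in for the real index page.
SAMPLE = """
<html><body>
<a href="/article/SB0001.html">Story one</a>
<a href="/page/2_0133.html">Index</a>
<a href="http://online.wsj.com/article/SB0002.html">Story two</a>
<a href="/article/SB0001.html">Story one again</a>
</body></html>
"""

if __name__ == "__main__":
    base = "http://online.wsj.com/page/2_0133.html"
    for link in extract_article_links(SAMPLE, base):
        print(link)
```

The real page would have to be downloaded first (with a login for subscriber content), and a plain substring match may pull in links that are not actual articles, so treat this only as a starting point.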

ddavtian · 02-21-2008, 07:17 PM

Do you live here? :-)

I didn't see Atlantic under UserProfiles. Where can I find it?

Thanks, David

ddavtian · 02-21-2008, 07:31 PM

Kovid, ignore my previous message. A quick search and I found the thread about Atlantic.

Have to search first.



