View Full Version : web2lrf



kovidgoyal
03-22-2008, 05:43 PM
There's a bug that causes problems with custom recipes. Just copy the import statement to the line just above where it is used and you should be fine.

Deputy-Dawg
03-22-2008, 06:58 PM
Kovid,
I modified the code as follows:

#!/usr/bin/env python

## Copyright (C) 2008 Kovid Goyal kovid@kovidgoyal.net
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along
## with this program; if not, write to the Free Software Foundation, Inc.,
## 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
'''
theatlantic.com
'''
import re
from libprs500.web.feeds.news import BasicNewsRecipe

class TheAtlantic(BasicNewsRecipe):

    title = 'The Atlantic'
    INDEX = 'http://www.theatlantic.com/doc/current'

    remove_tags_before = dict(name='div', id='storytop')
    remove_tags = [dict(name='div', id='seealso')]
    extra_css = '#bodytext {line-height: 1}'

    def parse_index(self):
        articles = []

        src = self.browser.open(self.INDEX).read()
        from libprs500.ebooks.BeautifulSoup import BeautifulSoup
        soup = BeautifulSoup(src, convertEntities=BeautifulSoup.HTML_ENTITIES)

        issue = soup.find('span', attrs={'class':'issue'})
        if issue:
            self.timefmt = ' [%s]'%self.tag_to_string(issue).rpartition('|')[-1].strip().replace('/', '-')

        for item in soup.findAll('div', attrs={'class':'item'}):
            a = item.find('a')
            if a and a.has_key('href'):
                url = a['href']
                url = 'http://www.theatlantic.com/'+url.replace('/doc', 'doc/print')
                title = self.tag_to_string(a)
                byline = item.find(attrs={'class':'byline'})
                date = self.tag_to_string(byline) if byline else ''
                description = ''
                articles.append({
                    'title':title,
                    'date':date,
                    'url':url,
                    'description':description
                })


        return {'Current Issue' : articles }

and now I get:

Macintosh-3:books billc$ feeds2lrf atlantic-2.py
Fetching feeds...
0% [----------------------------------------------------------------------]
Fetching feeds... Traceback (most recent call last):
File "/Users/billc/Downloads/libprs500.app/Contents/Resources/feeds2lrf.py", line 9, in <module>
main()
File "libprs500/ebooks/lrf/feeds/convert_from.pyo", line 52, in main
File "libprs500/web/feeds/main.pyo", line 141, in run_recipe
File "libprs500/web/feeds/news.pyo", line 411, in download
File "libprs500/web/feeds/news.pyo", line 515, in build_index
File "libprs500/web/feeds/__init__.pyo", line 193, in feeds_from_index
ValueError: too many values to unpack
Macintosh-3:books billc$

kovidgoyal
03-22-2008, 07:17 PM
The return statement should be

return [('Current Issue', articles)]


You should probably look at the latest Atlantic profile in svn, as there were some changes.
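
To see why the dictionary return value fails while the list of tuples works, here is a minimal sketch in plain Python. `consume_index` is a hypothetical stand-in for whatever libprs500 does with the result of parse_index; the real code is not shown in this thread, so the only assumption is that it unpacks each entry into a (section title, articles) pair.

```python
def consume_index(parsed):
    """Hypothetical consumer: unpacks each entry into (title, articles)."""
    summary = []
    for title, articles in parsed:
        summary.append((title, len(articles)))
    return summary


# One placeholder article, shaped like the dicts built in parse_index above.
articles = [{'title': 'Example', 'date': '', 'url': 'http://example.com/a',
             'description': ''}]

good = [('Current Issue', articles)]   # list of (title, articles) tuples
bad = {'Current Issue': articles}      # dict: iterating yields only the keys

print(consume_index(good))

try:
    consume_index(bad)
except ValueError as err:
    # Each key is a string, so unpacking it into two names fails
    # because the string has more than two characters.
    print('ValueError:', err)
```

Iterating over the dictionary yields the string 'Current Issue', and unpacking a 13-character string into two names is what produces "too many values to unpack".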

ddavtian
03-25-2008, 06:34 PM
Kovid, I tried to use "feeds2disk" for Newsweek (built-in profile gets very few articles from the latest issue) and got an error message:


C:\Misc\News\Newsweek>feeds2disk --feeds="['http://feeds.newsweek.com/newsweek/NationalNews','http://feeds.newsweek.com/headlines/business','http://feeds.newsweek.com/newsweek/WorldNews']"
Fetching feeds...
Traceback (most recent call last):
File "main.py", line 158, in <module>
File "main.py", line 153, in main
File "main.py", line 134, in run_recipe
UnboundLocalError: local variable 'is_profile' referenced before assignment


feeds2disk works fine with built-in profiles, but I always get this error when I specify the feed addresses myself.

David
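
For readers unfamiliar with this error, the sketch below shows the general pattern that produces an UnboundLocalError. It is not the real main.py, which is not shown in this thread; the function names and the branching are hypothetical, chosen only to mirror the symptom that built-in profiles work while a bare feed list fails.

```python
def run_recipe_buggy(builtin_title=None, feeds=None):
    """Hypothetical: is_profile is assigned on only one code path."""
    if builtin_title is not None:
        is_profile = False
        source = builtin_title
    else:
        source = feeds  # is_profile is never assigned on this path
    return (source, is_profile)


def run_recipe_fixed(builtin_title=None, feeds=None):
    """Same logic, with a default assigned before any branching."""
    is_profile = False
    source = builtin_title if builtin_title is not None else feeds
    return (source, is_profile)


print(run_recipe_buggy(builtin_title='Newsweek'))

try:
    run_recipe_buggy(feeds=['http://example.com/rss'])
except UnboundLocalError as err:
    print('UnboundLocalError:', err)

print(run_recipe_fixed(feeds=['http://example.com/rss']))
```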

kovidgoyal
03-25-2008, 06:47 PM
Will be fixed in the next release.

ddavtian
03-25-2008, 06:56 PM
Will be fixed in the next release.

I didn't expect anything else :)

Rick C
03-26-2008, 12:14 AM
I was wondering if you might be interested in doing a script for a Canadian publication? I have tried to modify existing scripts to fit but so far no good.

Even if you had a look and gave me some guidance, I'd appreciate it.

Macleans weekly Canadian newsmagazine.
http://www.macleans.ca/rss/

Globe and Mail
http://www.theglobeandmail.com/frontpage/

Toronto Star
http://www.thestar.com/generic/article/111417

Thanks

kovidgoyal
03-26-2008, 11:53 PM
I'm afraid I don't do feeds on request (I prefer to spend my time adding new features).

kovidgoyal
03-26-2008, 11:55 PM
Released version 0.4.44 with a nice new tutorial on creating recipes to fetch news at

http://libprs500.kovidgoyal.net/user_manual/news.html

If any of you have suggestions on improving the tutorial, let me know, or better yet submit a patch against http://libprs500.kovidgoyal.net/browser/trunk/src/libprs500/manual/news.rst

Deputy-Dawg
03-27-2008, 07:07 AM
Kovid,
I downloaded 4.44 to my Mac and it did not launch but rather generated the following error message:

libprs500 Error
An unexpected error has occurred during execution
of the main script

InterfaceError binding parameter 0 - probably
unsupported type

With the option to open the Console or to Terminate. Before I hit the panic switch I deleted the file and my backup copies of 4.42 and 4.43 (in hindsight not the brightest option in the world) and rebooted my machine. I then re-downloaded the file and got the same result. The console log contained the following information:

3/27/08 5:36:15 AM Suitcase Fusion[137] *** -[NSConditionLock unlock]: lock (<NSConditionLock: 0x3bf150> '(null)') unlocked from thread which did not lock it
3/27/08 5:36:15 AM Suitcase Fusion[137] *** Break on _NSLockError() to debug.
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] Traceback (most recent call last):
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "/Users/billc/Downloads/libprs500.app/Contents/Resources/__boot__.py", line 208, in <module>
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] _run('main.py')
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "/Users/billc/Downloads/libprs500.app/Contents/Resources/__boot__.py", line 135, in _run
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] execfile(path, globals(), globals())
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "/Users/billc/Downloads/libprs500.app/Contents/Resources/main.py", line 1025, in <module>
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] sys.exit(main())
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "/Users/billc/Downloads/libprs500.app/Contents/Resources/main.py", line 1014, in main
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] main = Main()
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "/Users/billc/Downloads/libprs500.app/Contents/Resources/main.py", line 169, in __init__
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] self.library_view.set_database(self.database_path)
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "libprs500/gui2/library.pyo", line 454, in set_database
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "libprs500/gui2/library.pyo", line 118, in set_database
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "libprs500/library/database.pyo", line 810, in __init__
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "libprs500/library/database.pyo", line 776, in upgrade_version8
3/27/08 5:36:37 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.
3/27/08 5:36:37 AM libprs500[191] libprs500 Error
3/27/08 5:36:37 AM libprs500[191] libprs500 Error
An unexpected error has occurred during execution of the main script

InterfaceError: Error binding parameter 0 - probably unsupported type.

3/27/08 5:36:38 AM libprs500[191] Error loading /Library/ScriptingAdditions/QXPScriptingAdditions.osax/Contents/MacOS/QXPScriptingAdditions: dlopen(/Library/ScriptingAdditions/QXPScriptingAdditions.osax/Contents/MacOS/QXPScriptingAdditions, 262): no suitable image found. Did find:
/Library/ScriptingAdditions/QXPScriptingAdditions.osax/Contents/MacOS/QXPScriptingAdditions: mach-o, but wrong architecture
3/27/08 5:36:38 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] libprs500: OpenScripting.framework - scripting addition /Library/ScriptingAdditions/QXPScriptingAdditions.osax declares no loadable handlers.
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] Traceback (most recent call last):
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "libprs500/gui2/jobs.pyo", line 279, in headerData
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] TypeError: 'NoneType' object is not callable
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] Traceback (most recent call last):
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "libprs500/gui2/jobs.pyo", line 280, in headerData
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] TypeError: 'NoneType' object is not callable
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] Traceback (most recent call last):
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "libprs500/gui2/jobs.pyo", line 281, in headerData
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] TypeError: 'NoneType' object is not callable
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] Traceback (most recent call last):
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] File "libprs500/gui2/jobs.pyo", line 282, in headerData
3/27/08 5:36:44 AM [0x0-0x1f01f].net.kovidgoyal.librs500[191] TypeError: 'NoneType' object is not callable
3/27/08 5:36:44 AM com.apple.launchd[68] ([0x0-0x1f01f].net.kovidgoyal.librs500[191]) Exited with exit code: 255

Needless to say I now have no working copy of libprs500 on my machine and no access to one. Sigh!!
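
As background on the error class in the log above: sqlite3 can bind only a handful of Python types (None, integers, floats, strings, and byte strings) unless an adapter is registered. The sketch below reproduces that class of failure with an in-memory database. The table and column names are hypothetical, and what upgrade_version8 actually passes is not shown in this thread. Newer Python versions may report the failure under a different sqlite3 exception class, so the sketch catches the common base class.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical table, for illustration only.
conn.execute('CREATE TABLE feeds (title TEXT, script TEXT)')


def try_insert(title, script):
    """Return 'ok' on success, or the name of the sqlite3 exception raised."""
    try:
        conn.execute('INSERT INTO feeds VALUES (?, ?)', (title, script))
        return 'ok'
    except sqlite3.Error as err:
        return type(err).__name__


print(try_insert('Jerusalem Post', 'class Recipe: pass'))
# A list is not a bindable type, so parameter 0 is rejected.
print(try_insert(['not', 'a', 'string'], 'class Recipe: pass'))
```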

The Old Man
03-27-2008, 07:18 AM
Kovid keeps all old copies at:
https://libprs500.kovidgoyal.net/wiki/Changelog

Deputy-Dawg
03-27-2008, 08:20 AM
Kovid keeps all old copies at:
https://libprs500.kovidgoyal.net/wiki/Changelog

All I see there is the changelog for the various versions. I cannot find any archives anywhere. Nor am I suggesting that there should be any

The Old Man
03-27-2008, 08:42 AM
All I see there is the changelog for the various versions. I cannot find any archives anywhere. Nor am I suggesting that there should be any
You're right! I was wrong. I thought those were links to the previous versions.
Sorry about that. I have 4.2 and can send it to you if you wish.

llasram
03-27-2008, 08:45 AM
All I see there is the changelog for the various versions. I cannot find any archives anywhere. Nor am I suggesting that there should be any

If you're actively tracking and working with new features anyway, you could set yourself up to run from subversion. Then if you find something broken you (a) can submit an issue so Kovid can fix it before the actual release and (b) are able to revert to some earlier revision which doesn't have the bug you found.

Rick C
03-27-2008, 12:57 PM
Released version 0.4.44 with a nice new tutorial on creating recipes to fetch news at

http://libprs500.kovidgoyal.net/user_manual/news.html

If any of you have suggestions on improving the tutorial, let me know, or better yet submit a patch against http://libprs500.kovidgoyal.net/browser/trunk/src/libprs500/manual/news.rst

Kovid,
This looks awesome-although I have not begun to read it yet:bookworm:

How bout an lrf version?:thumbsup:

Deputy-Dawg
03-27-2008, 02:02 PM
If you're actively tracking and working with new features anyway, you could set yourself up to run from subversion. Then if you find something broken you (a) can submit an issue so Kovid can fix it before the actual release and (b) are able to revert to some earlier revision which doesn't have the bug you found.

I admit it, it was stupid of me to not retain backup copies of the program. I am sure you have never done anything which in hindsight was stupid!

That being said, I do have a copy of the svn file(s) but I haven't, as yet, mastered how to compile and install it.

kovidgoyal
03-27-2008, 02:19 PM
I'm on it, update later today :)

kovidgoyal
03-27-2008, 02:48 PM
@DD: I've narrowed it down to a problem migrating custom user profiles to recipes in the database. Can you post your library1.db file online somewhere I can access it to debug, and PM me a link (if it's not too big)? If it is too big, run the following commands:


libprs500-debug


Now at the python prompt:


from libprs500.library.database import *
import os
db = LibraryDatabase('/path/to/your/library1.db')
f = open('scripts.txt', 'wb')
# Dump every custom recipe script, with a separator line after each
for title, script in db.get_feeds(): print >>f, script, '\n-----------\n'

f.close()


Then attach the file scripts.txt here.

kovidgoyal
03-27-2008, 02:50 PM
Kovid,
This looks awesome-although I have not begun to read it yet:bookworm:

How bout an lrf version?:thumbsup:

I find e-readers pretty inadequate when it comes to browsing technical documentation, so probably not :)

kovidgoyal
03-27-2008, 02:56 PM
@DD:

Actually I may have found the problem. I'll post a link to an updated dmg in a bit that will hopefully fix it.

kovidgoyal
03-27-2008, 03:08 PM
Hopefully fixed dmg is at http://theory.caltech.edu/~kovid/libprs500-0.4.44.dmg

Let me know if it works, and I'll release .45

ddavtian
03-27-2008, 04:40 PM
Kovid, were there any changes in Newsweek profile? The one that you posted here on 3/12 (before feeds2lrf was released) was perfect. I got a similar one in 0.4.42. But 43 and 44 create different output (many more sections, one article per section).

I'm not qualified to add new profiles, but love using the built-in ones.

Also, there are two WSJ profiles in 4.44.

David

Rick C
03-27-2008, 04:42 PM
Kovid,

I saw the new windows release of libprs500 0.4.44 while at work and d'lded it so I could have a go at creating a feed with your new manual, and was excited to see that a recipe for google reader had been added.

When I came home and d'lded it to my PC, I first uninstalled the old version and installed the new one. The main executable simply shuts itself down, no errors, no nothing.

I have cleaned out the registry, re-downloaded and re-installed/re-booted several times-there seems to be a problem.

kovidgoyal
03-27-2008, 04:43 PM
Yeah the new newsweek profile downloads all the articles from the current issue rather than the RSS feeds. Oops, looks like I forgot to remove the old wall street journal profile.

kovidgoyal
03-27-2008, 04:44 PM
@rickC

wait a bit for .45 (.44 has a bug that makes it crash on some systems)

Deputy-Dawg
03-27-2008, 04:48 PM
Kovid,
The fixed version seems to work for me now. And with just one typed entry I was able to create a recipe for the "Whispers in the Logia" RSS feed. It was one of those that was a bit vexing before!

Yes, I also had to do some cut and paste from Safari to get it to work.

Rick C
03-27-2008, 05:21 PM
Well I was able to get v.44 to run on my other computer-unfortunately the google feed does not seem to work in this version.

If it can be tweaked, it would sure go a long way towards eliminating the need to learn Python and still get a custom RSS feed in libprs500. It is a great idea, really.

kovidgoyal
03-27-2008, 05:25 PM
The present version of the google reader recipe only downloads starred feeds from your google reader account.

ddavtian
03-27-2008, 05:43 PM
Google reader recipe for me downloads all the feeds (even not the starred ones) but only the summary part that's visible within a reader.

kovidgoyal
03-27-2008, 05:58 PM
Hmm, I'll have to look at the code for the google reader recipe more carefully, it isn't mine.

Rick C
03-27-2008, 06:43 PM
Google reader recipe for me downloads all the feeds (even not the starred ones) but only the summary part that's visible within a reader.

OK I can get the starred items then, but not any of the others-and yes, I just get the first couple of lines like you said ddavtian.:chinscratch:

Deputy-Dawg
03-28-2008, 11:27 PM
Kovid,
I just downloaded 4.46 and I think the Gremlins are working overtime...

When I try to use my custom news sources it will only launch the one in the #1 slot and the one in the #6 slot. Or so it seems (I say that because I have not had enough time to try all permutations). And which one is launched seems to be predicated in some complex (or perhaps random) way on which was launched prior to the instant selection.

Fortunately it isn't going to get in my hair too much, inasmuch as I am now using crontab to do my news gathering. But it does make it a tad difficult to test the recipes for inclusion in the custom source list.

Sigh!!!

kovidgoyal
03-29-2008, 12:24 AM
Wow, this has not been a good week for me. I'll look into it. You can always test new recipes by saving them in a .py file and passing the path of the file to feeds2lrf.

Deputy-Dawg
03-29-2008, 01:06 AM
Yeah, I am not at all concerned. And we all have bad weeks. Mine wasn't exactly sterling either. Monday I got up at 4 in the morning to be at the local hemodialysis center and was not on the machine until 9:15, starting a 3.5 hour treatment. I'm supposed to be 'on' by 6:30, and that was one of the better days.

I am off to bed.

BTW there is no pattern. On a recent pass the only recipe that would open was the one in the #1 slot. Weird!

Deputy-Dawg
03-29-2008, 09:25 AM
Kovid,
When I got up this morning I rebooted my machine, and now the error is either much better or much stranger than before. The lead custom source in my list is a version of the Jerusalem Post that I use for benchmarking. When I click on it, it does in fact download the Jerusalem Post. If I now click on my last entry, Whispers in the Logia (a new source using the newer 'recipe' format), I am told I am downloading the 'Jerusalem Post', but in fact it downloads Whispers in the Logia. If I click on the hourglass I am shown that libprs500 v4.46 is indeed processing the Jerusalem Post. If I now click on the New York Times, things get very strange indeed. My user name and password are embedded in the profile, but I nonetheless get the following error message:

ValueError: The The New York Times recipe needs a username and password.
Failed to perform job: Fetch news from Jerusalem Post
Detailed traceback:
Traceback (most recent call last):
File "libprs500/parallel.pyo", line 139, in run_job
File "libprs500/ebooks/lrf/feeds/convert_from.pyo", line 40, in main
File "libprs500/web/feeds/main.pyo", line 126, in run_recipe
File "libprs500/web/feeds/news.pyo", line 816, in __init__
File "libprs500/web/feeds/news.pyo", line 393, in __init__
ValueError: The The New York Times recipe needs a username and password.
Log:
Fetching feeds...

Note that line 1 says that the New York Times recipe needs a username and password and that the second line says it failed to perform job: Fetch news from Jerusalem Post.

I have attached a pdf file containing 8 screen shots documenting my stepwise operations and what libprs500 is displaying. It may very well be that this is what was going on last night, but my connection was so miserably slow that I canceled what I thought were multiple downloads of files I had no interest in.

mazzeltjes
03-29-2008, 09:27 AM
Hi Kovid
Just installed 4.46
and trying to get newsweek I got this

TypeError: expected string or buffer
Failed to perform job: Fetch news from Newsweek
Detailed traceback:
Traceback (most recent call last):
File "parallel.py", line 139, in run_job
File "libprs500\ebooks\lrf\feeds\convert_from.pyo", line 40, in main
File "libprs500\web\feeds\main.pyo", line 132, in run_recipe
File "libprs500\web\feeds\news.pyo", line 464, in download
File "libprs500\web\feeds\news.pyo", line 567, in build_index
File "libprs500\web\feeds\recipes\newsweek.pyo", line 62, in parse_index
File "libprs500\web\feeds\news.pyo", line 307, in index_to_soup
File "re.pyo", line 129, in match
TypeError: expected string or buffer
Log:
Fetching feeds...
0% Fetching feeds...

:(:(:(


edit
and this one

IndexError: list index out of range
Failed to perform job: Fetch news from Outlook India
Detailed traceback:
Traceback (most recent call last):
File "parallel.py", line 139, in run_job
File "libprs500\ebooks\lrf\feeds\convert_from.pyo", line 40, in main
File "libprs500\web\feeds\main.pyo", line 132, in run_recipe
File "libprs500\web\feeds\news.pyo", line 464, in download
File "libprs500\web\feeds\news.pyo", line 580, in build_index
IndexError: list index out of range
Log:
Fetching feeds...
0% Fetching feeds...
0% Got feeds from index page
0% Trying to download cover...

kovidgoyal
03-29-2008, 09:54 AM
@mazzeltjes
Both problems are likely due to network errors. Try again in a little bit.

kovidgoyal
03-29-2008, 05:06 PM
@DD
Because of a change in the way options are processed, you can't embed the username and password in a recipe anymore.
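
The traceback above ends in a recipe constructor, and another post in this thread mentions a 'needs_subscription' attribute. A rough sketch of the kind of check that could produce this message follows. The class, its constructor signature, and the way credentials are passed in are all hypothetical; only the wording of the error message is taken from the thread.

```python
class RecipeSketch(object):
    """Hypothetical stand-in for a news recipe base class."""
    title = 'Example Paper'
    needs_subscription = False

    def __init__(self, username=None, password=None):
        # Credentials arrive from outside the recipe rather than being
        # hard-coded in the class body.
        if self.needs_subscription and (not username or not password):
            raise ValueError('The %s recipe needs a username and password.'
                             % self.title)
        self.username = username
        self.password = password


class SubscriptionRecipe(RecipeSketch):
    title = 'The New York Times'
    needs_subscription = True


try:
    SubscriptionRecipe()
except ValueError as err:
    print(err)

recipe = SubscriptionRecipe(username='user', password='secret')
print(recipe.username)
```

If the title is substituted into a fixed template like this, a title that already begins with "The" would explain the doubled "The The" in the message.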

kovidgoyal
03-29-2008, 05:49 PM
@DD
I've implemented a fix for the custom recipes, will be in the next release

Rick C
03-29-2008, 06:05 PM
@DD
I've implemented a fix for the custom recipes, will be in the next release
Will you be looking at Davec's Google Reader recipe, Kovid?

kovidgoyal
03-29-2008, 06:49 PM
Open a ticket for it, and I'll get around to it.

Rick C
03-29-2008, 07:46 PM
done:thumbsup:

Deputy-Dawg
03-29-2008, 09:08 PM
@DD
Because of a change in the way options are processed, you can't embed the username and password in a recipe anymore.

OK! But then I would think that the recipe should have prompted for the username and password, since I had taken no action to embed them. I had merely entered them the first time I ran the profile cum recipe. It really doesn't bother me that much; the question is just academic.

megacoupe
04-07-2008, 02:22 AM
It's been quite a while since I've used libprs500, and it looks like it's come a long way. It looks beautiful! Anyways, trying to fetch nytimes.com gave me the following error:

IOError: [Errno 2] No such file or directory: 'c:\\docume~1\\admin\\locals~1\\temp\\html2lrf-verbose.html'
Failed to perform job: Fetch news from The New York Times
Detailed traceback:
Traceback (most recent call last):
File "parallel.py", line 139, in run_job
File "libprs500\ebooks\lrf\feeds\convert_from.pyo", line 55, in main
File "libprs500\ebooks\lrf\html\convert_from.pyo", line 1796, in process_file
File "libprs500\ebooks\lrf\html\convert_from.pyo", line 261, in __init__
File "libprs500\ebooks\lrf\html\convert_from.pyo", line 363, in add_file
File "libprs500\ebooks\lrf\html\convert_from.pyo", line 337, in preprocess
IOError: [Errno 2] No such file or directory: 'c:\\docume~1\\admin\\locals~1\\temp\\html2lrf-verbose.html'


I copied up until the "log" part because it was way too long to post the whole thing here. Same thing seems to have happened when attempting to get CNN.com and Wired.

kovidgoyal
04-07-2008, 04:19 AM
Yeah slow and steady progress. :)

Looks like for some reason the directory
c:\docume~1\admin\locals~1\temp

doesn't exist. Create it and you should be fine.
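
One way to check this from Python rather than by eye is sketched below. It reports whether the temporary directory exists and whether a file can actually be created in it. `check_temp_dir` is a hypothetical helper, not part of libprs500, and it assumes the program uses the same temporary directory that Python's tempfile module reports.

```python
import os
import tempfile


def check_temp_dir(path=None):
    """Report whether a directory exists and accepts new files."""
    path = path or tempfile.gettempdir()
    report = {'path': path, 'exists': os.path.isdir(path), 'writable': False}
    if report['exists']:
        try:
            # Create and immediately remove a scratch file.
            fd, name = tempfile.mkstemp(prefix='html2lrf-check-',
                                        suffix='.html', dir=path)
            os.close(fd)
            os.remove(name)
            report['writable'] = True
        except OSError:
            pass
    return report


print(check_temp_dir())
```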

megacoupe
04-07-2008, 02:29 PM
Hmmmm, it seems that that folder DOES exist, and it has a "html2lrf-verbose.html" file in it as well (when I opened it, there was a single Wired article).

Now I'm really baffled...

kovidgoyal
04-07-2008, 10:56 PM
Try running it from the command line


feeds2lrf Wired.com


then


feeds2lrf --verbose Wired.com


The second is what the GUI runs and causes the html2lrf-verbose files to be created.

megacoupe
04-08-2008, 04:52 PM
Before trying out your suggestion, I rebooted my computer and tried the GUI again - and it worked!

Today, however, the GUI failed me with the same error. I don't want to have to reboot my computer every time I want to read the NYT...

Anyway, "feeds2lrf Wired.com" worked just fine and created the lrf file, but once "--verbose" was added, I got lines and lines of something being processed and then:

IOError: [Errno 2] No such file or directory: 'c:\\docume~1\\admin\\locals~1\\temp\\html2lrf-verbose.html'

Once again, I checked to make sure the directory exists and the html2lrf-verbose.html file had one Wired article in it.

Any ideas what's going on?

kovidgoyal
04-08-2008, 06:53 PM
Hmm, that's just weird. At any rate, I've made some changes that should make future versions of libprs500 ignore that error and just keep going, since the html2lrf-verbose file is useful only for debugging anyway. Unfortunately, I am travelling for three weeks (getting married) so I will not be able to release a new version for a while. I'd suggest using the command line till then.
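
The "ignore that error and just keep going" approach can be sketched as a best-effort write. This is not the actual libprs500 change, which is not shown here; `write_debug_copy` and its parameters are hypothetical, and only the file name comes from the thread.

```python
import os
import tempfile


def write_debug_copy(html, directory=None, name='html2lrf-verbose.html'):
    """Best-effort debug dump: return the path written, or None on failure."""
    directory = directory or tempfile.gettempdir()
    path = os.path.join(directory, name)
    try:
        with open(path, 'w') as f:
            f.write(html)
    except (IOError, OSError) as err:
        # The dump is only a debugging aid, so report the problem and carry on.
        print('Skipping debug dump: %s' % err)
        return None
    return path


scratch = tempfile.mkdtemp()
print(write_debug_copy('<html></html>', directory=scratch))
print(write_debug_copy('<html></html>',
                       directory=os.path.join(scratch, 'missing')))
```

The conversion itself never depends on the return value, so a failure to write the debug file no longer aborts the job.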

ddavtian
04-08-2008, 07:06 PM
Congratulations on getting married!
We'll miss your releases here but current one does a good job.

kovidgoyal
04-08-2008, 11:04 PM
Thanks :)

megacoupe
04-09-2008, 12:08 AM
Hey, thanks for taking the time to respond.

And huge Mazel Tov to you and your bride to be!

kovidgoyal
04-09-2008, 08:04 AM
My pleasure :)

StDo
04-12-2008, 05:58 PM
Unfortunately, I am travelling for three weeks (getting married) ...

May fortune be with you - and your wife.

Congratulations! :toff:

Bubble
04-21-2008, 07:13 AM
I was wondering if you might be interested in doing a script for a Canadian publication? I have tried to modify existing scripts to fit but so far no good.

Even if you had a look and gave me some guidance, I'd appreciate it.

Macleans weekly Canadian newsmagazine.
http://www.macleans.ca/rss/

Globe and Mail
http://www.theglobeandmail.com/frontpage/

Toronto Star
http://www.thestar.com/generic/article/111417

Thanks

Did you ever manage to get a working profile for any of these feeds, Rick C?

So far this is what I have as the profile for the Toronto Star


class TheStar(BasicNewsRecipe):

    title = 'TheStar'
    timefmt = ' [%a, %d %b, %Y]'
    html_description = True
    oldest_article = 7
    no_stylesheets = True

    feeds = [
        ('Top News Stories', 'http://www.thestar.com/rss/000-082-672?searchMode=Lineup'),
        ('Toronto & GTA', 'http://www.thestar.com/rss/97427?searchMode=Lineup'),
        ('Ontario', 'http://www.thestar.com/rss/0?searchMode=Query&categories=311'),
        ('Canada', 'http://www.thestar.com/rss/000-097-467?searchMode=Lineup'),
        ('World', 'http://www.thestar.com/rss/000-098-744?searchMode=Lineup'),
        ('Top Business Stories', 'http://www.thestar.com/rss/000-082-796?searchMode=Lineup'),
        ('Top Entertain. Stories', 'http://www.thestar.com/rss/117741?searchMode=Lineup'),
        ('Top Living Stories', 'http://www.thestar.com/rss/000-082-839?searchMode=Lineup'),
        ('Health', 'http://www.thestar.com/rss/000-082-844?searchMode=LineupAndQuery&categories=299'),
        ('Top Sci-Tech Stories', 'http://www.thestar.com/rss/82848?searchMode=Query&categories=300'),
        ('Ideas', 'http://www.thestar.com/rss/93199?searchMode=Lineup'),
    ]

    def print_version(self, url):
        return url.replace('http://www.thestar.com/article/', 'http://www.thestar.com/printArticle/')




I figured out how to add the profile through here (http://libprs500.kovidgoyal.net/user_manual/faq.html#i-obtained-a-recipe-for-a-news-site-as-a-py-file-from-somewhere-how-do-i-use-it)

So far, it works! I need to figure out how to edit it so the font is smaller though... And maybe remove the banner...
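
For anyone adapting this recipe, the print_version rewrite can be tried on its own outside libprs500. The sketch below copies the replace call into a standalone function. The first URL is a made-up article address in the expected shape; the second is the Toronto Star link posted earlier in the thread.

```python
def print_version(url):
    """Standalone copy of the URL rewrite used in the recipe above."""
    return url.replace('http://www.thestar.com/article/',
                       'http://www.thestar.com/printArticle/')


# Hypothetical article URL in the shape the recipe expects.
article = 'http://www.thestar.com/article/111417'
# A URL with a different prefix does not match the pattern.
generic = 'http://www.thestar.com/generic/article/111417'

print(print_version(article))
print(print_version(generic))
```

Because str.replace leaves a string untouched when the pattern is absent, links that do not start with the article prefix pass through unchanged and are fetched in their regular, non-print form.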

Bubble
04-22-2008, 03:37 AM
Can someone help us Canadians out by looking at the RSS feed of Maclean's which is here (http://www.macleans.ca/rss/) and suggest where to start making a profile for it?

The Print option is JavaScript that calls up a custom print.css. So should we start by removing all the classes? Or try to force that print.css stylesheet as the default?

I might not be making much sense here..

PS: I'm not sure if Macleans can be done... I found out that even their RSS feed contains faulty links.

Rick C
04-22-2008, 08:22 AM
Hey Bubble, nice work; looks like you have a winner here with the Star.:2thumbsup
I've been too busy to look at libprs for a couple of weeks, and have not even looked at the new user manual you linked to. Actually I am really glad to see your post this morning and will have a closer look at how you made it work later on. I have tested what you did and it looks to work great.

I was really disappointed that the Google Reader recipe from Davec seems to be effectively useless, because the concept was such a good one. If this program could be made to fetch from one of the RSS aggregators available on the net, it might be a lot easier to use for those of us who don't have the time or inclination to learn Python programming.

Bubble
04-25-2008, 05:42 AM
Hmm...

Well...

I looked at the Globe and Mail, and it has the same problem as Macleans. They don't have an HTML link for print output; the Print option just pops up the Windows print dialog.

If anyone wants to give us a clue on how to integrate these types of sites into libprs500, please reply.

Rick C
04-25-2008, 06:04 AM
Deputy Dawg had a quick look at these a while back and said as much: that Macleans and the Globe would not be very conducive to our purposes.
So we've got the Star (you might want to post it to Kovid's site; he might include it in an iteration of libprs500 for future Canuck users). What other distinctly Canadian feeds can you try, Bubble?

Bubble
04-25-2008, 06:20 AM
I can't think of any at the moment Rick. Those 3 are basically the foundation when it comes to Canadian news.

PS - T'is pRogz ROX TkissHX kovidgoyal! Grtz on WEDING!

kovidgoyal
04-25-2008, 12:11 PM
Doesn't the Globe and Mail have a print edition at http://www.theglobeandmail.com/frontpage/ ?

Bubble
04-26-2008, 12:30 AM
Doesn't the globe and mail have a print edition at http://www.theglobeandmail.com/frontpage/

I'm assuming you meant Print Edition as in online format that is downloadable? As far as I know, no.

kovidgoyal
04-26-2008, 03:59 AM
I meant an online edition, typically an index that links to simplified HTML pages that are suitable for printing and therefore suitable for conversion to an ebook.

Ben_B
04-26-2008, 10:54 AM
Here is the profile I use for the Globe and Mail. It worked splendidly until version 0.4.49. I tried to revert back to version 0.4.42, but am having problems with a 'unicode' object has no attribute 'needs_subscription' traceback error.

P.S.-- looks like this was a known bug in 0.4.42. Does anyone know where I can find a pre 0.4.49 (windows) version of libprs500?

kovidgoyal
04-27-2008, 07:12 PM
Migrating a profile to work with 49 should be simple, see http://libprs500.kovidgoyal.net/user_manual/news.html#migrating-old-style-profiles-to-recipese

If you can't do it, I'll take a look.

Ben_B
04-28-2008, 01:37 AM
Thanks... I wasn't aware that this changed. This may take me awhile as I learn how to write "recipes". Tried making some quick changes using the new recipe format (BasicNewsRecipe), but I must be doing something wrong as I consistently receive the following error...

IndexError: list index out of range
Failed to perform job: Fetch news from The Globe and Mail
Detailed traceback:
Traceback (most recent call last):
File "parallel.py", line 139, in run_job
File "libprs500\ebooks\lrf\feeds\convert_from.pyo", line 40, in main
File "libprs500\web\feeds\main.pyo", line 134, in run_recipe
File "libprs500\web\feeds\news.pyo", line 466, in download
File "libprs500\web\feeds\news.pyo", line 603, in build_index
File "d:\temp\libprs500_0.4.49_r_7fws_recipes\recipe0.py", line 39, in print_version
IndexError: list index out of range
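
The recipe that raised this error is not shown, so its cause can only be guessed at. One common way for a print_version method to hit "list index out of range" is to split the article URL and index into the pieces, which fails for any feed link with fewer path segments than expected. The sketch below illustrates that pattern and a defensive alternative. The URL layout and the print template are hypothetical and use example.com rather than the real Globe and Mail addresses.

```python
# Hypothetical print-page template, for illustration only.
PRINT_TEMPLATE = 'http://example.com/print/%s'


def print_version_fragile(url):
    """Assumes the story id is always the sixth '/'-separated piece."""
    story_id = url.rstrip('/').split('/')[5]
    return PRINT_TEMPLATE % story_id


def print_version_safe(url):
    """Falls back to the original URL when the expected piece is missing."""
    parts = url.rstrip('/').split('/')
    if len(parts) <= 5:
        return url
    return PRINT_TEMPLATE % parts[5]


long_url = 'http://example.com/news/national/story123'
short_url = 'http://example.com/short'

print(print_version_fragile(long_url))
try:
    print_version_fragile(short_url)
except IndexError as err:
    print('IndexError:', err)
print(print_version_safe(short_url))
```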

Bubble
05-03-2008, 11:01 PM
Hope you guys updated to the newest version! Globe n Mail is now supported in calibre. I have not looked at it in detail yet, however, due to other priorities.

Thanks kovidgoyal.

moneytoo
05-08-2008, 03:46 PM
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 113: ordinal not in range(128)
Failed to perform job: Fetch news from Reuters
Detailed traceback:
Traceback (most recent call last):
File "parallel.py", line 139, in run_job
File "calibre\ebooks\lrf\feeds\convert_from.pyo", line 40, in main
File "calibre\web\feeds\main.pyo", line 128, in run_recipe
File "calibre\web\feeds\news.pyo", line 810, in __init__
File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 174, in __init__
File "calibre\ebooks\lrf\web\profiles\__init__.pyo", line 225, in build_index
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 113: ordinal not in range(128)
Log:
Fetching feeds...

I cannot convert single news feed using calibre GUI nor web2lrf. Every time I get this UnicodeDecodeError no matter what site it parses.

kovidgoyal
05-08-2008, 05:27 PM
Try the next release, it has a possible fix for this. It should be out in a couple of days.

Rick C
05-09-2008, 12:48 AM
I have been using v4.51 for a couple of days and the Globe feed is working well for me, although it only retrieves the first page of any given story.

kovidgoyal
05-09-2008, 01:17 PM
That's probably because it needs a subscription, which I don't have. I actually wrote that recipe as a guide for Bubble, in the hopes he'd improve it and share the result.

Bubble
05-10-2008, 12:48 AM
I noticed that too, Rick C, when I finally got around to testing it.

The link that I had for the Globe and Mail profile is broken (from a private message). The online help file (http://calibre.kovidgoyal.net/user_manual/news.html) for web2lrf also points to a broken link (http://calibre.kovidgoyal.net/browser/trunk/src/calibre/web/feeds/recipes) when attempting to browse the default profiles. When you have the time, could you please take a look at it, kovidgoyal?

I still have a faint image of the profile from when I first saw it. To be honest, the code is way above my understanding at this point in time. As such, I doubt I can tweak it to perfection... But maybe Ben_B can?

kovidgoyal
05-10-2008, 01:18 AM
Fixed the links.

Ben_B
05-22-2008, 01:31 AM
As for the links to the full stories from the Globe and Mail, I was using the following function to retrieve the full stories from the Globe Investor web site in the profile I posted earlier. The Globe Investor produces a very nice printed version without any extra HTML. I was using the function to create printed versions of the news stories from the Globe and Mail RSS feeds (e.g., http://www.theglobeandmail.com/generated/rss/BN/Front.xml).

def print_version(self, url):
    return 'http://www.globeinvestor.com/servlet/ArticleNews/print/' + (url.split('/story/',1)[1]).split('.',1)[0] + '/' + url.rsplit('.',3)[2] + '/' + url.rsplit('.',3)[3]

The problem I ran into is that most of the links to the full stories are contained within the tag <feedburner:origLink>. With the old libprs500, I was using url_search_order = ['feedburner:origlink']. This seemed to work; however, this variable no longer seems to exist in Calibre's BasicNewsRecipe. I can't figure out how to make Calibre follow the links contained within the <feedburner:origLink> tags. I'm guessing I will need to process this somehow through another function?
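
The print_version one-liner above assumes that every feed URL contains '/story/' and at least three dots. A URL without them raises the same "IndexError: list index out of range" reported earlier in the thread. Below is a minimal sketch of a guarded variant, written as a standalone function rather than a recipe method. The sample URL in the usage line is hypothetical, and falling back to the unmodified URL is an assumption, not something the recipe is known to do.

```python
PRINT_BASE = 'http://www.globeinvestor.com/servlet/ArticleNews/print/'


def print_version(url):
    """Map a story URL to its print URL, or return it unchanged
    when the URL does not have the expected shape."""
    if '/story/' not in url:
        return url  # assumption: keep the original link rather than fail
    parts = url.rsplit('.', 3)
    if len(parts) < 4:
        return url  # fewer than three dots: the indexes below would fail
    story_prefix = url.split('/story/', 1)[1].split('.', 1)[0]
    return PRINT_BASE + story_prefix + '/' + parts[2] + '/' + parts[3]


# Hypothetical story URL, used only to show the transformation.
sample = ('http://www.theglobeandmail.com/servlet/story/'
          'RTGAM.20080522.wsample/BNStory/National/home')
print(print_version(sample))
```

Inside a recipe the same checks would go into the method that takes self and url.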

kovidgoyal
05-22-2008, 11:44 AM
Yeah


def get_article_url(self, article):
    return article.get('feedburner_origlink', None)
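
To see what that lookup does, here is a small sketch that uses a plain dict as a stand-in for the parsed feed entry. Real entries come from the feed parser, which is assumed here to expose the <feedburner:origLink> tag under the key 'feedburner_origlink'. Falling back to the entry's regular 'link' when the tag is absent is an added assumption, and both sample entries are made up.

```python
def get_article_url(article):
    """Prefer the FeedBurner original link; fall back to the regular link."""
    return article.get('feedburner_origlink') or article.get('link')


# Made-up entries standing in for parsed feed items.
with_tag = {
    'link': 'http://feeds.example.com/~r/item/1',
    'feedburner_origlink': 'http://www.example.com/story/1',
}
without_tag = {'link': 'http://www.example.com/story/2'}

print(get_article_url(with_tag))
print(get_article_url(without_tag))
```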

Ben_B
05-23-2008, 02:41 PM
Here is my personal profile for the Globe and Mail I use for my PRS-505. I'm not a coder so there is probably plenty of room for improvement. The only problem I have is that I cannot change the text size while viewing it on the Reader. When opening the e-book file, the Reader defaults to S sized text. Attempting to change the size to M or L causes my Reader to crash and restart. My firmware is ver. 1.0.00.08130.

import re

from calibre.web.feeds.news import BasicNewsRecipe

class GlobeMail(BasicNewsRecipe):

    title = 'The Globe and Mail'
    html_description = False
    use_pubdate = True
    oldest_article = 7
    use_embedded_content = False
    max_articles_per_feed = 10
    simultaneous_downloads = 1
    no_stylesheets = True
    summary_length = 300
    html2lrf_options = ['--base-font-size', '9']

    preprocess_regexps = [

        (re.compile(r'<script.*?</script>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
        (re.compile(r'<style.*?</style>', re.IGNORECASE | re.DOTALL), lambda match : '<style> </style>'),
        (re.compile(r'<body class="subscribe.*?<div id="articleAbstract">', re.IGNORECASE | re.DOTALL), lambda match : '<body><div>'),
        (re.compile(r'<ul class="columnistInfo">.*?</ul>', re.IGNORECASE | re.DOTALL), lambda match : ''),
        (re.compile(r'<p class="note".*?</body>', re.IGNORECASE | re.DOTALL), lambda match : '<br><br>Subscription required to read full story</body>'),
        (re.compile(r'<p class="deck"></p>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
        (re.compile(r'<p class="byline"></p>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
        (re.compile(r'<p class="date"></p>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
        (re.compile(r'<p><a href="http://www.globeinvestor.com/">.*?<h2', re.IGNORECASE | re.DOTALL), lambda match : '<h2'),
        (re.compile(r'<h1 class="keyline">.*?</h1>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
        (re.compile(r'<p class="date">.*?<(\S+)>', re.IGNORECASE | re.DOTALL), lambda match : match.group().replace(match.group(1), '/p><br') ),
        (re.compile(r'<a href.*? target="offsite">', re.IGNORECASE | re.DOTALL), lambda match : '<a name="#">'),
        (re.compile(r'<tr>', re.IGNORECASE | re.DOTALL), lambda match : '<br>'),
        (re.compile(r'<td>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
        (re.compile(r'</tr>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
        (re.compile(r'</td>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
        (re.compile(r'<hr>', re.IGNORECASE | re.DOTALL), lambda match : ' '),
        (re.compile(r'<!-- /frag.../copyright begins -->', re.IGNORECASE | re.DOTALL), lambda match : '<br><!-- /frag.../copyright begins --><br>'),
    ]

    def get_article_url(self, article):
        return article.get('feedburner_origlink', article.link)

    def print_version(self, url):
        return 'http://www.globeinvestor.com/servlet/ArticleNews/print/' + (url.split('/story/',1)[1]).split('.',1)[0] + '/' + url.rsplit('.',3)[2] + '/' + url.rsplit('.',3)[3]

    def get_feeds(self):
        return [
            (' A. Front Page', 'http://www.theglobeandmail.com/generated/rss/BN/Front.xml'),
            (' B. British Columbia', 'http://www.theglobeandmail.com/generated/rss/BN/HYBritishColumbia.xml'),
            (' C. National', 'http://www.theglobeandmail.com/generated/rss/BN/National.xml'),
            (' D. World', 'http://www.theglobeandmail.com/generated/rss/BN/International.xml'),
            (' E. Americas', 'http://www.theglobeandmail.com/generated/rss/BN/HYAmerica.xml'),
            (' F. Report on Business', 'http://www.theglobeandmail.com/generated/rss/BN/Business.xml'),
            (' G. Energy News', 'http://www.theglobeandmail.com/generated/rss/BN/energy.xml'),
            (' H. Your Money', 'http://www.theglobeandmail.com/generated/rss/BN/SpecialEvents2.xml'),
            (' I. Sports', 'http://www.theglobeandmail.com/generated/rss/BN/Sports.xml'),
            (' J. The Arts', 'http://www.theglobeandmail.com/generated/rss/BN/Entertainment.xml'),
            (' K. Movies', 'http://www.theglobeandmail.com/generated/rss/BN/HYMovies.xml'),
            (' L. Music', 'http://www.theglobeandmail.com/generated/rss/BN/HYMusic.xml'),
            (' M. Technology', 'http://www.theglobeandmail.com/generated/rss/BN/Technology.xml'),
            (' N. Science', 'http://www.theglobeandmail.com/generated/rss/BN/Science.xml'),
            (' O. Life', 'http://www.theglobeandmail.com/generated/rss/BN/lifeMain.xml'),
            (' P. Food & Wine', 'http://www.theglobeandmail.com/generated/rss/BN/lifeFoodWine.xml'),
            (' Q. Travel', 'http://www.theglobeandmail.com/generated/rss/BN/specialTravel.xml'),
            (' R. Health', 'http://www.theglobeandmail.com/generated/rss/BN/specialScienceandHealth.xml'),
        ]
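
For anyone adapting the preprocess_regexps list above: each entry pairs a compiled pattern with a function that receives the match and returns the replacement text. A rough sketch of how such a list can be tried out on a snippet of HTML outside calibre follows. The helper name apply_rules and the sample HTML are made up for illustration, and applying the rules one after another in list order is an assumption about how the recipe machinery uses them.

```python
import re


def apply_rules(html, rules):
    """Apply each (pattern, replacement function) pair in order."""
    for pattern, replace in rules:
        html = pattern.sub(replace, html)
    return html


# Two rules copied from the recipe above.
rules = [
    (re.compile(r'<script.*?</script>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
    (re.compile(r'<hr>', re.IGNORECASE | re.DOTALL), lambda match: ' '),
]

# Made-up page fragment used to exercise the rules.
sample = ('<p>Story</p><SCRIPT type="text/javascript">\n'
          'var x = 1;\n</SCRIPT><hr><p>More</p>')
cleaned = apply_rules(sample, rules)
print(cleaned)
```

Testing patterns this way on a saved copy of an article page is quicker than re-running a full download after every change.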

kovidgoyal
05-23-2008, 02:50 PM
Yeah, the font size thing is a bug in Sony's firmware, which hopefully they will fix. Are the articles the full-length ones? Or do you need a subscription for that?

Ben_B
05-23-2008, 03:19 PM
I'd say at least 90% of the articles are full-length. Most of the subscription articles are movie or restaurant reviews. I did a quick review of the articles I downloaded this morning...

A Front Page = 9/9 are full length
B British Columbia = 8/10 full length
C National = 10/10 full length
D World = 10/10 full length
E Americas = 10/10 full length

I didn't go through the rest, but I do recall seeing a couple more subscription articles under Movies.

moneytoo
05-30-2008, 08:18 AM
I waited a few weeks and downloaded the latest version of calibre today. I just tried fetching a few feeds, but most of them just don't work...

Associated Press UnicodeDecodeError
The Atlantic OK
The BBC OK
Business Week URLError
CNN UnicodeDecodeError
Christian Science Monitor UnicodeDecodeError
Die Zeit Nachrichten UnicodeDecodeError
The Economist OK
FAZ NET UnicodeDecodeError
Globe and Mail OK
Jerusalem Post UnicodeDecodeError
Jutarnji UnicodeDecodeError
NASA UnicodeDecodeError
New York Review of Books UnicodeDecodeError
The New Yorker UnicodeDecodeError
Newsweek OK
Outlook India OK
Portfolio OK
Reuters UnicodeDecodeError
Spiegel Online UnicodeDecodeError
Sydney Morning Herald OK
USA Today OK
United Press International UnicodeDecodeError
Washington Post UnicodeDecodeError
Wired.com OK

Unfortunately I still have difficulties converting sites using web2lrf... :blink:

c:\Program Files\calibre>web2lrf -u http://www.mobilmania.mobi -r 1 default
Downloading
. . .Could not fetch stylesheet http://klub.zive.cz/passport/ /Client.StyleSheet
s/common.css
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . .

http://www.mobilmania.mobi saved to c:\docume~1\marcel~1\locals~1\temp\calibre_w
seyry_web2lrf\index.html
Traceback (most recent call last):
File "convert_from.py", line 182, in <module>
File "convert_from.py", line 176, in main
File "convert_from.py", line 146, in process_profile
File "ntpath.pyo", line 102, in join
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 19: ordinal
not in range(128)

kovidgoyal
05-30-2008, 11:34 AM
I assume you're using a localized (non-English) version of Windows?

moneytoo
05-30-2008, 03:08 PM
Yes, I'm using Windows XP Professional SP2 (Czech language). Same experience on several machines.
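
One plausible reading of these tracebacks (an assumption, not a confirmed diagnosis of the calibre bug) is that a byte string holding a localized Windows path, such as a temp directory with accented characters, gets decoded with the default 'ascii' codec. The sketch below reproduces that kind of failure and shows decoding with an explicit code page instead. The path is made up, and cp1250 (the usual Central European Windows code page) is an illustrative choice.

```python
# Made-up temp path as raw bytes; 0xe1 is the letter a-acute in cp1250.
RAW_PATH = b'c:\\temp\\z\xe1loha\\index.html'


def decode_path(raw, encoding='cp1250'):
    """Decode a byte path, trying ASCII first and then the given code page."""
    try:
        return raw.decode('ascii')
    except UnicodeDecodeError:
        # This is the error class shown in the tracebacks above.
        return raw.decode(encoding)


text = decode_path(RAW_PATH)
print(repr(text))
```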

kovidgoyal
05-30-2008, 03:23 PM
Ah, OK. I'll add some possible fixes in 0.4.65.

moneytoo
05-31-2008, 11:59 AM
Thanks, kovidgoyal. Web2lrf now works with the sample site. Great.

Fetching news using the calibre GUI, I still get a few errors (but most of the sites work now).

Washington Post, Reuters...:
'NoneType' object is unsubscriptable
Detailed traceback:
Traceback (most recent call last):
File "main.py", line 716, in news_fetched
File "main.py", line 442, in _add_books
File "calibre\ebooks\metadata\meta.pyo", line 81, in get_metadata
File "calibre\ebooks\metadata\__init__.pyo", line 96, in smart_update
TypeError: 'NoneType' object is unsubscriptable

kovidgoyal
05-31-2008, 12:50 PM
0.4.66 should have a fix for that error.

moneytoo
05-31-2008, 02:36 PM
:thanks: It works. Now it's time for me to create my user profiles.

I was following this example: http://calibre.kovidgoyal.net/wiki/UserProfiles but it took me some time to realize that I have to use
"from calibre.ebooks.lrf.web.profiles import DefaultProfile" instead of "from libprs500.ebooks.lrf.web.profiles import DefaultProfile".

kovidgoyal
05-31-2008, 02:45 PM
Use this http://calibre.kovidgoyal.net/user_manual/news.html

to learn how to create recipes for feeds

jotheman
06-02-2008, 12:25 PM
Kovid,

Is it possible to use calibre / web2lrf for converting single websites? I mean just websites, not newsfeeds. When I save a website to disk, the pictures aren't saved. Calibre does a great job at fetching RSS feeds, so just fetching one website should be easy, right? How could I do this?

Thanks,


jo.

kovidgoyal
06-02-2008, 02:29 PM
Search for the BookIt Firefox extension.

jotheman
06-02-2008, 04:52 PM
Thanks for your always quick and helpful answers, Kovid.

I already installed the BookIt extension. I like the idea of being able to just "browse & click" for new stuff that I then can comfortably read on an eReader.

I even started considering a Sony Reader, mostly because of your work on calibre and the broad support for this device that seems to be out there - but one big disadvantage held me back: support for Chinese fonts. It is possible, but very cumbersome.

So I will stick to my decision to buy a Hanlin for once. Still, it seems like I need to figure out how to work with LRF files, because that seems to be the most available format, at least Mac-software-wise. I think BookIt has an "add to calibre" feature so that I could work from there. Calibre would have to be the pivotal point for conversion.

Hoping for the best,

Cheers


jo.

MiketheMan
07-09-2008, 12:11 PM
Hi, Kovid! Many thanks for your work!
Can you kindly help me with an issue with web2lrf?

When I run

web2lrf.exe default --url="http://spinet.ru/conference/printview.php?t=2154&start=0" -o 1.lrf --match-regexp kafadzy.com

the program prints "could not fetch link http://kafadzy.com/...", although I can open this link in a browser.

Thanks in advance,
Mikhail

kovidgoyal
07-09-2008, 12:16 PM
Add the --verbose option and run it

MiketheMan
07-09-2008, 03:16 PM
I can't make it output to a file with the --verbose option; it prints something like this:

[DEBUG] __init__.pyo:489: fetching http://kafadzy.com/spine/images/skrew_wrong.jpg
[warning] __init__.pyo:489: could not fetch link http://kafadzy.com/spine/images/skrew_wrong.jpg
[DEBUG] __init__.pyo:489:error: decode() argument 1 must be string, not None
Traceback (most recent call last):
File "...simple.py", line 335, in process_link
File "..__init__.pyo", line 62, in xml_to_unicode
Typeerror:decode() argument 1 must be string, not None

Hope it'll help

Mikhail

kovidgoyal
07-09-2008, 04:54 PM
Will be fixed in the next release.

tovbrog
07-15-2008, 07:32 AM
Kovidgoyal,

I want to add my thanks to those of all the other Sony users for your wonderful work.

Calibre has been a real godsend and I hardly use the Sony SW at all.

Keep up the great work.

capture
07-22-2008, 02:58 AM
I still have the UnicodeDecodeError problem. I use Windows XP Pro (Chinese version) SP2, and I just updated calibre to v0.4.79.

....
Traceback (most recent call last):
File "main.py", line 151, in <module>
File "main.py", line 146, in main
File "main.py", line 134, in run_recipe
File "calibre\web\feeds\news.pyo", line 473, in download
File "calibre\web\feeds\news.pyo", line 646, in build_index
File "calibre\web\feeds\news.pyo", line 538, in feed2index
File "calibre\utils\genshi\core.pyo", line 179, in render
File "calibre\utils\genshi\output.pyo", line 60, in encode
File "calibre\utils\genshi\output.pyo", line 210, in __call__
File "calibre\utils\genshi\output.pyo", line 753, in __call__
File "calibre\utils\genshi\output.pyo", line 592, in __call__
File "calibre\utils\genshi\output.pyo", line 710, in __call__
File "calibre\utils\genshi\core.pyo", line 494, in escape
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 2: ordinal
not in range(128)

kovidgoyal
07-22-2008, 05:58 PM
Yeah, I haven't fixed that bug.

seajewel
07-23-2008, 02:08 PM
(Somewhat cross-posted with the Format Conversion forum, but for good reason, I think.) Is this the tool I should use to convert a webpage saved as .html to .lrf? There was a fanfic that I wanted to read on my 505, so I right-clicked a link to the page, saved it as .html, and then used calibre to convert it to LRF. Unfortunately, the resulting file had improper margins (really large, not the way I've set calibre to convert) and, even worse, it has page turns of 2-3 seconds as opposed to more like 0.5. I usually have to hit page turn an entire paragraph before I'm done to get the timing right, and it usually isn't quite right, which is frustrating.

But this tool seems to be for news sites. Will it work for just one big long .html webpage, and if so, do I use calibre or some kind of cmd interface? Any tips would be appreciated. Or is the HTML file itself somehow problematic, such that I can't fix the long page turns without just copying and pasting it to .rtf or something?

kovidgoyal
07-23-2008, 02:19 PM
Use the BookIt Firefox extension to convert webpages. It calls web2lrf, which in turn calls html2lrf.

seajewel
07-23-2008, 02:27 PM
:thanks:Thank you, Kovid. Itching to test it out now, too bad I'll be at work for the next 12 hours or so today.. (projects, projects..)

use the bookit firefox extension to convert webpages (it calls web2lrf) web2lrf in turn calls html2lrf

banjopicker
08-08-2008, 01:41 AM
I use Calibre GUI for Windows regularly to download feeds. It is wonderful software. Yesterday I decided to set up some batch files using the command line interface so that I could schedule downloads to be ready every morning.

The line in the batchfile looks like this:
"C:\Program Files\calibre\web2lrf" nytimes --username user --password pass

but when I opened the .lrf up in my 505, the file had the correct table of contents, but none of the navigation links at the top and bottom of each article. Is there some flag I need to set to have the navigation links added to the file? I couldn't find a mention of it in the web2lrf docs.

Thanks again Kovid, this software has added a lot of value to my Reader.

kovidgoyal
08-08-2008, 01:44 AM
The command is feeds2lrf, not web2lrf.

banjopicker
08-08-2008, 04:39 AM
The command is feeds2lrf not web2lrf

:smack: That would be the problem then...

Thanks

jessie102
08-10-2008, 04:04 AM
First of all, coming from the PDA (WM6) world, I must admit I'm a complete newbie in the e-book reader field. I found myself almost completely lost after purchasing my PRS-505 today.

After updating to the latest firmware, I downloaded geekraver's Web2Book; the BBC news site was all I was able to get working. So I proceeded to Calibre.

I was very impressed with the commercial-looking layout, and everything looked straightforward. However, even with the present recipes, for example Wired, I get failed downloads as pasted below:

DEBUG: Skipping article Alt Text: Grading Batman's Gear (Wed, 16 Jul, 2008 00:00) from feed Commentary as it is too old.

DEBUG: Skipping article Games Without Frontiers: Go Ahead, Punk, Make Your Game (Sun, 13 Jul, 2008 21:00) from feed Commentary as it is too old.

ERROR: Failed to download article: Russia's 'Full Scale Invasion' of Georgia from http://feeds.wired.com/~r/wired/index/~3/360690702/georgia-latest.html


DEBUG:

DEBUG: Traceback (most recent call last):
File "calibre\utils\threadpool.pyo", line 96, in run
File "calibre\web\feeds\news.pyo", line 572, in fetch_embedded_article
File "calibre\utils\genshi\core.pyo", line 179, in render
File "calibre\utils\genshi\output.pyo", line 60, in encode
File "calibre\utils\genshi\output.pyo", line 425, in __call__
File "calibre\utils\genshi\output.pyo", line 653, in __call__
IndexError: pop from empty list


DEBUG:


ERROR: Failed to download article: DIY Filmmaker Wins Big With Midnight Kiss from http://feeds.wired.com/~r/wired/index/~3/360695518/diy-filmmaker-w.html


DEBUG:

DEBUG: Traceback (most recent call last):
File "calibre\utils\threadpool.pyo", line 96, in run
File "calibre\web\feeds\news.pyo", line 572, in fetch_embedded_article
File "calibre\utils\genshi\core.pyo", line 179, in render
File "calibre\utils\genshi\output.pyo", line 60, in encode
File "calibre\utils\genshi\output.pyo", line 425, in __call__
File "calibre\utils\genshi\output.pyo", line 653, in __call__
IndexError: pop from empty list


DEBUG:


ERROR: Failed to download article: UAVs Search for Scientific Silver Lining in Beijing Pollution Clouds from http://feeds.wired.com/~r/wired/index/~3/360541969/uavs-search-for.html


DEBUG:

DEBUG: Traceback (most recent call last):
File "calibre\utils\threadpool.pyo", line 96, in run
File "calibre\web\feeds\news.pyo", line 572, in fetch_embedded_article
File "calibre\utils\genshi\core.pyo", line 179, in render
File "calibre\utils\genshi\output.pyo", line 60, in encode
File "calibre\utils\genshi\output.pyo", line 425, in __call__
File "calibre\utils\genshi\output.pyo", line 653, in __call__
IndexError: pop from empty list



Any idea what could be causing this? Additionally, Slashdot RSS feeds turn out to be 99% comments when using http://rss.slashdot.org/Slashdot/slashdot as the feed. Any help would be appreciated. :thanks::thanks:

stilliremain
09-10-2008, 07:41 AM
So what does the New York Times actually look like on the reader? How's it presented?