Old 12-02-2007, 02:58 PM   #91
StDo
Translating Calibre...
Posts: 657
Karma: 2902
Join Date: Aug 2007
Location: ER.de
Device: [PRS-500], PB360
That's it. Thanks.

By the way, how can I prevent articles without a publication date from being skipped?

Quote:
[DEBUG] __init__.pyo:172: Skipping article as it does not have publication date
[DEBUG] __init__.pyo:172: Skipping article as it does not have publication date
Old 12-02-2007, 03:30 PM   #92
kovidgoyal
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I'm not sure what you mean? You want to include articles that don't have a publication date? In that case, the only way to do it is to redefine the parse_feeds function in your profile.
Old 12-02-2007, 03:50 PM   #93
StDo
Translating Calibre...
Posts: 657
Karma: 2902
Join Date: Aug 2007
Location: ER.de
Device: [PRS-500], PB360
Kovid, I tried to get spiegelde.py running.

spiegelde.py:
Code:
from libprs500.ebooks.lrf.web.profiles import DefaultProfile

import re

class SpiegelOnline(DefaultProfile): 
    
    title = 'Spiegel Online' 
    timefmt = ' [ %Y-%m-%d %a]'
    max_recursions = 1
    max_articles_per_feed = 40
    html_description = True
    no_stylesheets = True

    
    def get_feeds(self): 
        return [ ('Spiegel Online', 'http://www.spiegel.de/schlagzeilen/rss/0,5291,,00.xml') ] 
    
    def print_version(self,url):
        tokens = url.split(',') 
        tokens[-2:-1] = ['-druck']
        return ','.join(tokens)

But the spiegel.de RSS feed gives the time only in the form "Heute um 20:00 Uhr" (which means "Today at 8 p.m.").

See: http://www.spiegel.de/schlagzeilen/rss/0,5291,,00.xml
Old 12-02-2007, 03:56 PM   #94
kovidgoyal
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Then you will have to redefine the function strptime. The function takes a string argument and should return the number of seconds since the epoch (Jan 1 1970) in the GMT time zone.

Something like:

Code:
import time  # at the top of the profile file

def strptime(self, src):
    # Some code to convert the string src into a datetime
    # This is a dummy implementation that just returns the current time
    return time.time()

Last edited by kovidgoyal; 12-02-2007 at 04:00 PM.
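
For readers who want more than the dummy version, here is a minimal sketch of a real conversion. It assumes the feed's date strings always look like "Heute um 20:00 Uhr" and refer to the local time zone. The helper name parse_heute and its now parameter are made up for illustration and are not part of libprs500; a profile's strptime could simply call it with src.

```python
import re
import time
from datetime import datetime

# Matches strings such as "Heute um 20:00 Uhr" ("Today at 8 p.m.").
_HEUTE = re.compile(r'Heute um (\d{1,2}):(\d{2}) Uhr')


def parse_heute(src, now=None):
    """Return seconds since the epoch for a 'Heute um HH:MM Uhr' string.

    Falls back to the current time when the string does not match or the
    hour/minute values are out of range.
    """
    now = now or datetime.now()
    match = _HEUTE.search(src)
    if match is None:
        return time.time()
    try:
        # Keep today's date, swap in the time of day from the feed.
        stamp = now.replace(hour=int(match.group(1)),
                            minute=int(match.group(2)),
                            second=0, microsecond=0)
    except ValueError:
        return time.time()
    # mktime interprets the naive datetime as local time.
    return time.mktime(stamp.timetuple())
```

The fallback to the current time mirrors the dummy implementation above, so an unexpected string never causes the article to be dropped.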
Old 12-02-2007, 04:51 PM   #95
StDo
Translating Calibre...
Posts: 657
Karma: 2902
Join Date: Aug 2007
Location: ER.de
Device: [PRS-500], PB360
That seems to be hard work; I will try to configure it in a few days...

Can't I just tell web2lrf to take all the articles shown? There seem to be only around 40-50 articles at spiegel.de.
Old 12-02-2007, 04:55 PM   #96
kovidgoyal
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Just define the dummy strptime function as shown above and that will do this.
Old 12-02-2007, 05:14 PM   #97
StDo
Translating Calibre...
Posts: 657
Karma: 2902
Join Date: Aug 2007
Location: ER.de
Device: [PRS-500], PB360
Sorry, getting the same error...

Code:
'''
Fetch Spiegel Online.
'''

from libprs500.ebooks.lrf.web.profiles import DefaultProfile

import re
import time  # needed by strptime() below

class SpiegelOnline(DefaultProfile): 
    
    title = 'Spiegel Online' 
    timefmt = ' [ %Y-%m-%d %a]'
    max_recursions = 2
    max_articles_per_feed = 40
#    html_description = True
#    no_stylesheets = True

    
    def get_feeds(self): 
        return [ ('Spiegel Online', 'http://www.spiegel.de/schlagzeilen/rss/0,5291,,00.xml') ] 

    def strptime(self, src):
        # Some code to convert the string src into a datetime
        # This is a dummy implementation that just returns the current time
        return time.time()
    
    def print_version(self,url):
        tokens = url.split(',') 
        tokens[-2:-1] = ['-druck']
        return ','.join(tokens)
Old 12-02-2007, 05:28 PM   #98
kovidgoyal
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Ah, I see that the feed has no publication date. OK. I've added a use_pubdate variable (in svn). Set it to False to prevent web2lrf from trying to figure out the publication date.

Code:
use_pubdate = False
Old 12-03-2007, 05:33 AM   #99
JTravers
Groupie
Posts: 182
Karma: 1078201
Join Date: Sep 2007
Device: iPad Air 2
Wall Street Journal

I have a profile set up for WSJ.com. I'm trying to get it configured to work with subscription content (only for those who have a valid paid subscription, of course).

The problem is that WSJ.com does not allow multiple, concurrent logins. If it detects multiple, concurrent logins, your account is subsequently locked until you call customer service.

So the 1st time I logged in through the web2lrf profile, everything worked and downloaded properly. However, every subsequent time I tried using the profile, the login didn't work (account was locked), so only non-subscription content was downloaded.

In order to prevent this, I believe one needs to log out of the site before exiting web2lrf. Is there a way to log out of a site using web2lrf? Perhaps the same kind of functionality as the login, but run at the end of the process instead of the beginning.

This dilemma also applies to the Barrons.com site (since they are under the same umbrella as the WSJ.com). My profile for this only worked a couple times before I got locked out of the site.

Thanks for your help with this.
(.txt extension added to facilitate the upload)
Attached Files
File Type: txt wsj.py.txt (7.0 KB, 554 views)
File Type: txt barrons.py.txt (3.3 KB, 427 views)
Old 12-03-2007, 11:55 AM   #100
kovidgoyal
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I've added a cleanup method to the profile that's called after the LRF file has been generated. You can use self.browser to logout in that method.
Old 12-03-2007, 04:26 PM   #101
JTravers
Groupie
Posts: 182
Karma: 1078201
Join Date: Sep 2007
Device: iPad Air 2
Quote:
Originally Posted by kovidgoyal View Post
I've added a cleanup method to the profile that's called after the LRF file has been generated. You can use self.browser to logout in that method.
Thank you so much for adding this.

I'm going to need some help with the proper code to use, though, due to my ignorance of Python.

Would adding something like this to my profile work?

Code:
        def cleanup(self): 
                return  [
                self.browser.open('http://online.barrons.com/logout') 
                ]
Thanks for your help with this.

One other question for you, if you don't mind. How do you add the --ignore-tables option to the profile, so you don't have to specify it on the command-line every time you use the profile?

Thanks again.
Old 12-03-2007, 05:12 PM   #102
kovidgoyal
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Yeah that should do it, no need to return anything though.

Use
Code:
html2lrf_options = ['--ignore-tables']
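
Putting the two answers together, a profile might look roughly like the sketch below. So that it runs on its own, DefaultProfile here is only a stand-in for the libprs500 class, and RecordingBrowser is a hypothetical fake that just records URLs; the real constructor and browser object may well differ. Only the logout URL, the cleanup method name, self.browser and html2lrf_options come from this thread.

```python
class DefaultProfile(object):
    """Stand-in for libprs500's DefaultProfile (assumed shape, not the real class)."""
    html2lrf_options = []

    def __init__(self, browser):
        self.browser = browser

    def cleanup(self):
        pass


class RecordingBrowser(object):
    """Hypothetical fake browser that only records the URLs it is asked to open."""

    def __init__(self):
        self.opened = []

    def open(self, url):
        self.opened.append(url)


class Barrons(DefaultProfile):
    title = "Barron's"
    # Options that would otherwise be typed on the command line.
    html2lrf_options = ['--ignore-tables']

    def cleanup(self):
        # Log out so the next run is not treated as a concurrent login.
        # Nothing is returned.
        self.browser.open('http://online.barrons.com/logout')


browser = RecordingBrowser()
profile = Barrons(browser)
result = profile.cleanup()
```

In a real profile only the Barrons class body would be needed; the two helper classes exist purely to make the sketch testable without a network connection.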
Old 12-03-2007, 05:42 PM   #103
StDo
Translating Calibre...
Posts: 657
Karma: 2902
Join Date: Aug 2007
Location: ER.de
Device: [PRS-500], PB360
Quote:
Originally Posted by kovidgoyal View Post
Code:
def print_version(self,url):
    tokens = url.split(',') 
    tokens[-2:-1] = ['druck-']
    return ','.join(tokens)
Kovid,
the snippet you gave me replaces the number between the second-to-last comma and the last comma with "druck-". But the number there should remain, and "druck-" should be added in front of it, right after the second-to-last comma.

The original link:
Code:
http://www.spiegel.de/panorama/justiz/0,1518,521183,00.html
should be
Code:
http://www.spiegel.de/panorama/justiz/0,1518,druck-521183,00.html
and not (as the snippet above produces):
Code:
http://www.spiegel.de/panorama/justiz/0,1518,druck-,00.html
Thanks for thinking and coding.
Old 12-03-2007, 07:03 PM   #104
JTravers
Groupie
Posts: 182
Karma: 1078201
Join Date: Sep 2007
Device: iPad Air 2
Quote:
Originally Posted by kovidgoyal View Post
Yeah that should do it, no need to return anything though.

Use
Code:
html2lrf_options = ['--ignore-tables']
When trying the cleanup code, web2lrf hangs right after generating the lrf. I used the following code:
Code:
        def cleanup(self): 
                self.browser.open('http://online.barrons.com/logout')
For Barron's, I have to set max_recursions to 3 because some articles are divided into two parts (even the print versions). Doing this, however, causes web2lrf to follow a bunch of other links, which turn out to be garbage and lead it away from the Barron's website. Is there a way to restrict the links that web2lrf follows? I've tried the following, but it didn't seem to work:

Code:
        match_regexps = ['<a.*?mod=.*?>']
and I also tried:
Code:
        match_regexps = ['<a.*?online.barrons.com.*?>']
It doesn't seem like either is having an effect. I know I'm probably misusing these options, so any guidance would be appreciated.

Finally, I tried using html2lrf_options before (and again now), and it doesn't seem to give the same output that is generated when specifying --ignore-tables on the command line. Not sure why.
Old 12-03-2007, 08:05 PM   #105
kovidgoyal
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@StDo
Oops sorry. Here you go
Code:
def print_version(self,url):
    tokens = url.split(',')
    tokens[-2:-2] = ['druck|']
    return ','.join(tokens).replace('|,','-')
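
The fix works because assigning to the empty slice tokens[-2:-2] inserts a new element, whereas assigning to tokens[-2:-1] overwrites the article number. The standalone sketch below compares both and adds a variant that prefixes the token directly; the function names are only for illustration, and the functions take the URL alone rather than being profile methods.

```python
def print_version_replacing(url):
    # tokens[-2:-1] is a one-element slice, so the article number is overwritten.
    tokens = url.split(',')
    tokens[-2:-1] = ['druck-']
    return ','.join(tokens)


def print_version_inserting(url):
    # tokens[-2:-2] is an empty slice, so 'druck|' is inserted before the number;
    # the '|,' marker is then collapsed into '-'.
    tokens = url.split(',')
    tokens[-2:-2] = ['druck|']
    return ','.join(tokens).replace('|,', '-')


def print_version_prefixing(url):
    # Same result without the marker: prefix the second-to-last token.
    tokens = url.split(',')
    tokens[-2] = 'druck-' + tokens[-2]
    return ','.join(tokens)


url = 'http://www.spiegel.de/panorama/justiz/0,1518,521183,00.html'
print(print_version_replacing(url))
print(print_version_inserting(url))
print(print_version_prefixing(url))
```

The prefixing variant avoids relying on the '|' marker, which matters only if a URL could itself contain the sequence '|,'.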
@JTravers
match_regexps works on the contents of the href attribute, i.e. the URL itself, not on the <a> tag. As for html2lrf_options, it looks like a regression: the options aren't being applied. This will be fixed in the next release.
I'm not sure why the cleanup code hangs; I'll look at that later.
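
To illustrate the point about the href contents: since the patterns are tested against the link's URL, a pattern that begins with <a can never match. The sketch below assumes the patterns are applied with re.search (the thread does not say whether search or match is used) and uses made-up URLs; is_wanted is a hypothetical helper, not a web2lrf function.

```python
import re

# Hypothetical URLs of the kind a Barron's page might link to.
article = 'http://online.barrons.com/article/SB0000.html?mod=example'
offsite = 'http://www.example.com/ad?id=1'

# Patterns written against the whole <a> tag, as tried earlier in the thread.
tag_patterns = ['<a.*?mod=.*?>', '<a.*?online.barrons.com.*?>']
# A pattern written against the URL itself.
url_patterns = [r'online\.barrons\.com/.*mod=']


def is_wanted(url, patterns):
    # A link is followed if any pattern is found somewhere in its URL.
    return any(re.search(p, url) for p in patterns)


print(is_wanted(article, tag_patterns))
print(is_wanted(article, url_patterns))
print(is_wanted(offsite, url_patterns))
```

Under these assumptions, something like match_regexps = [r'online\.barrons\.com/.*mod='] would be the URL-based counterpart of the two attempts above.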
Tags
libprs500, web2lrf
