Old 01-14-2008, 08:20 PM   #1
GatorDeb
Evangelist
Posts: 447
Karma: 5365
Join Date: Dec 2007
Location: Sin City
Device: PW2 + HDX 8.9
Share your LibPRS Python RSS Feeds and where to obtain more feed links!

I was able to download and use the CNN Python feed, and now I'm hooked. If you have created Python scripts, post them here! Also, what is a good site for finding RSS feeds? I'm having the hardest time finding XML URLs.
Old 03-03-2008, 06:12 PM   #2
moz
Addict
Posts: 370
Karma: 1553
Join Date: Feb 2008
Location: Melbun
Device: Kobo H2O
I have the other problem: lots of XML feeds, but I can't work out how to get LibPRS to import them. It uses the feed URL but retrieves a blank HTML page. This one, for example: http://norightturn.blogspot.com/feeds/posts/default
Old 03-03-2008, 06:53 PM   #3
kovidgoyal
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That's because it doesn't yet have explicit support for ATOM feeds, only RSS. You should open a ticket for support for ATOM.
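For anyone unsure which kind of feed a URL serves, the two formats can usually be told apart by the root element of the XML. The following is a minimal sketch using only the Python standard library; the function name `feed_kind` and the sample documents are made up for illustration, and it only recognises RSS 2.0-style (`<rss>`) and Atom 1.0 (`<feed>` in the Atom namespace) documents.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 documents.
ATOM_NS = 'http://www.w3.org/2005/Atom'


def feed_kind(xml_text):
    """Return 'rss', 'atom', or 'unknown' based on the document's root element."""
    root = ET.fromstring(xml_text)
    if root.tag == 'rss':
        return 'rss'
    if root.tag == '{%s}feed' % ATOM_NS:
        return 'atom'
    return 'unknown'


# Tiny hand-written samples, not real feed content.
rss_sample = '<rss version="2.0"><channel><title>t</title></channel></rss>'
atom_sample = '<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'

print(feed_kind(rss_sample))
print(feed_kind(atom_sample))
```

Running the same check on the text downloaded from a feed URL would show whether it is an RSS feed or an Atom feed.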
Old 03-03-2008, 07:14 PM   #4
moz
Addict
Quote:
Originally Posted by kovidgoyal
That's because it doesn't yet have explicit support for ATOM feeds, only RSS. You should open a ticket for support for ATOM.
I've just done that now, thanks.

I'm also trying to scrape http://www.smh.com.au/text, but web2disk hangs trying to parse it; I assume it gets stuck on the JavaScript at the end? LibPRS just returns the blank page that I've grown used to.
Old 03-03-2008, 07:52 PM   #5
kovidgoyal
creator of calibre
Use --verbose to see what link it is hanging on and then use --filter-regexps to disable fetching of that link. But really you should write a profile for SMH using the RSS feeds.
Old 03-03-2008, 09:24 PM   #6
moz
Addict
OK, trying to write a profile, but really struggling. I get one of two things: a blank document, or the script hangs.

Code:
from libprs500.ebooks.lrf.web.profiles import DefaultProfile
import re

class SMH(DefaultProfile):

    title = 'SMH'
    max_recursions = 2
    oldest_article = 1
    no_stylesheets = True

    # Each entry is (compiled pattern, replacement function)
    preprocess_regexps = \
        [ (re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in
            [
                # Remove links to homepage (the literal [ and ] must be
                # escaped, otherwise they form a character class)
                (r'<P>\[ <a href="/">SMH</a> \]</P>', lambda match : ''),
                # and business pages
                (r'<p><a href="http://business.smh.com.au.*', lambda match : ''),
            ]
        ]

    def get_feeds(self):
        return [ ('SMH', 'http://smh.com.au/text') ]
Old 03-03-2008, 10:21 PM   #7
kovidgoyal
creator of calibre
get_feeds has to return a list of RSS feeds, not a website. That would be the list of feeds from here:
http://www.smh.com.au/rsschannels/

For example
http://feeds.smh.com.au/rssheadlines/top.xml
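As a rough sketch of that advice, a `get_feeds` that returns RSS feed URLs might look like the block below. The `DefaultProfile` class here is only a stand-in so the example runs without libprs500 installed, and the feed label 'Top Stories' is an assumption; the (title, URL) tuple shape is taken from the profile posted earlier in the thread.

```python
class DefaultProfile:
    """Stand-in for libprs500's DefaultProfile so this sketch runs on its own."""


class SMH(DefaultProfile):
    title = 'SMH'

    def get_feeds(self):
        # Each entry is a (label, RSS feed URL) pair, not a web page address.
        # The label 'Top Stories' is made up for illustration.
        return [('Top Stories', 'http://feeds.smh.com.au/rssheadlines/top.xml')]


feeds = SMH().get_feeds()
for label, url in feeds:
    print(label, url)
```

More feeds from the channel list could be added as further tuples in the same list.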
Old 03-03-2008, 11:24 PM   #8
moz
Addict
Thanks for your help. Despite the incessant questions, I really do appreciate it. The RSS feeds from the SMH are not what I want; that's why I'm trying to get the text version of the site. The text version is more complete and easier to de-format.

OK, what I want is this:
web2lrf --verbose --match-regexp=/text --url=http://smh.com.au/text --output=smh default
web2lrf --verbose --match-regexp=/text --url=http://theage.com.au/text --output=theage default

Yippee! I assume there's some way to add those to the news sources in the GUI, I'll look at that when I get home.
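To see what `--match-regexp=/text` is likely doing in the commands above, here is a small sketch with Python's `re` module. It assumes a link is followed when the pattern is found anywhere in its URL; that is a guess about web2lrf's behaviour, not something confirmed in this thread, and the link paths are hypothetical.

```python
import re

# Pattern taken from the commands above.
match_regexp = re.compile(r'/text')

# Hypothetical links; only the host names come from the thread.
links = [
    'http://smh.com.au/text/national.html',
    'http://smh.com.au/multimedia/gallery.html',
    'http://business.smh.com.au/text/markets.html',
]

# Assumption: a link is followed when the pattern occurs anywhere in the URL.
followed = [url for url in links if match_regexp.search(url)]

for url in followed:
    print(url)
```

Under that assumption only the text-version pages are kept, which would explain why the download no longer hangs on other parts of the site.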
Old 03-04-2008, 07:37 PM   #9
moz
Addict
OK, another Python regexp question: is there some way to say "ignore everything from the <h3>Business</h3> line to the <H3>Columns</H3> line"? I spent last night reading instead of working on this, so I'm no further along.

Also, does bribery work with which features get implemented? And if so, how much?
Old 03-04-2008, 08:08 PM   #10
kovidgoyal
creator of calibre
Code:
re.compile('<h3>Business</h3>.*?<h3>Columns</h3>', re.DOTALL|re.IGNORECASE)
As for bribery, I'm always susceptible to the lure of lucre, but as for how much, that depends on what the feature is.
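A quick way to check the pattern above is to run it over a small sample with `re.sub`. The markup below is made up purely to exercise the regexp; replacing the match with the closing heading (rather than an empty string) is one possible choice, so that the Columns section keeps its title.

```python
import re

# Pattern from the post above: non-greedy, spans newlines, ignores case.
section = re.compile('<h3>Business</h3>.*?<h3>Columns</h3>',
                     re.DOTALL | re.IGNORECASE)

# Made-up sample markup, only to exercise the pattern.
html = (
    "<h3>World</h3>\n<p>keep me</p>\n"
    "<h3>Business</h3>\n<p>drop me</p>\n"
    "<H3>Columns</H3>\n<p>keep me too</p>\n"
)

# Replace the matched span with the closing heading so "Columns" survives.
cleaned = section.sub('<h3>Columns</h3>', html)
print(cleaned)
```

In a profile, the same compiled pattern could sit in `preprocess_regexps` next to a replacement function, in the style of the SMH profile posted earlier.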
Old 03-04-2008, 08:51 PM   #11
moz
Addict
I'm eyeing up the ATOM blogs.

Having successfully ripped two newspapers that I read most days I'm now all excited and want to read blogs too. I'm also starting the big conversion of all the random books and articles I've downloaded over the years and that's going pretty well. Albeit slowly at times - am I right to assume that the lack of progress indication during conversions is because there's nothing coming back from the converter rather than because you want to taunt me? A 1000-page lrf file came out of one of the html conversions but it took a very long time (and cpu usage was minimal).

So yeah, libprs is proving very useful, so I'm definitely going to donate. I'm just tempted to make it $100 instead of $20 in the hope that ATOM will arrive as magically as all the other help that I've received. For comparison, now that I've got my "lying in the hammock reading and listening to music" process established, I've also ordered an 8GB memory stick for ~US$120. According to my housemates I look really, really dorky wandering round with a giant mp3 player, but I don't care.
Old 03-04-2008, 09:00 PM   #12
kovidgoyal
creator of calibre
Weeel I probably shouldn't say this, but I'm in the process of refactoring web2lrf and the new version will support a whole bunch of feed formats, as well as sundry other improvements. And you shouldn't have a conversion that hangs around doing nothing for a long time.
Old 03-05-2008, 02:26 AM   #13
moz
Addict
OK, I'm playing with profiles and wondering how I actually create a profile for the command-line web2lrf. Since the newspapers I want are HTML rather than feeds, I can't have them as news sources, but I suspect I need to compile the profile before I can use it?

I'm also not sure of the difference between user profiles and feed profiles as far as the command line goes.

Last edited by moz; 03-05-2008 at 02:30 AM.
Old 03-05-2008, 02:55 AM   #14
kovidgoyal
creator of calibre
See https://libprs500.kovidgoyal.net/wiki/UserProfiles

An example of a profile that creates a feed from a website rather than an RSS feed is

https://libprs500.kovidgoyal.net/bro...es/atlantic.py
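The general idea of building an article list from a web page instead of an RSS feed can be sketched with the standard library alone. This is not the code from atlantic.py and does not use the libprs500 profile API; the names `LinkCollector` and `articles_from_index` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (title, absolute URL) pairs from the <a href=...> tags of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.articles = []
        self._href = None   # URL of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            if title:
                self.articles.append((title, self._href))
            self._href = None


def articles_from_index(html, base_url):
    """Parse an index page and return its links as (title, URL) pairs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.articles


# Made-up index page markup for illustration.
sample = ('<p><a href="/text/story1.html">Story one</a></p>'
          '<p><a href="/">SMH</a></p>')
articles = articles_from_index(sample, 'http://smh.com.au/text')
print(articles)
```

A real profile would still need to filter out navigation links (such as the homepage link here) and hand the remaining pairs to the converter in whatever form it expects.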
Old 03-05-2008, 03:41 AM   #15
moz
Addict
Quote:
Originally Posted by kovidgoyal
An example of a profile that creates a feed from a website rather than an RSS feed is The Atlantic
Aha, thanks for that. I've got it to the point of giving me an "object not iterable" error, but I'm sure I will eventually puzzle that out. In the meantime I've dumped some error reports onto your website.
Attached Files
File Type: zip smh.zip (730 Bytes, 547 views)