05-10-2010, 02:24 PM | #1906 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2007
Location: Switzerland
Device: Kindle Voyage, Kobo
|
@kiklop74
yes, I'll try, thanks! |
05-10-2010, 03:57 PM | #1907 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
See the append_page function and how it is used in preprocess_html. Most multipage recipes use the same basic procedure. |
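For anyone new to the pattern: append_page is not a built-in calibre hook but a helper that multipage recipes define themselves and call from preprocess_html. Stripped of the calibre specifics, the idea is roughly the following. This is only a sketch using the standard library; the pager markup, the fetch callable and the sample pages are all made up for illustration.

```python
import re

# Hypothetical markup: a page links to its successor with <a class="next" href="...">
PAGER = re.compile(r'<a class="next" href="([^"]+)">.*?</a>', re.S)
BODY = re.compile(r'<body>(.*?)</body>', re.S)


def append_pages(first_html, fetch, max_pages=10):
    """Join the body of the first page with the bodies of the pages it links to.

    fetch is any callable that maps a URL to that page's HTML (an assumption
    of this sketch; a real recipe would download the page instead).
    """
    parts = []
    html = first_html
    for _ in range(max_pages):  # cap the walk in case the pager loops
        match = BODY.search(html)
        content = match.group(1) if match else html
        next_link = PAGER.search(content)
        parts.append(PAGER.sub('', content))  # drop the pager link itself
        if next_link is None:
            break
        html = fetch(next_link.group(1))
    return ''.join(parts)


# Stand-in for the network: a dict of made-up pages
PAGES = {
    'p2': '<body><p>two</p><a class="next" href="p3">next</a></body>',
    'p3': '<body><p>three</p></body>',
}
FIRST = '<body><p>one</p><a class="next" href="p2">next</a></body>'

merged = append_pages(FIRST, PAGES.__getitem__)
print(merged)
```

In a real recipe the same loop would work on the parsed soup rather than on raw strings, and the pager selector has to be adapted to each site.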
|
05-10-2010, 04:06 PM | #1908 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
In this case things are a bit different. Articles on the Kommersant website are never multipage; the other pages contain related articles. For that reason I did not invest any time in implementing multipage support.
|
05-10-2010, 04:14 PM | #1909 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Ah. I was confused by his comment that "there are links to page '2' .... and page '3' ..." I agree - I don't think it's worth implementing "related article" links.
|
05-10-2010, 10:16 PM | #1910 |
Groupie
Posts: 165
Karma: 206
Join Date: Dec 2007
Location: Kansas City
Device: Kindle1, Kindle DX, Kindle DXG
|
Looks like the recipe for The Nation was broken in the latest version. Any idea how to fix it?
Thanks. |
05-10-2010, 10:47 PM | #1911 |
Member
Posts: 12
Karma: 10
Join Date: May 2010
Device: Nook
|
Cyanide & Happiness
Could someone please make a recipe for Cyanide & Happiness?
I have tried and failed miserably. The problem is that the RSS feed does not include the comics themselves, the way xkcd's does, and I have not been able to find a feed that does.
The website is http://www.explosm.net/comics/ and the RSS feed is http://feeds.feedburner.com/Explosm
I know someone requested this recipe yesterday in the wrong thread (https://www.mobileread.com/forums/sho...nide+happiness), but it did not seem to make it over here. Thanks! |
05-11-2010, 07:42 AM | #1912 |
Connoisseur
Posts: 55
Karma: 10
Join Date: Apr 2010
Location: new york city
Device: nook, ipad
|
OK. After my original post requesting a recipe for The American Prospect (RSS feed: http://www.prospect.org/articles_rss.jsp), I attempted to set one up myself, even though I have absolutely no idea what I'm doing. But everything I've tried beyond the basic recipe has been useless. TAP is an independent publication, so I can't just modify the recipe for, say, another Condé Nast publication.
When using the basic recipe maker, it picks up all of the article titles, but there's no content inside the articles (even though it "sees" the source article). I'd post something here as to my efforts, but I'm afraid that would be less than useful as anything I've tried has given me *less* content (if that's even possible) than the basic recipe. Help! |
05-12-2010, 11:55 AM | #1913 |
Junior Member
Posts: 2
Karma: 10
Join Date: Feb 2010
Device: PRS 300
|
Scinexx once again
Hi,
my old first attempt at a Scinexx recipe was of course quite wrong, but nobody has complained about it so far anyway. To finish this off, here's my current version. Maybe somebody who knows something about Python could have a quick look at it and smooth things out...
Regards,
Londo
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1265145870(BasicNewsRecipe):
    title = u'Scinexx.de'
    language = 'de'
    __author__ = 'Recipe by JSuer'
    cover_url = 'http://www.g-o.de/grafiken/web_scinexx/head2.gif'
    oldest_article = 14
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    encoding = 'ISO-8859-1'
    # encoding = 'utf-8'

    feeds = [(u'Scinexx.de', u'http://feeds.feedburner.com/scinexx')]

    # The original assigned remove_tags twice, so the second assignment
    # silently replaced the first. Both filters are merged into one list here.
    remove_tags = [
        {'class': ['text1fett']},
        {'href': ['javascript:window.print()']},
    ]

    # 'font-color' is not a CSS property; 'color' is used for .text1fett instead.
    extra_css = '''
        .text2normal{font-family:Verdana,Geneva,Kalimati,sans-serif; font-size:x-small;}
        .text1normalblau{font-family:Verdana,Geneva,Kalimati,sans-serif; font-size:small;}
        .text1fett{color:grey; font-size:small;}
        .titel1{font-family:Georgia,"Times New Roman",Times,serif; font-size:large;}
        .titel2{font-family:Georgia,"Times New Roman",Times,serif; }
        .titel3{font-family:Georgia,"Times New Roman",Times,serif; font-size:larger;}
        h1{font-size:large;}
    '''

    def print_version(self, url):
        # Take the five characters that end one character before the last
        # '2010' in the URL and use them as the article id.
        # NOTE: this only works for URLs that contain the year 2010.
        id_start = url.rfind('2010') - 6
        id_end = id_start + 5
        article_id = url[id_start:id_end]
        return 'http://www.scinexx.de/inc/artikel_drucken.php?id=' + article_id + '&a_flag=1'
|
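One possible way to make print_version smoother is to pull the id out with a regular expression instead of searching for the literal '2010', which stops working once article URLs carry a different year. The sketch below is written as a plain function so it can be tried outside calibre. It assumes the article URL ends in an id followed by a date (id-YYYY-MM-DD); that shape and the example URLs are guesses based on the slicing in the recipe, not something confirmed against the site.

```python
import re

PRINT_URL = 'http://www.scinexx.de/inc/artikel_drucken.php?id=%s&a_flag=1'

# Assumed URL shape: ...-<article id>-<YYYY>-<MM>-<DD>...
ARTICLE_ID = re.compile(r'-(\d+)-\d{4}-\d{2}-\d{2}')


def print_version_url(url):
    """Return the print-view URL, or the URL unchanged if no id is found."""
    match = ARTICLE_ID.search(url)
    if match is None:
        return url
    return PRINT_URL % match.group(1)


# Made-up example URLs in the assumed shape
example_2010 = 'http://www.scinexx.de/wissen-aktuell-11655-2010-05-12.html'
example_2011 = 'http://www.scinexx.de/wissen-aktuell-12999-2011-01-03.html'

print(print_version_url(example_2010))
print(print_version_url(example_2011))
```

Inside the recipe, print_version(self, url) could simply return print_version_url(url). Falling back to the original URL keeps an article downloadable even when the pattern does not match.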
05-12-2010, 03:10 PM | #1914 |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
New recipes
|
05-14-2010, 07:30 AM | #1915 |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
New Recipes
|
05-14-2010, 10:49 AM | #1916 | |
award-winning bozo
Posts: 258
Karma: 172703
Join Date: Sep 2009
Location: Philadelphia
Device: Kobo Libra 2
|
Quote:
I just tried it as well - the one thing I'm seeing is that Calibre is writing an error message to my system log: "link hasn't been detected!" (note the two spaces between "link" and "hasn't") - I'll keep digging, see if I can figure out why. |
|
05-14-2010, 11:23 AM | #1917 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Kovid has stated on several occasions that the "link hasn't been detected!" message isn't an error. It's merely informational and can be safely ignored. I have seen that message repeated a few thousand times as I tested various bits of code.
|
05-14-2010, 12:54 PM | #1918 | |
award-winning bozo
Posts: 258
Karma: 172703
Join Date: Sep 2009
Location: Philadelphia
Device: Kobo Libra 2
|
Quote:
I think the problem is simply that the American Prospect generates truly awful HTML. It starts on the first line of the output, where you find JavaScript before the <!DOCTYPE> declaration, and there are also <meta> tags inside the body, <script> elements inside <tr> elements, and newlines inside URIs. They don't even label parts of the page with IDs, so there's no easy way to pick out the part containing the article. I was able to write a recipe that gets everything: Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1273850169(BasicNewsRecipe):
    title = u'American Prospect'
    oldest_article = 7
    max_articles_per_feed = 100
    recursions = 0
    no_stylesheets = True

    feeds = [(u'Articles', u'feed://www.prospect.org/articles_rss.jsp')]
|
|
05-14-2010, 01:53 PM | #1919 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
05-14-2010, 01:58 PM | #1920 | |
award-winning bozo
Posts: 258
Karma: 172703
Join Date: Sep 2009
Location: Philadelphia
Device: Kobo Libra 2
|
Quote:
Code:
def preprocess_html(self, soup):
    # Debug aid: dump each top-level child of <body> between markers
    for item in soup.body:
        print('MHEINZ: [[[')
        print(item)
        print(']]] MHEINZ\n\n')
    return soup
Overall, though, it looks like soup is parsing to a particular depth and then stopping: there's a vast blob of HTML that it is treating as a blob of text. |
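One avenue that might be worth trying, given the stray JavaScript before the doctype mentioned earlier, is to clean the raw HTML before it reaches the parser. calibre recipes accept a preprocess_regexps list of (compiled pattern, callable) pairs for this. The sketch below applies rules of that shape with plain re so it can be tested on its own. The two rules and the sample HTML are assumptions, and whether they are enough to fix the American Prospect pages is untested.

```python
import re

# Same shape as a recipe's preprocess_regexps:
# (compiled pattern, callable that takes the match and returns the replacement)
RULES = [
    # Drop anything that precedes the doctype declaration
    (re.compile(r'\A.*?(?=<!DOCTYPE)', re.S | re.I), lambda match: ''),
    # Drop <script> blocks wherever they appear, including inside <tr>
    (re.compile(r'<script\b.*?</script>', re.S | re.I), lambda match: ''),
]


def apply_rules(html, rules=RULES):
    """Run each substitution rule over the HTML in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


# Made-up sample that mimics the problems described above
raw = (
    '<script>var x = 1;</script><!DOCTYPE html><html><body><table><tr>'
    '<script>y()</script><td>Text</td></tr></table></body></html>'
)
cleaned = apply_rules(raw)
print(cleaned)
```

In a recipe the same list would be assigned to preprocess_regexps as a class attribute, and the debug loop above could then show whether soup gets any further into the page.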
|