Old 05-10-2010, 03:24 PM   #1906
smargo
Member
smargo began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Aug 2007
Location: Switzerland
Device: Kindle 2i, iPhone
@kiklop74

yes, I'll try, thanks!
Old 05-10-2010, 04:57 PM   #1907
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by smargo
At the bottom of certain articles in Kommersant there are links to additional pages (from the same issue; they are not available as links from the RSS feed). For example, on the page http://www.kommersant.ru/doc-rss.aspx?DocsID=1366511 there are links to page "2" - http://www.kommersant.ru/doc.aspx?DocsID=1366459 and page "3" - http://www.kommersant.ru/doc.aspx?DocsID=1366462. Is there any way to include these additional pages?
I see that you are going to try it yourself. The post two above your post has a sample multipage recipe for Discover Magazine.

See the append_page function and how it is used in preprocess_html. Most multipage recipes use the same basic procedure.
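
The recipe referred to above is not reproduced in this part of the thread, so the following is only a generic sketch of the "follow the next-page link and append" procedure, not the Discover Magazine code. The names collect_pages, fetch and FAKE_SITE are made up for illustration; in a real recipe the fetching and parsing would happen inside the recipe itself (for example with index_to_soup) rather than through a dictionary lookup.

```python
def collect_pages(first_url, fetch, max_pages=20):
    """Follow 'next page' links from first_url and return the page bodies in order.

    fetch(url) is a hypothetical callable returning (body_html, next_url_or_None).
    """
    bodies = []
    seen = set()
    url = first_url
    # Stop on a missing link, a link already visited, or the page limit.
    while url and url not in seen and len(bodies) < max_pages:
        seen.add(url)
        body, url = fetch(url)
        bodies.append(body)
    return bodies


# Stand-in for a website: each URL maps to (body, link to the next page).
FAKE_SITE = {
    'page1': ('<p>part one</p>', 'page2'),
    'page2': ('<p>part two</p>', 'page3'),
    'page3': ('<p>part three</p>', None),
}

article = ''.join(collect_pages('page1', FAKE_SITE.__getitem__))
print(article)
```

The seen set and max_pages limit are there so that a page linking back to itself cannot make the download loop forever.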
Old 05-10-2010, 05:06 PM   #1908
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 780
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle PaperWhite, Motorola Xoom
In this case things are a bit different. Articles on the Kommersant website are never multipage. The other pages contain related articles. For that reason I did not invest any time in implementing it.
Old 05-10-2010, 05:14 PM   #1909
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kiklop74
In this case things are a bit different. Articles on the Kommersant website are never multipage. The other pages contain related articles. For that reason I did not invest any time in implementing it.
Ah. I was confused by his comment that "there are links to page '2' .... and page '3' ..." I agree - I don't think it's worth implementing "related article" links.
Old 05-10-2010, 11:16 PM   #1910
elmoglick
Groupie
elmoglick doesn't litter.
 
Posts: 165
Karma: 206
Join Date: Dec 2007
Location: Kansas City
Device: Kindle1, Kindle DX, Kindle DXG
Looks like the recipe for The Nation was broken in the latest version. Any idea how to fix it?

Thanks.
Old 05-10-2010, 11:47 PM   #1911
nook.life
Member
nook.life began at the beginning.
 
Posts: 12
Karma: 10
Join Date: May 2010
Device: Nook
Cyanide & Happiness

Could someone please make a recipe for Cyanide & Happiness?

I have tried and miserably failed. The problem is that the RSS feed does not include any comics like xkcd does and I have not been able to find a feed that does.

The website is http://www.explosm.net/comics/
and the RSS is: http://feeds.feedburner.com/Explosm

I know someone requested this recipe yesterday in the wrong thread
http://www.mobileread.com/forums/sho...nide+happiness

but it did not seem to make it on here. Thanks!
Old 05-11-2010, 08:42 AM   #1912
sdow1
Connoisseur
sdow1 began at the beginning.
 
Posts: 53
Karma: 10
Join Date: Apr 2010
Location: new york city
Device: nook, ipad
OK. After my original post requesting a recipe for The American Prospect (RSS feed: http://www.prospect.org/articles_rss.jsp), I attempted (even though I have absolutely no idea what I'm doing) to set one up myself. But anything I've tried beyond the basic recipe has been useless - TAP is an independent publication, so I can't just modify, say, the recipe for another Condé Nast publication.

When using the basic recipe maker, it picks up all of the article titles, but there's no content inside the articles (even though it "sees" the source article).

I'd post something here as to my efforts, but I'm afraid that would be less than useful as anything I've tried has given me *less* content (if that's even possible) than the basic recipe.

Help!
Old 05-12-2010, 12:55 PM   #1913
LondoMolari
Junior Member
LondoMolari began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Feb 2010
Device: PRS 300
Scinexx once again

Hi,

my first attempt at a Scinexx recipe was, of course, quite wrong,
but nobody has complained about it so far.

However, to finish this off, here's my current version. Maybe somebody who knows something about Python could take a quick look at it and smooth things out...

LG Londo

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1265145870(BasicNewsRecipe):
    title          = u'Scinexx.de'
    language = 'de'
    __author__ = 'Recipe by JSuer'
    cover_url    = 'http://www.g-o.de/grafiken/web_scinexx/head2.gif'
    oldest_article = 14
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content  = False
    encoding = 'ISO-8859-1'
#    encoding = 'utf-8'

    feeds          = [(u'Scinexx.de', u'http://feeds.feedburner.com/scinexx')]

    # A second remove_tags assignment would overwrite the first, so use one list.
    remove_tags = [
        dict(attrs={'class': ['text1fett']}),
        dict(attrs={'href': ['javascript:window.print()']}),
    ]

    extra_css = '''
                    .text2normal{font-family:Verdana,Geneva,Kalimati,sans-serif; font-size:x-small;}
                    .text1normalblau{font-family:Verdana,Geneva,Kalimati,sans-serif; font-size:small;}
                    .text1fett{color:grey; font-size:small;}
                    .titel1{font-family:Georgia,"Times New Roman",Times,serif; font-size:large;}
                    .titel2{font-family:Georgia,"Times New Roman",Times,serif; }
                    .titel3{font-family:Georgia,"Times New Roman",Times,serif; font-size:larger;}
                    h1{font-size:large;}
                    '''


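    def print_version(self, url):
        # The article id is taken as the 5 characters that start 6 positions
        # before the last '2010' in the URL, so this only works for 2010 URLs.
        year_pos = url.rfind('2010')
        if year_pos == -1:
            return url  # no '2010' found: fall back to the normal article page
        id_start = year_pos - 6
        article_id = url[id_start:id_start + 5]
        return 'http://www.scinexx.de/inc/artikel_drucken.php?id=' + article_id + '&a_flag=1'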
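
One way to make the id extraction less dependent on the year and on fixed offsets would be a regular expression. This is only a sketch: the URL shape assumed here (article id followed by a yyyy-mm-dd date and ".html") and the example URL are assumptions, not something confirmed in the thread, and the function is written standalone rather than as a recipe method.

```python
import re

# Assumed (hypothetical) URL shape: ...-<article id>-<yyyy>-<mm>-<dd>.html
ARTICLE_ID = re.compile(r'-(\d+)-\d{4}-\d{2}-\d{2}\.html')


def print_version(url):
    """Build the print-view URL, or return url unchanged if no id is found."""
    match = ARTICLE_ID.search(url)
    if match is None:
        return url
    return ('http://www.scinexx.de/inc/artikel_drucken.php?id='
            + match.group(1) + '&a_flag=1')


# Hypothetical example URL, used only to exercise the function.
example = 'http://www.scinexx.de/wissen-aktuell-11644-2010-05-12.html'
print(print_version(example))
```

Returning the original URL when nothing matches means an unexpected link format degrades to the normal article page instead of producing a broken print URL.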
Old 05-12-2010, 04:10 PM   #1914
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
New recipes

www.ansa.it
Italian News Agency

punto-informatico.it
Internet News
Attached Files
File Type: zip punto_informatico.zip (718 Bytes, 67 views)
File Type: zip Ansa.zip (1.2 KB, 58 views)
Old 05-14-2010, 08:30 AM   #1915
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
New Recipes

www.leggo.it

Italian Daily News Paper

www.apcom.it

Italian News Agency
Attached Files
File Type: zip APCOM.zip (808 Bytes, 56 views)
File Type: zip Leggo_it.zip (857 Bytes, 58 views)
Old 05-14-2010, 11:49 AM   #1916
mwheinz
award-winning bozo
mwheinz can grok the meaning of the universe.
 
Posts: 252
Karma: 157113
Join Date: Sep 2009
Location: Philadelphia
Device: Sony PRS-600
Quote:
Originally Posted by sdow1
OK. After my original post requesting a recipe for The American Prospect (RSS feed: http://www.prospect.org/articles_rss.jsp), I attempted (even though I have absolutely no idea what I'm doing) to set one up myself. But anything I've tried beyond the basic recipe has been useless - TAP is an independent publication, so I can't just modify, say, the recipe for another Condé Nast publication.

When using the basic recipe maker, it picks up all of the article titles, but there's no content inside the articles (even though it "sees" the source article).

I'd post something here as to my efforts, but I'm afraid that would be less than useful as anything I've tried has given me *less* content (if that's even possible) than the basic recipe.

Help!
sdow1,

I just tried it as well - the one thing I'm seeing is that Calibre is writing an error message to my system log: "link hasn't been detected!" (note the two spaces between "link" and "hasn't") - I'll keep digging, see if I can figure out why.
Old 05-14-2010, 12:23 PM   #1917
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by mwheinz
sdow1,

I just tried it as well - the one thing I'm seeing is that Calibre is writing an error message to my system log: "link hasn't been detected!" (note the two spaces between "link" and "hasn't") - I'll keep digging, see if I can figure out why.
Kovid has stated on several occasions that the "link hasn't been detected!" message isn't an error. It's merely informational and can be safely ignored. I have seen that message repeated a few thousand times as I tested various bits of code.
Old 05-14-2010, 01:54 PM   #1918
mwheinz
award-winning bozo
mwheinz can grok the meaning of the universe.
 
Posts: 252
Karma: 157113
Join Date: Sep 2009
Location: Philadelphia
Device: Sony PRS-600
Quote:
Originally Posted by Starson17
Kovid has stated on several occasions that the "link hasn't been detected!" message isn't an error.
Yeah, I discovered it was unrelated to the message shortly after posting.

I think the problem is simply that the American Prospect generates truly awful HTML - the problem starts on the first line of the output, where you find JavaScript before the <!DOCTYPE> tag, for one thing, but there are also <meta> tags inside the body, <script> tags inside <tr> elements and newlines inside URIs. They don't even identify parts of the page with IDs, so there's no easy way to identify the part with the article in it.

I was able to write a recipe that gets everything:

Code:
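from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1273850169(BasicNewsRecipe):
    title          = u'American Prospect'
    oldest_article = 7
    max_articles_per_feed = 100
    recursions = 0
    no_stylesheets = True

    # Plain http:// form of the feed URL given earlier in the thread.
    feeds       = [(u'Articles', u'http://www.prospect.org/articles_rss.jsp')]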
but any attempt to remove certain tags (like the embedded advertisements) has no effect, and telling it to keep certain tags (like the ones with the main articles) causes it to delete everything and generate an empty page.
Old 05-14-2010, 02:53 PM   #1919
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by mwheinz
I think the problem is simply that the American Prospect generates truly awful HTML ...
but any attempt to remove certain tags (like the embedded advertisements) has no effect, and telling it to keep certain tags (like the ones with the main articles) causes it to delete everything and generate an empty page.
Malformed html can be problematical. You may want to look at the soup output from preprocess_html and then use preprocess_regexps to delete material you need to get rid of.
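
As a rough illustration of the preprocess_regexps idea: each entry is a pair of a compiled regular expression and a function that takes the match and returns the replacement text. The sketch below uses that same shape outside of calibre. The sample HTML, the div class="ad" marker and the apply_regexps helper are invented for the example and are not taken from the American Prospect pages; the real patterns would have to be worked out from the actual page source.

```python
import re

# Same shape as a recipe's preprocess_regexps: (compiled pattern, replacement function).
# The markers matched here are hypothetical examples.
preprocess_regexps = [
    # Drop anything that appears before the doctype, e.g. a stray script.
    (re.compile(r'^.*?(?=<!DOCTYPE)', re.DOTALL | re.IGNORECASE), lambda m: ''),
    # Drop hypothetical advertisement blocks.
    (re.compile(r'<div class="ad">.*?</div>', re.DOTALL | re.IGNORECASE), lambda m: ''),
]


def apply_regexps(raw_html, rules):
    """Apply each (pattern, function) rule in order to the raw page source."""
    for pattern, func in rules:
        raw_html = pattern.sub(func, raw_html)
    return raw_html


sample = (
    '<script>var x = 1;</script>\n<!DOCTYPE html>\n'
    '<html><body><div class="ad">Buy now</div><p>Article text</p></body></html>'
)
cleaned = apply_regexps(sample, preprocess_regexps)
print(cleaned)
```

Because the rules work on the raw text rather than on the parsed tree, they can remove material that the parser would otherwise choke on.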
Old 05-14-2010, 02:58 PM   #1920
mwheinz
award-winning bozo
mwheinz can grok the meaning of the universe.
 
Posts: 252
Karma: 157113
Join Date: Sep 2009
Location: Philadelphia
Device: Sony PRS-600
Quote:
Originally Posted by Starson17
Malformed html can be problematical. You may want to look at the soup output from preprocess_html and then use preprocess_regexps to delete material you need to get rid of.
Yeah - I've been trying to traverse the soup with this:

Code:
    def preprocess_html(self, soup):
        # Debug aid: print each direct child of <body> between markers.
        for item in soup.body:
            print('MHEINZ: [[[')
            print(item)
            print(']]] MHEINZ\n\n')
        return soup
but the output I'm getting is weird - as if it were processing multiple items at once (while I'm comfortable in various C dialects, I am not a Python coder). I'm seeing things like multiple "[[[" lines in a row before a "]]]" line.

Overall, though, it looks like soup is parsing to a particular depth and then stopping - it looks like there's a vast blob of html that it is treating as a blob of text.
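
One possible explanation for several "[[[" lines in a row (an assumption, not something confirmed in this thread) is that calibre fetches several articles at the same time, so separate print statements from different articles end up interleaved in the log. A minimal sketch of a workaround is to build one string per item and emit it with a single write, which makes interleaving much less likely. The helper name format_children is made up, and plain strings stand in for the children of soup.body.

```python
import sys


def format_children(children, tag='MHEINZ'):
    """Return one marker-wrapped string per non-blank child."""
    blocks = []
    for index, item in enumerate(children):
        text = str(item)
        if not text.strip():
            continue  # skip whitespace-only text nodes between tags
        blocks.append('%s %d: [[[\n%s\n]]] %s\n\n' % (tag, index, text, tag))
    return blocks


# Plain strings stand in for the direct children of soup.body.
for block in format_children(['<p>first</p>', '\n', '<p>second</p>']):
    sys.stdout.write(block)  # one write per item keeps its markers together
```

Numbering the items also makes it easier to tell which opening marker belongs to which closing marker if the output still gets mixed up.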