MobileRead Forums > E-Book Software > Calibre > Recipes
Old 06-21-2010, 10:17 AM   #2176
robandcurtis
Junior Member
 
Posts: 5
Karma: 12
Join Date: Jun 2010
Device: Kobo
Quote:
Originally Posted by rty View Post
Here it is. Recipe for London Free Press (Canada).
Hey that was fast. Works like a charm.
Old 06-21-2010, 12:04 PM   #2177
rty
Zealot
 
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Recipe for People's Daily (in Chinese)
Attached Files
File Type: zip PeopleDaily.zip (929 Bytes, 166 views)

Last edited by rty; 06-22-2010 at 01:15 PM.
Old 06-21-2010, 12:11 PM   #2178
rty
Zealot
 
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Originally Posted by Starson17 View Post
you still haven't used append_page. Add preprocess_html the way that it's used in AG.

Help, Starson, please. Another multipage issue: I've encountered another website with multipage articles where the next page is linked via a button image, as follows:

Code:
<a href="/GB/1027/11928295.html">
<img src="/img/next_b.gif" border="0">
</a>
Please look at the code below (click the Show button), which I modified from AG to combine the pages.

Here I was trying to find the image with src='/img/next_b.gif' and then grab the href for the URL, but it doesn't seem to work. What did I do wrong?

Spoiler:
Code:
    def append_page(self, soup, appendtag, position):
        pager = soup.find('img',attrs={'src':'/img/next_b.gif'})
        if pager:
           nexturl = self.INDEX + pager.a['href']
           soup2 = self.index_to_soup(nexturl)
           texttag = soup2.find('div', attrs={'class':'left_content'})
           #for it in texttag.findAll(style=True):
           #   del it['style']
           newpos = len(texttag.contents)          
           self.append_page(soup2,texttag,newpos)
           texttag.extract()
           appendtag.insert(position,texttag)
        
    
    def preprocess_html(self, soup): 
        mtag = '<meta http-equiv="content-type" content="text/html;charset=GB2312" />\n<meta http-equiv="content-language" content="utf-8" />'
        soup.head.insert(0,mtag)    
        for item in soup.findAll(style=True):
            del item['form']
        self.append_page(soup, soup.body, 3)
        #pager = soup.find('a',attrs={'class':'ab12'})
        #if pager:
        #   pager.extract()        
        return soup
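For reference, the likely culprit in the snippet above: soup.find('img', ...) returns the <img> tag itself, and pager.a searches for an <a> *inside* that image, which cannot exist; the href lives on the enclosing anchor, so something like pager.parent['href'] is what's needed. The idea can be sketched outside calibre with the standard library (hypothetical class name, not calibre API):

```python
from html.parser import HTMLParser

class NextLinkFinder(HTMLParser):
    """Track the enclosing <a>; when the target <img> appears, the
    next-page URL is the anchor's href, not an attribute of the image."""
    def __init__(self, img_src):
        super().__init__()
        self.img_src = img_src
        self._current_href = None  # href of the <a> we are currently inside
        self.next_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a':
            self._current_href = attrs.get('href')
        elif tag == 'img' and attrs.get('src') == self.img_src:
            self.next_url = self._current_href  # taken from the parent <a>

    def handle_endtag(self, tag):
        if tag == 'a':
            self._current_href = None

html = '<a href="/GB/1027/11928295.html"><img src="/img/next_b.gif" border="0"></a>'
finder = NextLinkFinder('/img/next_b.gif')
finder.feed(html)
# finder.next_url is now '/GB/1027/11928295.html'
```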

Last edited by rty; 06-21-2010 at 12:21 PM.
Old 06-21-2010, 01:04 PM   #2179
rford
Junior Member
 
Posts: 4
Karma: 10
Join Date: Jun 2010
Device: kobo
Rotating Images.

I have a custom recipe, similar to the xkcd recipe, to download all my favorite comic strips.

The one thing I found annoying was that the images in the epub were too wide and were getting cut off, so I rotated them. Now long 3- and 4-panel strips are landscape.

Here is the code snippet that I used to rotate the images. Hopefully others will find it useful.
Code:
import calibre.utils.PythonMagickWand as pw
from ctypes import byref  # byref is used by MagickGetException below
Code:
    def postprocess_html(self, soup, first):
        #process all the images. assumes that the new html has the correct path
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            iurl = tag['src']
            print 'resizing image ' + iurl
            with pw.ImageMagick():
                img = pw.NewMagickWand()
                p = pw.NewPixelWand()
                if img < 0:
                    raise RuntimeError('Out of memory')
                if not pw.MagickReadImage(img, iurl):
                    severity = pw.ExceptionType(0)
                    msg = pw.MagickGetException(img, byref(severity))
                    raise IOError('Failed to read image from: %s: %s'
                        %(iurl, msg))
                
                width = pw.MagickGetImageWidth(img)
                height = pw.MagickGetImageHeight(img)

                if( width > height ) :
                    print 'Rotate image'
                    pw.MagickRotateImage(img, p, 90)

                if not pw.MagickWriteImage(img, iurl):
                    raise RuntimeError('Failed to save image to %s'%iurl)
                pw.DestroyMagickWand(img)


        return soup
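The width-versus-height decision above can be illustrated without ImageMagick. A minimal sketch (hypothetical helper, not part of calibre) that treats an image as a row-major grid and rotates it 90 degrees clockwise only when it is wider than tall:

```python
def rotate_if_wide(pixels):
    """Rotate a row-major 'image' 90 degrees clockwise when width > height,
    mirroring the recipe's landscape check; otherwise leave it alone."""
    height, width = len(pixels), len(pixels[0])
    if width <= height:
        return pixels
    # Clockwise rotation: new row c is old column c, read bottom-to-top
    return [[pixels[height - 1 - r][c] for r in range(height)]
            for c in range(width)]

# A 2x3 (wide) grid becomes 3x2 (tall)
print(rotate_if_wide([[1, 2, 3],
                      [4, 5, 6]]))  # [[4, 1], [5, 2], [6, 3]]
```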
Old 06-21-2010, 02:28 PM   #2180
gambarini
Connoisseur
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
http://www.ilsole24ore.com/rss/primapagina.xml

Any ideas about this feed?
The correct link is not under the "guid" tag, nor under "link" or "links".
Old 06-21-2010, 03:30 PM   #2181
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by rford View Post
here is the code snippet that I used to rotate the images. Hopefully others will find it useful.
Thanks! I don't want to rotate images, but I have cases where I'd like to compare image height to width. This will be useful.
Old 06-21-2010, 09:50 PM   #2182
bhandarisaurabh
Enthusiast
 
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
Quote:
Originally Posted by rty View Post
Look at the RSS page provided by Forbes India: http://business.in.com/rss/

As I mentioned, the recipe picks up articles from the feed called "Complete Business.in.com" http://business.in.com/rssfeed/rss_all.xml

Anything that is not included by Forbes India in this particular feed, there's nothing I can do about it. Maybe you can write to Forbes India to ask them to include all the articles of the latest issue in the RSS feed page and see if they care.
Okay, thanks for the help.
Old 06-22-2010, 10:19 AM   #2183
mlstein
Enthusiast
 
Posts: 49
Karma: 2062
Join Date: May 2010
Device: iPad (one)
A second request for subscriber content for the London Review of Books, http://www.lrb.co.uk. Anyone?
Old 06-22-2010, 02:22 PM   #2184
gambarini
Connoisseur
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
With this feed I have tried two ways, and each one has its pros and cons...

With get_article I can obtain the correct link, but I can't find the title of the article.
With parse_index (index_to_soup) I can find the correct "title", but I don't get the link (in the soup there is a malformed "link" tag).
An example of the index_to_soup output:
Spoiler:

Code:
<item>
<title><![CDATA[Berlusconi: "Siamo il Paese 
più ricco d'Europa"]]></title>
<description><![CDATA[ROMA<BR>Il Premier Silvio Berlusconi continua a confidare su un forte consenso popolare alla sua persona e al suo governo, a dispetto «di tutto il fango che ci buttano addosso». E inivita il centrodestra a «non farsi del male in casa», apprpoffitando semmai di una opposizione che descrive pressochè inesistente. «Nonostante tutto il fango che tentano di buttarci addosso - dice nel suo collegamento  ...(continua)]]></description>
<author><![CDATA[ ]]></author>
<category><![CDATA[POLITICA]]></category>
<pubdate><![CDATA[Sun, 20 Jun 2010 13:34:37 +0200]]></pubdate>
<link />http://www.lastampa.it/redazione/cmsSezioni/politica/201006articoli/56066girata.asp
<enclosure url="http://www.lastampa.it/redazione/cmssezioni/politica/201006images/berlusconi01g.jpg" type="image/jpeg">
<image>
<url>http://www.lastampa.it/redazione/cmssezioni/politica/201006images/berlusconi01g.jpg</url>
<title></title>
<link />
<width></width>
<height></height>
</image>

So, is it possible to use both solutions together?
Or is it possible to extract the link next to the malformed <link /> tag?
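One possible angle (an assumption, sketched with the Python standard library rather than calibre's feed parser): when an XML parser reads `<link />http://...`, the URL ends up not as the element's text but as its *tail*, the text that follows the self-closed tag, so it can still be recovered:

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for one malformed <item> from the feed
item_xml = """<item>
<title>Example</title>
<link />http://www.lastampa.it/redazione/cmsSezioni/politica/201006articoli/56066girata.asp
</item>"""

item = ET.fromstring(item_xml)
link = item.find('link')
# With a self-closed <link />, the URL is the tag's tail text, not its content
url = (link.tail or '').strip()
# url now holds the full article URL
```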


p.s.

The bug is probably related to the feed:
Spoiler:

Code:
Parsing index.html ...
Initial parse failed:
Traceback (most recent call last):
  File "site-packages\calibre\ebooks\oeb\base.py", line 813, in first_pass
  File "lxml.etree.pyx", line 2538, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48266)
  File "parser.pxi", line 1536, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:71653)
  File "parser.pxi", line 1408, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:70449)
  File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67144)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64084)
XMLSyntaxError: Opening and ending tag mismatch: img line 29 and p, line 29, column 27

Last edited by gambarini; 06-22-2010 at 02:36 PM.
Old 06-22-2010, 02:46 PM   #2185
rty
Zealot
 
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Recipe for China Press USA (in Chinese)

Tested OK on B&N Nook.
Attached Files
File Type: zip ChinaPress.zip (1.2 KB, 192 views)
Old 06-22-2010, 03:12 PM   #2186
gambarini
Connoisseur
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Quote:
Originally Posted by gambarini View Post
With this feed I have tried two ways, and each one has its pros and cons... [remainder of post #2184, quoted in full above, trimmed]
OK, this is my solution: I don't use the feed; instead I try to obtain the links directly from the HTML section of the site.
Here is the code (beta version):

Spoiler:

Code:
def parse_index(self):
    feeds = []
    for title, url in [
            ('Politica', 'http://www.lastampa.it/_web/CMSTP/tmplSezioni/POLITICA/politicaHP.asp')
            ]:
        soup = self.index_to_soup(url)
        # narrow the soup to the section block first
        soup = soup.find(attrs={'class': 'sezione'})

        articles = []
        for article in soup.findAllNext(attrs={'class': 'titolo'}):
            title_url = self.tag_to_string(article)
            link = article.get('href', False)
            date = ''
            description = ''

            if title_url:
                articles.append({'title': title_url, 'url': link,
                                 'description': description, 'date': date})

        if articles:
            feeds.append((title, articles))

    # return after processing all feeds, not inside the loop
    return feeds
Old 06-23-2010, 12:12 PM   #2187
rty
Zealot
 
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Recipe for ifzm China Southern Weekly (in Chinese)

Tested OK on B&N Nook
Attached Files
File Type: zip ifzm - China Southern Weekly.zip (977 Bytes, 210 views)
Old 06-23-2010, 12:40 PM   #2188
kiklop74
Guru
 
 
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by mlstein View Post
A second request for subscriber content for the London Review of books, http://www.lrb.co.uk. Anyone?
Will be included in the next release of calibre
Old 06-23-2010, 08:09 PM   #2189
nook.life
Member
 
Posts: 12
Karma: 10
Join Date: May 2010
Device: Nook
Associated Press Broken

Anyone else notice that the AP recipe has been broken for some time now for the Nook?

It only shows the table of contents with the article summaries, but when you go to a specific article, all you get is a banner ad, a newspaper header, and ad images, with no article text. In other articles you get crazy code like

"#lightbox{position:absolute; top:40px; left:0 width:100%; z-index: 100; text-align:center; line height:0;} #lightbox{position:absolute; top:40px; left:0 width:100%; z-index: 100; text-align:center; line height:0;} #lightbox a img {border:none;} #outerImageContainer{ position: relative; background-color: #fff; width: 250px; height 250px; margin:0 auto;} #imageContainer{padding:10px;}" ...etc etc

All this random code is what makes up those articles.

In other articles, you get actual text, but it is cut off, showing only half a page. Changing the font size makes no difference; the text still cuts off mid-sentence.

Anyone know what's going on? All the other recipes that I use every day are normal...
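That #lightbox{...} garbage looks like raw stylesheet text leaking into the article body; a recipe would normally suppress it with no_stylesheets = True or by stripping <style>/<script> tags. The stripping idea, sketched with the standard library (hypothetical class name, not the AP recipe's actual code):

```python
from html.parser import HTMLParser

class StyleStripper(HTMLParser):
    """Collect article text while skipping <style>/<script> blocks,
    whose raw contents would otherwise leak into the output."""
    def __init__(self):
        super().__init__()
        self.out = []
        self._skip = 0  # depth inside style/script tags

    def handle_starttag(self, tag, attrs):
        if tag in ('style', 'script'):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ('style', 'script') and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.out.append(data)

p = StyleStripper()
p.feed('<p>Story text.</p><style>#lightbox{position:absolute;}</style><p>More.</p>')
text = ''.join(p.out)
# text keeps the story and drops the CSS
```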
Old 06-23-2010, 08:41 PM   #2190
nook.life
Member
 
Posts: 12
Karma: 10
Join Date: May 2010
Device: Nook
Quote:
Originally Posted by Starson17 View Post
Try this:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class Explosm(BasicNewsRecipe):
    title               = 'Explosm'
    __author__          = 'Starson17'
    description         = 'Explosm'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
    linearize_tables      = True
    oldest_article      = 24
    remove_javascript   = True
    remove_empty_feeds    = True
    max_articles_per_feed = 10

    feeds = [
             (u'Explosm Feed', u'http://feeds.feedburner.com/Explosm')
             ]

    def get_article_url(self, article):
        return article.get('link', None)

    keep_only_tags     = [dict(name='div', attrs={'id':'maincontent'})]

    def preprocess_html(self, soup):
        table_tags = soup.findAll('table')
        table_tags[1].extract() 
        NavTag = soup.find(text='&laquo; First') 
        NavTag.parent.parent.extract()
        return soup

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''
Quote:
Originally Posted by Starson17 View Post
I took a look at it. I told you I took a look at it. I asked you a question. You didn't respond, so I stopped. I like to know there's really someone out there.
Wow, don't I feel stupid. I searched the replies for a response before posting the message, but somehow missed it. Even now I had to do a Google search on the forum to find it. Thank you so, so much for looking into this recipe and taking the time to help me out. I really appreciate it. In answer to your question: yes, I looked through those first, but it was not offered.

I tried the recipe out and it almost works. Unfortunately, the cartoon gets cut in half; please see the attached pic. Perhaps blending in rford's code above for rotating cartoons would work. I replaced his code with yours starting at def postprocess_html, but the recipe did not work at all (clearly it could not have been that easy, although I figured I'd try).

Thanks again for your help, and sorry once again for not fully searching the forum before posting the request again. THANK YOUUUUU

http://picturepush.com/public/3679162

Last edited by nook.life; 06-24-2010 at 01:21 PM.