MobileRead Forums > E-Book Software > Calibre > Recipes

11-25-2010, 12:43 PM #1
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
New Recipe: Arcamax - Comics

This is another comics recipe. As with the gocomics.com and comics.com recipes, you can set the number of days to retrieve, and you should customize it to select the strips you want.

The only interesting thing in this recipe is that I wanted to set 100% max/min width on the main comic img, but I didn't want it to apply to the other img tags.

I used preprocess_html to set an id only on the main comic img tag, and extra_css to style it.
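A minimal sketch of the id-tagging idea, assuming a hypothetical, well-formed page snippet whose structure mirrors the selectors the recipe uses (the first `p.m0`, containing an `a` with `target="_blank"` around the `img`). It uses Python 3's standard `xml.etree.ElementTree` rather than calibre's BeautifulSoup so it runs on its own; `tag_main_comic` and `PAGE` are illustrative names, not part of the recipe.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet; tags and classes mirror the selectors
# the recipe uses (p.m0 > a[target=_blank] > img), not captured Arcamax markup.
PAGE = (
    '<div class="toon">'
    '<p class="m0"><a href="/zits/big" target="_blank">'
    '<img src="strip.gif" alt="Zits"/></a></p>'
    '<p><img src="icon.gif" alt="icon"/></p>'
    '</div>'
)


def tag_main_comic(root):
    """Set id="main_comic" on the main strip image only; return it (or None)."""
    # Like soup.find('p', attrs={'class': 'm0'}): take the first <p class="m0">.
    main_p = next((p for p in root.iter('p') if p.get('class') == 'm0'), None)
    if main_p is None:
        return None
    link = main_p.find('a')
    # The recipe treats a target="_blank" link as the marker of the main strip.
    if link is None or link.get('target') != '_blank':
        return None
    img = link.find('img')
    if img is not None:
        img.set('id', 'main_comic')
    return img


root = ET.fromstring(PAGE)
tag_main_comic(root)
# Only the strip now carries the id, so a rule such as
# img#main_comic {max-width:100%; min-width:100%;} leaves the icon alone.
print([img.get('id') for img in root.iter('img')])  # ['main_comic', None]
```

Putting the id on the element, rather than widening every `img`, keeps the CSS rule narrow without needing a more complex selector.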

It's ready to add to built-ins. It's a family-friendly site (might make it easier to find/identify family-friendly comics) and has some comics not found in other sites.

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2010 Starson17'
'''
www.arcamax.com
'''
from calibre.web.feeds.news import BasicNewsRecipe

class Arcamax(BasicNewsRecipe):
    title               = 'Arcamax'
    __author__          = 'Starson17'
    __version__         = '1.03'
    __date__            = '25 November 2010'
    description         = u'Family Friendly Comics - Customize for more days/comics: Defaults to 7 days, 25 comics - 20 general, 5 editorial.'
    category            = 'news, comics'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
    remove_javascript   = True
    cover_url           = 'http://www.arcamax.com/images/pub/amuse/leftcol/zits.jpg'

    ####### USER PREFERENCES - SET COMICS AND NUMBER OF COMICS TO RETRIEVE ########
    num_comics_to_get = 7
    # CHOOSE COMIC STRIPS BELOW - REMOVE COMMENT '# ' FROM IN FRONT OF DESIRED STRIPS

    conversion_options = {'linearize_tables'  : True
                        , 'comment'           : description
                        , 'tags'              : category
                        , 'language'          : language
                        }

    keep_only_tags     = [dict(name='div', attrs={'class':['toon']}),
                          ]
   
    def parse_index(self):
        feeds = []
        for title, url in [
                            ######## COMICS - GENERAL ######## 
                            #(u"9 Chickweed Lane", u"http://www.arcamax.com/ninechickweedlane"),
                            #(u"Agnes", u"http://www.arcamax.com/agnes"),
                            #(u"Andy Capp", u"http://www.arcamax.com/andycapp"),
                            (u"BC", u"http://www.arcamax.com/bc"),
                            #(u"Baby Blues", u"http://www.arcamax.com/babyblues"),
                            #(u"Beetle Bailey", u"http://www.arcamax.com/beetlebailey"),
                            (u"Blondie", u"http://www.arcamax.com/blondie"),
                            #(u"Boondocks", u"http://www.arcamax.com/boondocks"),
                            #(u"Cathy", u"http://www.arcamax.com/cathy"),
                            #(u"Daddys Home", u"http://www.arcamax.com/daddyshome"),
                            (u"Dilbert", u"http://www.arcamax.com/dilbert"),
                            #(u"Dinette Set", u"http://www.arcamax.com/thedinetteset"),
                            (u"Dog Eat Doug", u"http://www.arcamax.com/dogeatdoug"),
                            (u"Doonesbury", u"http://www.arcamax.com/doonesbury"),
                            #(u"Dustin", u"http://www.arcamax.com/dustin"),
                            (u"Family Circus", u"http://www.arcamax.com/familycircus"),
                            (u"Garfield", u"http://www.arcamax.com/garfield"),
                            #(u"Get Fuzzy", u"http://www.arcamax.com/getfuzzy"),
                            #(u"Girls and Sports", u"http://www.arcamax.com/girlsandsports"),
                            #(u"Hagar the Horrible", u"http://www.arcamax.com/hagarthehorrible"),
                            #(u"Heathcliff", u"http://www.arcamax.com/heathcliff"),
                            #(u"Jerry King Cartoons", u"http://www.arcamax.com/humorcartoon"),
                            #(u"Luann", u"http://www.arcamax.com/luann"),
                            #(u"Momma", u"http://www.arcamax.com/momma"),
                            #(u"Mother Goose and Grimm", u"http://www.arcamax.com/mothergooseandgrimm"),
                            (u"Mutts", u"http://www.arcamax.com/mutts"),
                            #(u"Non Sequitur", u"http://www.arcamax.com/nonsequitur"),
                            #(u"Pearls Before Swine", u"http://www.arcamax.com/pearlsbeforeswine"),
                            #(u"Pickles", u"http://www.arcamax.com/pickles"),
                            #(u"Red and Rover", u"http://www.arcamax.com/redandrover"),
                            #(u"Rubes", u"http://www.arcamax.com/rubes"),
                            #(u"Rugrats", u"http://www.arcamax.com/rugrats"),
                            (u"Speed Bump", u"http://www.arcamax.com/speedbump"),
                            (u"Wizard of Id", u"http://www.arcamax.com/wizardofid"),
                            (u"Zits", u"http://www.arcamax.com/zits"),
                             ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
        title = 'Temp'
        current_articles = []
        pages = range(1, self.num_comics_to_get+1)
        for page in pages:
            page_soup = self.index_to_soup(url)
            if page_soup:
                title = page_soup.find(name='div', attrs={'class':'toon'}).p.img['alt']
                page_url = url
                prev_page_url = 'http://www.arcamax.com' + page_soup.find('a', attrs={'class':'next'}, text='Previous').parent['href']
                current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':''})
                url = prev_page_url
            else:
                # Stop instead of reusing undefined/stale values when a page can't be fetched
                break
        current_articles.reverse()
        return current_articles

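    def preprocess_html(self, soup):
        # Give only the main strip an id, so img#main_comic in extra_css skips other images
        main_comic = soup.find('p',attrs={'class':'m0'})
        link = main_comic.a if main_comic else None
        if link and link.get('target') == '_blank' and link.img:
            link.img['id'] = 'main_comic'
        return soup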

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    img#main_comic {max-width:100%; min-width:100%;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''
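The make_links method above starts at a strip's newest page, follows the "Previous" link num_comics_to_get times, and then reverses the list so the oldest strip comes first. The sketch below shows that walk in plain Python 3, with a hypothetical in-memory `fetch` callable standing in for `index_to_soup` plus the tag lookups; the URLs and the `collect_strips` name are invented for the demo. Unlike the posted code, it stops cleanly when a page can't be fetched or has no earlier strip.

```python
from urllib.parse import urljoin


def collect_strips(start_url, fetch, num_comics_to_get=7):
    """Walk back from the newest strip via 'Previous' links; return oldest first.

    fetch(url) is a hypothetical stand-in for index_to_soup plus parsing: it
    returns a dict with 'title', 'date' and a relative 'prev' href (or None),
    or None if the page could not be fetched.
    """
    articles = []
    url = start_url
    for _ in range(num_comics_to_get):
        page = fetch(url)
        if page is None:
            break  # don't reuse stale values from the previous page
        articles.append({'title': page['title'], 'url': url,
                         'description': '', 'date': page['date']})
        if not page['prev']:
            break  # reached the oldest strip the (fake) site offers
        # The recipe prefixes 'http://www.arcamax.com'; urljoin gives the same
        # result for root-relative hrefs such as '/zits/s-2'.
        url = urljoin(url, page['prev'])
    articles.reverse()  # oldest strip first, as the recipe does
    return articles


# Hypothetical three-page archive for one strip (URLs invented for the demo).
SITE = {
    'http://www.arcamax.com/zits': {'title': 'Zits', 'date': 'May 11', 'prev': '/zits/s-2'},
    'http://www.arcamax.com/zits/s-2': {'title': 'Zits', 'date': 'May 10', 'prev': '/zits/s-1'},
    'http://www.arcamax.com/zits/s-1': {'title': 'Zits', 'date': 'May 9', 'prev': None},
}

for article in collect_strips('http://www.arcamax.com/zits', SITE.get, num_comics_to_get=2):
    print(article['date'], article['url'])
```

With num_comics_to_get=2, only the two newest pages are fetched, and they come back oldest first.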

11-27-2010, 12:55 AM #2
bjc
Member
Posts: 22
Karma: 10
Join Date: Nov 2010
Device: Samsung Android using FBreader
That works great! Thank you so much, you rule!

BJ

04-17-2011, 11:32 PM #3
AustinTim
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle 3
I'm working on updating the recipe; it seems to have started failing on 4/13/2011.

If I get it working, I'll post it. If someone else gets to it before me, please post the changes.

thanks,
-tim

04-18-2011, 12:52 AM #4
AustinTim
Member
Need help

No such luck.

Here's the output:
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
Python function terminated unexpectedly
'NoneType' object has no attribute 'decode' (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\ebooks\conversion\cli.py", line 282, in main
File "site-packages\calibre\ebooks\conversion\plumber.py", line 915, in run
File "site-packages\calibre\customize\conversion.py", line 204, in __call__
File "site-packages\calibre\web\feeds\input.py", line 105, in convert
File "site-packages\calibre\web\feeds\news.py", line 735, in download
File "site-packages\calibre\web\feeds\news.py", line 874, in build_index
File "site-packages\calibre\web\feeds\__init__.py", line 338, in feeds_from_index
File "site-packages\calibre\web\feeds\__init__.py", line 165, in populate_from_preparsed_feed
File "site-packages\calibre\web\feeds\__init__.py", line 30, in __init__
AttributeError: 'NoneType' object has no attribute 'decode'
-----------------------------------------------
Attached file: ComicsArcamax.txt (6.6 KB)

04-18-2011, 11:30 AM #5
Starson17
Wizard
Kovid:

The site has changed significantly. Here's a completely rewritten Arcamax recipe:


Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2010 Starson17'
'''
www.arcamax.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag

class Arcamax(BasicNewsRecipe):
    title               = 'Arcamax'
    __author__          = 'Starson17'
    __version__         = '1.04'
    __date__            = '18 April 2011'
    description         = u'Family Friendly Comics - Customize for more days/comics: Defaults to 7 days, 25 comics - 20 general, 5 editorial.'
    category            = 'news, comics'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
    remove_javascript   = True
    cover_url           = 'http://www.arcamax.com/images/pub/amuse/leftcol/zits.jpg'

    ####### USER PREFERENCES - SET COMICS AND NUMBER OF COMICS TO RETRIEVE ########
    num_comics_to_get = 7
    # CHOOSE COMIC STRIPS BELOW - REMOVE COMMENT '# ' FROM IN FRONT OF DESIRED STRIPS

    conversion_options = {'linearize_tables'  : True
                        , 'comment'           : description
                        , 'tags'              : category
                        , 'language'          : language
                        }

    keep_only_tags     = [dict(name='div', attrs={'class':['comics-header']}),
                                        dict(name='b', attrs={'class':['current']}),
                                        dict(name='article', attrs={'class':['comic']}),
                                        ]

    remove_tags = [dict(name='div', attrs={'id':['comicfull' ]}),
                               dict(name='div', attrs={'class':['calendar' ]}), 
                               dict(name='nav', attrs={'class':['calendar-nav' ]}),  
                               ]
   
    def parse_index(self):
        feeds = []
        for title, url in [
                            ######## COMICS - GENERAL ########
                            #(u"9 Chickweed Lane", u"http://www.arcamax.com/ninechickweedlane"),
                            #(u"Agnes", u"http://www.arcamax.com/agnes"),
                            #(u"Andy Capp", u"http://www.arcamax.com/andycapp"),
                            (u"BC", u"http://www.arcamax.com/bc"),
                            #(u"Baby Blues", u"http://www.arcamax.com/babyblues"),
                            #(u"Beetle Bailey", u"http://www.arcamax.com/beetlebailey"),
                            (u"Blondie", u"http://www.arcamax.com/blondie"),
                            #(u"Boondocks", u"http://www.arcamax.com/boondocks"),
                            #(u"Cathy", u"http://www.arcamax.com/cathy"),
                            #(u"Daddys Home", u"http://www.arcamax.com/daddyshome"),
                            (u"Dilbert", u"http://www.arcamax.com/dilbert"),
                            #(u"Dinette Set", u"http://www.arcamax.com/thedinetteset"),
                            (u"Dog Eat Doug", u"http://www.arcamax.com/dogeatdoug"),
                            (u"Doonesbury", u"http://www.arcamax.com/doonesbury"),
                            #(u"Dustin", u"http://www.arcamax.com/dustin"),
                            (u"Family Circus", u"http://www.arcamax.com/familycircus"),
                            (u"Garfield", u"http://www.arcamax.com/garfield"),
                            #(u"Get Fuzzy", u"http://www.arcamax.com/getfuzzy"),
                            #(u"Girls and Sports", u"http://www.arcamax.com/girlsandsports"),
                            #(u"Hagar the Horrible", u"http://www.arcamax.com/hagarthehorrible"),
                            #(u"Heathcliff", u"http://www.arcamax.com/heathcliff"),
                            #(u"Jerry King Cartoons", u"http://www.arcamax.com/humorcartoon"),
                            #(u"Luann", u"http://www.arcamax.com/luann"),
                            #(u"Momma", u"http://www.arcamax.com/momma"),
                            #(u"Mother Goose and Grimm", u"http://www.arcamax.com/mothergooseandgrimm"),
                            (u"Mutts", u"http://www.arcamax.com/mutts"),
                            #(u"Non Sequitur", u"http://www.arcamax.com/nonsequitur"),
                            #(u"Pearls Before Swine", u"http://www.arcamax.com/pearlsbeforeswine"),
                            #(u"Pickles", u"http://www.arcamax.com/pickles"),
                            #(u"Red and Rover", u"http://www.arcamax.com/redandrover"),
                            #(u"Rubes", u"http://www.arcamax.com/rubes"),
                            #(u"Rugrats", u"http://www.arcamax.com/rugrats"),
                            (u"Speed Bump", u"http://www.arcamax.com/speedbump"),
                            (u"Wizard of Id", u"http://www.arcamax.com/wizardofid"),
                            (u"Zits", u"http://www.arcamax.com/zits"),
                             ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
        title = 'Temp'
        current_articles = []
        pages = range(1, self.num_comics_to_get+1)
        for page in pages:
            page_soup = self.index_to_soup(url)
            if page_soup:
                title = page_soup.find(name='div', attrs={'class':'comics-header'}).h1.contents[0]
                print 'title is: ', title
                page_url = url
                print 'url is: ', url
                # orig prev_page_url = 'http://www.arcamax.com' + page_soup.find('a', attrs={'class':'prev'}, text='Previous').parent['href']
                prev_page_url = 'http://www.arcamax.com' + page_soup.find('span', text='Previous').parent.parent['href']
                print 'prev_page_url is: ', prev_page_url
                date = self.tag_to_string(page_soup.find(name='b', attrs={'class':['current']}))
                print 'date is: ', date
                current_articles.append({'title': title, 'url': page_url, 'description':'', 'date': date})
                url = prev_page_url
            else:
                # Stop instead of reusing undefined/stale values when a page can't be fetched
                break
        current_articles.reverse()
        return current_articles

    def preprocess_html(self, soup):
        for img_tag in soup.findAll('img'):
            parent_tag = img_tag.parent
            if parent_tag.name == 'a':
                new_tag = Tag(soup,'p')
                new_tag.insert(0,img_tag)
                parent_tag.replaceWith(new_tag)
            elif parent_tag.name == 'p':
                if not self.tag_to_string(parent_tag) == '':
                    new_div = Tag(soup,'div')
                    new_tag = Tag(soup,'p')
                    new_tag.insert(0,img_tag)
                    parent_tag.replaceWith(new_div)
                    new_div.insert(0,new_tag)
                    new_div.insert(1,parent_tag)
        return soup

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    img {max-width:100%; min-width:100%;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''
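In this rewrite, preprocess_html pulls each image out of its link and, when an image shares a paragraph with text, splits the two into separate paragraphs inside a new div. The sketch below reproduces that transformation with Python 3's `xml.etree.ElementTree` on a hypothetical, well-formed snippet so it runs outside calibre; it is not the recipe's BeautifulSoup code, and the helper names are illustrative. ElementTree has no parent pointers and stores trailing text on the preceding element, so the sketch needs small helpers for both.

```python
import xml.etree.ElementTree as ET


def _parent_map(root):
    # ElementTree has no .parent, so rebuild a child -> parent map when needed.
    return {child: parent for parent in root.iter() for child in parent}


def _replace(parent, old, new):
    """Put new where old was inside parent, carrying over old's trailing text."""
    idx = list(parent).index(old)
    new.tail, old.tail = old.tail, None
    parent.remove(old)
    parent.insert(idx, new)


def _detach(parent, child):
    """Remove child from parent without losing the text that followed it."""
    idx = list(parent).index(child)
    if child.tail:
        if idx > 0:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or '') + child.tail
        else:
            parent.text = (parent.text or '') + child.tail
        child.tail = None
    parent.remove(child)


def unwrap_images(root):
    for img in list(root.iter('img')):
        parents = _parent_map(root)  # rebuilt because earlier edits move nodes
        parent = parents.get(img)
        holder = parents.get(parent) if parent is not None else None
        if holder is None:
            continue
        if parent.tag == 'a':
            # <a><img/></a>  ->  <p><img/></p> (the link itself is dropped)
            new_p = ET.Element('p')
            _replace(holder, parent, new_p)
            img.tail = None
            new_p.append(img)
        elif parent.tag == 'p' and ''.join(parent.itertext()).strip():
            # <p><img/>text</p>  ->  <div><p><img/></p><p>text</p></div>
            new_div = ET.Element('div')
            new_p = ET.SubElement(new_div, 'p')
            _detach(parent, img)
            new_p.append(img)
            _replace(holder, parent, new_div)
            new_div.append(parent)
    return root


# Hypothetical snippet shaped like the <article class="comic"> block the recipe keeps.
PAGE = ('<article class="comic">'
        '<a href="/zits/s-1"><img src="strip.gif"/></a>'
        '<p><img src="thumb.gif"/>Caption text</p>'
        '</article>')
root = unwrap_images(ET.fromstring(PAGE))
print(ET.tostring(root, encoding='unicode'))
```

Giving each image its own paragraph means the blanket `img {max-width:100%; min-width:100%;}` rule in extra_css scales the image without squeezing it against neighbouring text.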

04-18-2011, 11:46 AM #6
kovidgoyal
creator of calibre
Posts: 31,728
Karma: 8696042
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
updated

04-18-2011, 11:53 AM #7
Starson17
Wizard
Quote:
Originally Posted by kovidgoyal
updated
As usual, I forgot to take out the print statements.

04-18-2011, 11:54 AM #8
kovidgoyal
creator of calibre
fixed.

04-19-2011, 12:38 PM #9
AustinTim
Member
My Dilbert addiction thanks you...

04-19-2011, 12:49 PM #10
Starson17
Wizard
Quote:
Originally Posted by AustinTim
My Dilbert addiction thanks you...
IIRC, Dilbert shows up in the ArcaMax recipe, a dedicated Dilbert recipe, the comics.com recipe, and the "iDNES.cz Zprávy, Technet, Komiksy a další" recipe (Czech for "News, Technet, Comics and more"). I'm a fan, too, and keep a special version with rotated images for reading on my device.

04-25-2011, 11:24 AM #11
bjc
Member
Thank you!

04-25-2011, 04:12 PM #12
Starson17
Wizard
Quote:
Originally Posted by bjc
Thank you!
You're welcome.

A bit of a funny: I was getting errors reported on this when it ran overnight. I thought I must have made a mistake when I rewrote it. Each time I went to manually fix/check it, however, it ran correctly.

It turned out I had an earlier custom recipe with the same name - Arcamax - that was running overnight. I was trying to "fix" the builtin copy when it was the old custom that was broken.

05-10-2011, 07:27 PM #13
AustinTim
Member
Is it just me, or did they change the site _again_? It looks like it stopped working on 5/5/11...

05-11-2011, 08:56 AM #14
Starson17
Wizard
Quote:
Originally Posted by AustinTim
Is it just me, or did they change the site _again_? It looks like it stopped working on 5/5/11...
Yes, it seems to be broken again.

05-11-2011, 10:52 PM #15
Starson17
Wizard
Quote:
Originally Posted by Starson17
Yes, it seems to be broken again.
This should fix it:
Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2010 Starson17'
'''
www.arcamax.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag

class Arcamax(BasicNewsRecipe):
    title               = 'Arcamax'
    __author__          = 'Starson17'
    __version__         = '1.04'
    __date__            = '18 April 2011'
    description         = u'Family Friendly Comics - Customize for more days/comics: Defaults to 7 days, 25 comics - 20 general, 5 editorial.'
    category            = 'news, comics'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
    remove_javascript   = True
    cover_url           = 'http://www.arcamax.com/images/pub/amuse/leftcol/zits.jpg'

    ####### USER PREFERENCES - SET COMICS AND NUMBER OF COMICS TO RETRIEVE ########
    num_comics_to_get = 7
    # CHOOSE COMIC STRIPS BELOW - REMOVE COMMENT '# ' FROM IN FRONT OF DESIRED STRIPS

    conversion_options = {'linearize_tables'  : True
                        , 'comment'           : description
                        , 'tags'              : category
                        , 'language'          : language
                        }

    keep_only_tags     = [dict(name='div', attrs={'class':['comics-header']}),
                                        dict(name='b', attrs={'class':['current']}),
                                        dict(name='article', attrs={'class':['comic']}),
                                        ]

    remove_tags = [dict(name='div', attrs={'id':['comicfull' ]}),
                               dict(name='div', attrs={'class':['calendar' ]}),
                               dict(name='nav', attrs={'class':['calendar-nav' ]}),
                               ]

    def parse_index(self):
        feeds = []
        for title, url in [
                            ######## COMICS - GENERAL ########
                            #(u"9 Chickweed Lane", u"http://www.arcamax.com/ninechickweedlane"),
                            #(u"Agnes", u"http://www.arcamax.com/agnes"),
                            #(u"Andy Capp", u"http://www.arcamax.com/andycapp"),
                            (u"BC", u"http://www.arcamax.com/bc"),
                            #(u"Baby Blues", u"http://www.arcamax.com/babyblues"),
                            #(u"Beetle Bailey", u"http://www.arcamax.com/beetlebailey"),
                            (u"Blondie", u"http://www.arcamax.com/blondie"),
                            #(u"Boondocks", u"http://www.arcamax.com/boondocks"),
                            #(u"Cathy", u"http://www.arcamax.com/cathy"),
                            #(u"Daddys Home", u"http://www.arcamax.com/daddyshome"),
                            (u"Dilbert", u"http://www.arcamax.com/dilbert"),
                            #(u"Dinette Set", u"http://www.arcamax.com/thedinetteset"),
                            (u"Dog Eat Doug", u"http://www.arcamax.com/dogeatdoug"),
                            (u"Doonesbury", u"http://www.arcamax.com/doonesbury"),
                            #(u"Dustin", u"http://www.arcamax.com/dustin"),
                            (u"Family Circus", u"http://www.arcamax.com/familycircus"),
                            (u"Garfield", u"http://www.arcamax.com/garfield"),
                            #(u"Get Fuzzy", u"http://www.arcamax.com/getfuzzy"),
                            #(u"Girls and Sports", u"http://www.arcamax.com/girlsandsports"),
                            #(u"Hagar the Horrible", u"http://www.arcamax.com/hagarthehorrible"),
                            #(u"Heathcliff", u"http://www.arcamax.com/heathcliff"),
                            #(u"Jerry King Cartoons", u"http://www.arcamax.com/humorcartoon"),
                            #(u"Luann", u"http://www.arcamax.com/luann"),
                            #(u"Momma", u"http://www.arcamax.com/momma"),
                            #(u"Mother Goose and Grimm", u"http://www.arcamax.com/mothergooseandgrimm"),
                            (u"Mutts", u"http://www.arcamax.com/mutts"),
                            #(u"Non Sequitur", u"http://www.arcamax.com/nonsequitur"),
                            #(u"Pearls Before Swine", u"http://www.arcamax.com/pearlsbeforeswine"),
                            #(u"Pickles", u"http://www.arcamax.com/pickles"),
                            #(u"Red and Rover", u"http://www.arcamax.com/redandrover"),
                            #(u"Rubes", u"http://www.arcamax.com/rubes"),
                            #(u"Rugrats", u"http://www.arcamax.com/rugrats"),
                            (u"Speed Bump", u"http://www.arcamax.com/speedbump"),
                            (u"Wizard of Id", u"http://www.arcamax.com/wizardofid"),
                            (u"Zits", u"http://www.arcamax.com/zits"),
                             ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
        title = 'Temp'
        current_articles = []
        pages = range(1, self.num_comics_to_get+1)
        for page in pages:
            page_soup = self.index_to_soup(url)
            if page_soup:
                title = self.tag_to_string(page_soup.find(name='div', attrs={'class':'comics-header'}).h1.contents[0])
                page_url = url
                # orig prev_page_url = 'http://www.arcamax.com' + page_soup.find('a', attrs={'class':'prev'}, text='Previous').parent['href']
                prev_page_url = 'http://www.arcamax.com' + page_soup.find('span', text='Previous').parent.parent['href']
                date = self.tag_to_string(page_soup.find(name='b', attrs={'class':['current']}))
                current_articles.append({'title': title, 'url': page_url, 'description':'', 'date': date})
                url = prev_page_url
            else:
                # Stop instead of reusing undefined/stale values when a page can't be fetched
                break
        current_articles.reverse()
        return current_articles

    def preprocess_html(self, soup):
        for img_tag in soup.findAll('img'):
            parent_tag = img_tag.parent
            if parent_tag.name == 'a':
                # Unlink the image: replace the <a> with a <p> holding only the image
                new_tag = Tag(soup,'p')
                new_tag.insert(0,img_tag)
                parent_tag.replaceWith(new_tag)
            elif parent_tag.name == 'p':
                if self.tag_to_string(parent_tag) != '':
                    # The image shares a <p> with text: give the image its own <p>
                    # and keep the text paragraph after it, both inside a new <div>
                    new_div = Tag(soup,'div')
                    new_tag = Tag(soup,'p')
                    new_tag.insert(0,img_tag)
                    parent_tag.replaceWith(new_div)
                    new_div.insert(0,new_tag)
                    new_div.insert(1,parent_tag)
        return soup

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    img {max-width:100%; min-width:100%;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.