11-25-2010, 11:43 AM | #1 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
New Recipe:Arcamax - Comics
This is another comics recipe. As with the gocomics.com and comics.com recipes, you can set the number of days to retrieve, and you should customize the recipe to choose the strips you want or don't want.
The only interesting thing in this recipe is that I wanted to set 100% max/min width on the main comic img, but I didn't want it to apply to the other img tags. I used preprocess_html to set an id only on the main comic img tag and extra_css to control it. It's ready to add to built-ins. It's a family-friendly site (which might make it easier to find/identify family-friendly comics) and has some comics not found on other sites. Code:
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = 'Copyright 2010 Starson17'
'''
www.arcamax.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
#from calibre.ebooks.BeautifulSoup import BeautifulSoup
import mechanize, re

class Arcamax(BasicNewsRecipe):
    title = 'Arcamax'
    __author__ = 'Starson17'
    __version__ = '1.03'
    __date__ = '25 November 2010'
    description = u'Family Friendly Comics - Customize for more days/comics: Defaults to 7 days, 25 comics - 20 general, 5 editorial.'
    category = 'news, comics'
    language = 'en'
    use_embedded_content= False
    no_stylesheets = True
    remove_javascript = True
    cover_url = 'http://www.arcamax.com/images/pub/amuse/leftcol/zits.jpg'

    ####### USER PREFERENCES - SET COMICS AND NUMBER OF COMICS TO RETRIEVE ########
    num_comics_to_get = 7
    # CHOOSE COMIC STRIPS BELOW - REMOVE COMMENT '# ' FROM IN FRONT OF DESIRED STRIPS

    conversion_options = {'linearize_tables' : True
                        , 'comment' : description
                        , 'tags' : category
                        , 'language' : language
                        }

    keep_only_tags = [dict(name='div', attrs={'class':['toon']}),
                      ]

    def parse_index(self):
        feeds = []
        for title, url in [
            ######## COMICS - GENERAL ########
            #(u"9 Chickweed Lane", u"http://www.arcamax.com/ninechickweedlane"),
            #(u"Agnes", u"http://www.arcamax.com/agnes"),
            #(u"Andy Capp", u"http://www.arcamax.com/andycapp"),
            (u"BC", u"http://www.arcamax.com/bc"),
            #(u"Baby Blues", u"http://www.arcamax.com/babyblues"),
            #(u"Beetle Bailey", u"http://www.arcamax.com/beetlebailey"),
            (u"Blondie", u"http://www.arcamax.com/blondie"),
            #(u"Boondocks", u"http://www.arcamax.com/boondocks"),
            #(u"Cathy", u"http://www.arcamax.com/cathy"),
            #(u"Daddys Home", u"http://www.arcamax.com/daddyshome"),
            (u"Dilbert", u"http://www.arcamax.com/dilbert"),
            #(u"Dinette Set", u"http://www.arcamax.com/thedinetteset"),
            (u"Dog Eat Doug", u"http://www.arcamax.com/dogeatdoug"),
            (u"Doonesbury", u"http://www.arcamax.com/doonesbury"),
            #(u"Dustin", u"http://www.arcamax.com/dustin"),
            (u"Family Circus", u"http://www.arcamax.com/familycircus"),
            (u"Garfield", u"http://www.arcamax.com/garfield"),
            #(u"Get Fuzzy", u"http://www.arcamax.com/getfuzzy"),
            #(u"Girls and Sports", u"http://www.arcamax.com/girlsandsports"),
            #(u"Hagar the Horrible", u"http://www.arcamax.com/hagarthehorrible"),
            #(u"Heathcliff", u"http://www.arcamax.com/heathcliff"),
            #(u"Jerry King Cartoons", u"http://www.arcamax.com/humorcartoon"),
            #(u"Luann", u"http://www.arcamax.com/luann"),
            #(u"Momma", u"http://www.arcamax.com/momma"),
            #(u"Mother Goose and Grimm", u"http://www.arcamax.com/mothergooseandgrimm"),
            (u"Mutts", u"http://www.arcamax.com/mutts"),
            #(u"Non Sequitur", u"http://www.arcamax.com/nonsequitur"),
            #(u"Pearls Before Swine", u"http://www.arcamax.com/pearlsbeforeswine"),
            #(u"Pickles", u"http://www.arcamax.com/pickles"),
            #(u"Red and Rover", u"http://www.arcamax.com/redandrover"),
            #(u"Rubes", u"http://www.arcamax.com/rubes"),
            #(u"Rugrats", u"http://www.arcamax.com/rugrats"),
            (u"Speed Bump", u"http://www.arcamax.com/speedbump"),
            (u"Wizard of Id", u"http://www.arcamax.com/wizardofid"),
            (u"Zits", u"http://www.arcamax.com/zits"),
            ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
        # Start at the newest strip and follow the 'Previous' link once per day wanted.
        title = 'Temp'
        current_articles = []
        pages = range(1, self.num_comics_to_get+1)
        for page in pages:
            page_soup = self.index_to_soup(url)
            if page_soup:
                title = page_soup.find(name='div', attrs={'class':'toon'}).p.img['alt']
                page_url = url
                prev_page_url = 'http://www.arcamax.com' + page_soup.find('a', attrs={'class':'next'}, text='Previous').parent['href']
                current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':''})
                url = prev_page_url
        # Strips were collected newest first; reverse so the oldest comes first.
        current_articles.reverse()
        return current_articles

    def preprocess_html(self, soup):
        # Tag only the main comic img with an id so extra_css can size it alone.
        main_comic = soup.find('p',attrs={'class':'m0'})
        if main_comic.a['target'] == '_blank':
            main_comic.a.img['id'] = 'main_comic'
        return soup

    extra_css = '''
        h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
        img#main_comic {max-width:100%; min-width:100%;}
        p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
        body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
        '''
|
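The make_links step in the recipe (start at the newest strip, follow the "Previous" link once per day, then reverse the list) can be sketched outside calibre. This is only an illustration under assumptions: `collect_strips`, `fetch`, and `FAKE_SITE` are hypothetical names, the URLs and titles in `FAKE_SITE` are made up, and the None checks are an addition that the posted recipe does not have.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical page record: (strip title or None, previous-page URL or None).
Page = Tuple[Optional[str], Optional[str]]


def collect_strips(start_url: str, count: int,
                   fetch: Callable[[str], Optional[Page]]) -> List[Dict[str, str]]:
    """Follow 'Previous' links up to `count` times; return oldest strip first."""
    articles = []
    url: Optional[str] = start_url
    for _ in range(count):
        if url is None:           # no older strip to follow
            break
        page = fetch(url)
        if page is None:          # page could not be fetched or parsed
            break
        title, prev_url = page
        if title:                 # skip pages without a usable title
            articles.append({'title': title, 'url': url,
                             'description': '', 'date': ''})
        url = prev_url
    articles.reverse()            # collected newest first, so flip the order
    return articles


# Made-up stand-in for the site: three days of one strip,
# where the oldest page has no 'Previous' link.
FAKE_SITE: Dict[str, Page] = {
    '/zits': ('Zits for 11/25/2010', '/zits/2'),
    '/zits/2': ('Zits for 11/24/2010', '/zits/3'),
    '/zits/3': ('Zits for 11/23/2010', None),
}

strips = collect_strips('/zits', 7, FAKE_SITE.get)
print([a['title'] for a in strips])
```

Stopping early when a page or link is missing means a site change yields a short feed rather than an exception, which may or may not be what you want while debugging.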
11-26-2010, 11:55 PM | #2 |
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2010
Device: Samsung Android using FBreader
|
That works great! Thank you so much, you rule!
BJ |
04-17-2011, 10:32 PM | #3 |
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle 3
|
I'm working on updating the recipe... it seems to have started failing on 4/13/2011.
If I get it working, I'll post it... if someone else gets to it before me, please post the changes... thanks, -tim |
04-17-2011, 11:52 PM | #4 |
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle 3
|
Need help
no such luck...
Here's the output:
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
Python function terminated unexpectedly
'NoneType' object has no attribute 'decode' (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 103, in main
  File "site.py", line 85, in run_entry_point
  File "site-packages\calibre\ebooks\conversion\cli.py", line 282, in main
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 915, in run
  File "site-packages\calibre\customize\conversion.py", line 204, in __call__
  File "site-packages\calibre\web\feeds\input.py", line 105, in convert
  File "site-packages\calibre\web\feeds\news.py", line 735, in download
  File "site-packages\calibre\web\feeds\news.py", line 874, in build_index
  File "site-packages\calibre\web\feeds\__init__.py", line 338, in feeds_from_index
  File "site-packages\calibre\web\feeds\__init__.py", line 165, in populate_from_preparsed_feed
  File "site-packages\calibre\web\feeds\__init__.py", line 30, in __init__
AttributeError: 'NoneType' object has no attribute 'decode'
-----------------------------------------------
|
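The last frames of that traceback are in the code that builds feed entries from the article list a recipe returns, so one plausible reading (a guess, since the log does not say which field) is that some value the recipe handed over was None after the site changed. A minimal sketch of filtering such entries before returning them from parse_index follows. `drop_incomplete` and `REQUIRED_FIELDS` are hypothetical names and not part of calibre, and the sketch uses Python 3 `str` where the calibre of that era used Python 2 strings.

```python
# Fields assumed to be mandatory for every article dict (an assumption for this sketch).
REQUIRED_FIELDS = ('title', 'url')


def drop_incomplete(articles):
    """Return only the article dicts whose required fields are non-empty strings."""
    good = []
    for article in articles:
        values = [article.get(field) for field in REQUIRED_FIELDS]
        if all(isinstance(v, str) and v.strip() for v in values):
            good.append(article)
    return good


sample = [
    {'title': 'Dilbert', 'url': 'http://www.arcamax.com/dilbert', 'description': '', 'date': ''},
    {'title': None, 'url': 'http://www.arcamax.com/dilbert', 'description': '', 'date': ''},
    {'title': 'No link', 'description': '', 'date': ''},
]
cleaned = drop_incomplete(sample)
print(len(cleaned))
```

Filtering hides the symptom only; the selectors in the recipe still have to be updated to match the changed page.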
04-18-2011, 10:30 AM | #5 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Kovid:
The site has changed significantly. Here's a completely rewritten Arcamax recipe: Spoiler:
|
04-18-2011, 10:46 AM | #6 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
updated
|
04-18-2011, 10:53 AM | #7 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
04-18-2011, 10:54 AM | #8 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
fixed.
|
04-19-2011, 11:38 AM | #9 |
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle 3
|
my dilbert addiction thanks you...
|
04-19-2011, 11:49 AM | #10 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
04-25-2011, 10:24 AM | #11 |
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2010
Device: Samsung Android using FBreader
|
Thank you!
|
04-25-2011, 03:12 PM | #12 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
You're welcome.
A bit of a funny: I was getting errors reported on this when it ran overnight. I thought I must have made a mistake when I rewrote it. Each time I went to manually fix/check it, however, it ran correctly. It turned out I had an earlier custom recipe with the same name - Arcamax - that was running overnight. I was trying to "fix" the builtin copy when it was the old custom that was broken. |
05-10-2011, 06:27 PM | #13 |
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle 3
|
Is it just me, or did they change the site _again_? It looks like it stopped working on 5/5/11...
|
05-11-2011, 07:56 AM | #14 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
05-11-2011, 09:52 PM | #15 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
|