
MobileRead Forums > E-Book Software > Calibre > Recipes

11-28-2011, 09:03 PM   #1
ppriede
Junior Member
Posts: 6
Karma: 10
Join Date: May 2010
Location: Chile
Device: Kindle 3G, Kindle Fire (soon)
Random GoComics.com?

Hi,
first things first.
I use Calibre every day, so thank you very much to the developer.

Now, a simple question.

How can I create a GoComics.com recipe that picks the comics at random?

I tried to modify the existing recipe, but I don't know much Python, and maybe that's the whole problem.


Here is the recipe:
Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2010 Starson17'
'''
www.gocomics.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
import mechanize, re, random

class GoComics(BasicNewsRecipe):
    title               = 'GoComics L v2+azar'
    __author__          = 'Starson17'
    __version__         = '1.06'
    __date__            = '07 June 2011'
    description         = u'200+ Comics - Customize for more days/comics: Defaults to 7 days, 25 comics - 20 general, 5 editorial.'
    category            = 'news, comics'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
    remove_javascript   = True
    cover_url           = 'http://paulbuckley14059.files.wordpress.com/2008/06/calvin-and-hobbes.jpg'
    remove_attributes = ['style']

    ####### USER PREFERENCES - COMICS, IMAGE SIZE AND NUMBER OF COMICS TO RETRIEVE ########
    # num_comics_to_get - I've tried up to 99 on Calvin&Hobbes
    num_comics_to_get = 1
    # comic_size 300 is small, 600 is medium, 900 is large, 1500 is extra-large
    comic_size = 900
    # CHOOSE COMIC STRIPS BELOW - REMOVE COMMENT '# ' FROM IN FRONT OF DESIRED STRIPS
    # Please do not overload their servers by selecting all comics and 1000 strips from each!

    conversion_options = {'linearize_tables'  : True
                        , 'comment'           : description
                        , 'tags'              : category
                        , 'language'          : language
                        }

    keep_only_tags     = [dict(name='div', attrs={'class':['feature','banner']}),
                          ]

    remove_tags = [dict(name='a', attrs={'class':['beginning','prev','cal','next','newest']}),
                   dict(name='div', attrs={'class':['tag-wrapper']}),
                   dict(name='a', attrs={'href':re.compile(r'.*mutable_[0-9]+', re.IGNORECASE)}),
                   dict(name='img', attrs={'src':re.compile(r'.*mutable_[0-9]+', re.IGNORECASE)}),
                   dict(name='ul', attrs={'class':['share-nav','feature-nav']}),
                   ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        cookies = mechanize.CookieJar()
        br = mechanize.build_opener(mechanize.HTTPCookieProcessor(cookies))
        br.addheaders = [('Referer','http://www.gocomics.com/')]
        return br

    def parse_index(self):
        feedis =  [
                       #(u"2 Cows and a Chicken", u"http://www.gocomics.com/2cowsandachicken"),
                       (u"9 Chickweed Lane", u"http://www.gocomics.com/9chickweedlane"),
                       (u"9 to 5", u"http://www.gocomics.com/9to5"),
                       (u"Adam At Home", u"http://www.gocomics.com/adamathome"),
                       (u"Agnes", u"http://www.gocomics.com/agnes"),
                       (u"Alley Oop", u"http://www.gocomics.com/alleyoop"),
                       (u"Andy Capp", u"http://www.gocomics.com/andycapp"),
                       (u"Animal Crackers", u"http://www.gocomics.com/animalcrackers"),
                       (u"Annie", u"http://www.gocomics.com/annie"),
                       (u"Arlo & Janis", u"http://www.gocomics.com/arloandjanis"),
                       (u"Ask Shagg", u"http://www.gocomics.com/askshagg"),

                       (u"Back in the Day", u"http://www.gocomics.com/backintheday"),
                       (u"Bad Reporter", u"http://www.gocomics.com/badreporter"),
                       (u"Baldo", u"http://www.gocomics.com/baldo"),
                       (u"Ballard Street", u"http://www.gocomics.com/ballardstreet"),
                       (u"Barkeater Lake", u"http://www.gocomics.com/barkeaterlake"),
                       (u"Basic Instructions", u"http://www.gocomics.com/basicinstructions"),
                       (u"Ben", u"http://www.gocomics.com/ben"),
                       (u"Betty", u"http://www.gocomics.com/betty"),
                       (u"Bewley", u"http://www.gocomics.com/bewley"),
                       (u"Big Nate", u"http://www.gocomics.com/bignate"),
                       (u"Big Top", u"http://www.gocomics.com/bigtop"),
                       (u"Biographic", u"http://www.gocomics.com/biographic"),
                       (u"Birdbrains", u"http://www.gocomics.com/birdbrains"),
                       (u"Bleeker: The Rechargeable Dog", u"http://www.gocomics.com/bleeker"),
                       (u"Bliss", u"http://www.gocomics.com/bliss"),
                       (u"Bloom County", u"http://www.gocomics.com/bloomcounty"),
                       (u"Bo Nanas", u"http://www.gocomics.com/bonanas"),
                       (u"Bob the Squirrel", u"http://www.gocomics.com/bobthesquirrel"),
                       (u"Boomerangs", u"http://www.gocomics.com/boomerangs"),
                       (u"Bottomliners", u"http://www.gocomics.com/bottomliners"),
                       (u"Bound and Gagged", u"http://www.gocomics.com/boundandgagged"),
                       (u"Brainwaves", u"http://www.gocomics.com/brainwaves"),
                       (u"Brenda Starr", u"http://www.gocomics.com/brendastarr"),
                       (u"Brevity", u"http://www.gocomics.com/brevity"),
                       (u"Brewster Rockit", u"http://www.gocomics.com/brewsterrockit"),
                       (u"Broom Hilda", u"http://www.gocomics.com/broomhilda"),

                       (u"Candorville", u"http://www.gocomics.com/candorville"),
                       (u"Cathy", u"http://www.gocomics.com/cathy"),
                       (u"C'est la Vie", u"http://www.gocomics.com/cestlavie"),
                       (u"Cheap Thrills", u"http://www.gocomics.com/cheapthrills"),
                       (u"Chuckle Bros", u"http://www.gocomics.com/chucklebros"),
                       (u"Citizen Dog", u"http://www.gocomics.com/citizendog"),
                       (u"Cleats", u"http://www.gocomics.com/cleats"),
                       (u"Close to Home", u"http://www.gocomics.com/closetohome"),
                       (u"Committed", u"http://www.gocomics.com/committed"),
                       (u"Compu-toon", u"http://www.gocomics.com/compu-toon"),
                       (u"Cornered", u"http://www.gocomics.com/cornered"),
                       (u"Cow & Boy", u"http://www.gocomics.com/cow&boy"),
                       (u"Cul de Sac", u"http://www.gocomics.com/culdesac"),
                       (u"Daddy's Home", u"http://www.gocomics.com/daddyshome"),
                       (u"Deep Cover", u"http://www.gocomics.com/deepcover"),
                       (u"Dick Tracy", u"http://www.gocomics.com/dicktracy"),
                       (u"Dog Eat Doug", u"http://www.gocomics.com/dogeatdoug"),
                       (u"Domestic Abuse", u"http://www.gocomics.com/domesticabuse"),

                       (u"Doonesbury", u"http://www.gocomics.com/doonesbury"),
                       (u"Drabble", u"http://www.gocomics.com/drabble"),
                       (u"Eek!", u"http://www.gocomics.com/eek"),
                       (u"F Minus", u"http://www.gocomics.com/fminus"),
                       (u"Family Tree", u"http://www.gocomics.com/familytree"),
                       (u"Farcus", u"http://www.gocomics.com/farcus"),
                       (u"Fat Cats Classics", u"http://www.gocomics.com/fatcatsclassics"),
                       (u"Ferd'nand", u"http://www.gocomics.com/ferdnand"),
                       (u"Flight Deck", u"http://www.gocomics.com/flightdeck"),
                       (u"Flo and Friends", u"http://www.gocomics.com/floandfriends"),
                       (u"For Better or For Worse", u"http://www.gocomics.com/forbetterorforworse"),
                       (u"For Heaven's Sake", u"http://www.gocomics.com/forheavenssake"),
                       (u"Fort Knox", u"http://www.gocomics.com/fortknox"),
                       (u"FoxTrot Classics", u"http://www.gocomics.com/foxtrotclassics"),
                       (u"FoxTrot", u"http://www.gocomics.com/foxtrot"),
                       (u"Frank & Ernest", u"http://www.gocomics.com/frankandernest"),
                       (u"Frazz", u"http://www.gocomics.com/frazz"),
                       (u"Fred Basset", u"http://www.gocomics.com/fredbasset"),
                       (u"Free Range", u"http://www.gocomics.com/freerange"),
                       (u"Frog Applause", u"http://www.gocomics.com/frogapplause"),


                       (u"Gasoline Alley", u"http://www.gocomics.com/gasolinealley"),
                       (u"Geech Classics", u"http://www.gocomics.com/geechclassics"),

                       (u"Gil Thorp", u"http://www.gocomics.com/gilthorp"),
                       (u"Ginger Meggs", u"http://www.gocomics.com/gingermeggs"),
                       (u"Girls & Sports", u"http://www.gocomics.com/girlsandsports"),
                       (u"Graffiti", u"http://www.gocomics.com/graffiti"),
                       (u"Grand Avenue", u"http://www.gocomics.com/grandavenue"),
                       (u"Haiku Ewe", u"http://www.gocomics.com/haikuewe"),
                       (u"Heart of the City", u"http://www.gocomics.com/heartofthecity"),
                       (u"Heathcliff", u"http://www.gocomics.com/heathcliff"),
                       (u"Herb and Jamaal", u"http://www.gocomics.com/herbandjamaal"),
                       (u"Herman", u"http://www.gocomics.com/herman"),
                       (u"Home and Away", u"http://www.gocomics.com/homeandaway"),
                       (u"Housebroken", u"http://www.gocomics.com/housebroken"),
                       (u"Hubert and Abby", u"http://www.gocomics.com/hubertandabby"),
                       (u"Imagine This", u"http://www.gocomics.com/imaginethis"),
                       (u"In the Bleachers", u"http://www.gocomics.com/inthebleachers"),
                       (u"In the Sticks", u"http://www.gocomics.com/inthesticks"),
                       (u"Ink Pen", u"http://www.gocomics.com/inkpen"),
                       (u"It's All About You", u"http://www.gocomics.com/itsallaboutyou"),
                       (u"Jane's World", u"http://www.gocomics.com/janesworld"),
                       (u"Joe Vanilla", u"http://www.gocomics.com/joevanilla"),
                       (u"Jump Start", u"http://www.gocomics.com/jumpstart"),
                       (u"Kit 'N' Carlyle", u"http://www.gocomics.com/kitandcarlyle"),
                       (u"La Cucaracha", u"http://www.gocomics.com/lacucaracha"),
                       (u"Last Kiss", u"http://www.gocomics.com/lastkiss"),
                       (u"Legend of Bill", u"http://www.gocomics.com/legendofbill"),
                       (u"Liberty Meadows", u"http://www.gocomics.com/libertymeadows"),
                       (u"Li'l Abner Classics", u"http://www.gocomics.com/lilabnerclassics"),
                       (u"Lio", u"http://www.gocomics.com/lio"),
                       (u"Little Dog Lost", u"http://www.gocomics.com/littledoglost"),
                       (u"Little Otto", u"http://www.gocomics.com/littleotto"),
                       (u"Lola", u"http://www.gocomics.com/lola"),
                       (u"Loose Parts", u"http://www.gocomics.com/looseparts"),
                       (u"Love Is...", u"http://www.gocomics.com/loveis"),
                       (u"Luann", u"http://www.gocomics.com/luann"),
                       (u"Maintaining", u"http://www.gocomics.com/maintaining"),
                       (u"Marmaduke", u"http://www.gocomics.com/marmaduke"),
                       (u"Meg! Classics", u"http://www.gocomics.com/megclassics"),
                       (u"Middle-Aged White Guy", u"http://www.gocomics.com/middleagedwhiteguy"),
                       (u"Minimum Security", u"http://www.gocomics.com/minimumsecurity"),
                       (u"Moderately Confused", u"http://www.gocomics.com/moderatelyconfused"),
                       (u"Momma", u"http://www.gocomics.com/momma"),
                       (u"Monty", u"http://www.gocomics.com/monty"),
                       (u"Motley Classics", u"http://www.gocomics.com/motleyclassics"),
                       (u"Mutt & Jeff", u"http://www.gocomics.com/muttandjeff"),
                       (u"Mythtickle", u"http://www.gocomics.com/mythtickle"),
                       (u"Nancy", u"http://www.gocomics.com/nancy"),
                       (u"Natural Selection", u"http://www.gocomics.com/naturalselection"),
                       (u"Nest Heads", u"http://www.gocomics.com/nestheads"),
                       (u"NEUROTICA", u"http://www.gocomics.com/neurotica"),
                       (u"New Adventures of Queen Victoria", u"http://www.gocomics.com/thenewadventuresofqueenvictoria"),
                       (u"Non Sequitur", u"http://www.gocomics.com/nonsequitur"),
                       (u"Off The Mark", u"http://www.gocomics.com/offthemark"),
                       (u"On A Claire Day", u"http://www.gocomics.com/onaclaireday"),
                       (u"One Big Happy Classics", u"http://www.gocomics.com/onebighappyclassics"),
                       (u"One Big Happy", u"http://www.gocomics.com/onebighappy"),
                       (u"Out of the Gene Pool Re-Runs", u"http://www.gocomics.com/outofthegenepool"),
                       (u"Over the Hedge", u"http://www.gocomics.com/overthehedge"),
                       (u"Overboard", u"http://www.gocomics.com/overboard"),
                       (u"PC and Pixel", u"http://www.gocomics.com/pcandpixel"),
                       (u"Peanuts", u"http://www.gocomics.com/peanuts"),
                       (u"Pearls Before Swine", u"http://www.gocomics.com/pearlsbeforeswine"),
                       (u"Pibgorn Sketches", u"http://www.gocomics.com/pibgornsketches"),

                       (u"Pickles", u"http://www.gocomics.com/pickles"),
                       (u"Pinkerton", u"http://www.gocomics.com/pinkerton"),
                       (u"Pluggers", u"http://www.gocomics.com/pluggers"),
                       (u"Pooch Cafe", u"http://www.gocomics.com/poochcafe"),
                       (u"PreTeena", u"http://www.gocomics.com/preteena"),
                       (u"Prickly City", u"http://www.gocomics.com/pricklycity"),
                       (u"Rabbits Against Magic", u"http://www.gocomics.com/rabbitsagainstmagic"),
                       (u"Raising Duncan Classics", u"http://www.gocomics.com/raisingduncanclassics"),
                       (u"Real Life Adventures", u"http://www.gocomics.com/reallifeadventures"),
                       (u"Reality Check", u"http://www.gocomics.com/realitycheck"),
                       (u"Red and Rover", u"http://www.gocomics.com/redandrover"),
                       (u"Red Meat", u"http://www.gocomics.com/redmeat"),
                       (u"Reynolds Unwrapped", u"http://www.gocomics.com/reynoldsunwrapped"),
                       (u"Rip Haywire", u"http://www.gocomics.com/riphaywire"),
                       (u"Ripley's Believe It or Not!", u"http://www.gocomics.com/ripleysbelieveitornot"),
                       (u"Ronaldinho Gaucho", u"http://www.gocomics.com/ronaldinhogaucho"),
                       (u"Rose Is Rose", u"http://www.gocomics.com/roseisrose"),
                       (u"Rubes", u"http://www.gocomics.com/rubes"),
                       (u"Rudy Park", u"http://www.gocomics.com/rudypark"),
                       (u"Scary Gary", u"http://www.gocomics.com/scarygary"),
                       (u"Shirley and Son Classics", u"http://www.gocomics.com/shirleyandsonclassics"),
                       (u"Shoe", u"http://www.gocomics.com/shoe"),
                       (u"Shoecabbage", u"http://www.gocomics.com/shoecabbage"),
                       (u"Skin Horse", u"http://www.gocomics.com/skinhorse"),
                       (u"Slowpoke", u"http://www.gocomics.com/slowpoke"),
                       (u"Soup To Nutz", u"http://www.gocomics.com/souptonutz"),
                       (u"Speed Bump", u"http://www.gocomics.com/speedbump"),
                       (u"Spot The Frog", u"http://www.gocomics.com/spotthefrog"),
                       (u"State of the Union", u"http://www.gocomics.com/stateoftheunion"),
                       (u"Stone Soup", u"http://www.gocomics.com/stonesoup"),
                       (u"Strange Brew", u"http://www.gocomics.com/strangebrew"),
                       (u"Sylvia", u"http://www.gocomics.com/sylvia"),
                       (u"Tank McNamara", u"http://www.gocomics.com/tankmcnamara"),
                       (u"Tarzan Classics", u"http://www.gocomics.com/tarzanclassics"),
                       (u"That's Life", u"http://www.gocomics.com/thatslife"),
                       (u"The Academia Waltz", u"http://www.gocomics.com/academiawaltz"),
                       (u"The Argyle Sweater", u"http://www.gocomics.com/theargylesweater"),
                       (u"The Barn", u"http://www.gocomics.com/thebarn"),
                       (u"The Boiling Point", u"http://www.gocomics.com/theboilingpoint"),
                       (u"The Boondocks", u"http://www.gocomics.com/boondocks"),
                       (u"The Born Loser", u"http://www.gocomics.com/thebornloser"),
                       (u"The Buckets", u"http://www.gocomics.com/thebuckets"),
                       (u"The City", u"http://www.gocomics.com/thecity"),
                       (u"The Dinette Set", u"http://www.gocomics.com/dinetteset"),
                       (u"The Doozies", u"http://www.gocomics.com/thedoozies"),
                       (u"The Duplex", u"http://www.gocomics.com/duplex"),
                       (u"The Elderberries", u"http://www.gocomics.com/theelderberries"),
                       (u"The Flying McCoys", u"http://www.gocomics.com/theflyingmccoys"),
                       (u"The Fusco Brothers", u"http://www.gocomics.com/thefuscobrothers"),
                       (u"The Grizzwells", u"http://www.gocomics.com/thegrizzwells"),
                       (u"The Humble Stumble", u"http://www.gocomics.com/thehumblestumble"),
                       (u"The Knight Life", u"http://www.gocomics.com/theknightlife"),
                       (u"The Meaning of Lila", u"http://www.gocomics.com/meaningoflila"),
                       (u"The Middletons", u"http://www.gocomics.com/themiddletons"),
                       (u"The Norm", u"http://www.gocomics.com/thenorm"),
                       (u"The Other Coast", u"http://www.gocomics.com/theothercoast"),
                       (u"The Quigmans", u"http://www.gocomics.com/thequigmans"),
                       (u"The Sunshine Club", u"http://www.gocomics.com/thesunshineclub"),
                       (u"Tiny Sepuk", u"http://www.gocomics.com/tinysepuk"),
                       (u"TOBY", u"http://www.gocomics.com/toby"),
                       (u"Tom the Dancing Bug", u"http://www.gocomics.com/tomthedancingbug"),
                       (u"Too Much Coffee Man", u"http://www.gocomics.com/toomuchcoffeeman"),
                       (u"Unstrange Phenomena", u"http://www.gocomics.com/unstrangephenomena"),
                       (u"W.T. Duck", u"http://www.gocomics.com/wtduck"),
                       (u"Watch Your Head", u"http://www.gocomics.com/watchyourhead"),
                       (u"Wee Pals", u"http://www.gocomics.com/weepals"),
                       (u"Winnie the Pooh", u"http://www.gocomics.com/winniethepooh"),
                       (u"Wizard of Id", u"http://www.gocomics.com/wizardofid"),
                       (u"Working Daze", u"http://www.gocomics.com/workingdaze"),
                       (u"Working It Out", u"http://www.gocomics.com/workingitout"),
                       (u"Yenny", u"http://www.gocomics.com/yenny"),
                       (u"Zack Hill", u"http://www.gocomics.com/zackhill"),
                       (u"Ziggy", u"http://www.gocomics.com/ziggy"),
                       #
                       ######## EDITORIAL CARTOONS #####################
                       (u"Adam Zyglis", u"http://www.gocomics.com/adamzyglis"),
                       (u"Andy Singer", u"http://www.gocomics.com/andysinger"),
                       (u"Ben Sargent",u"http://www.gocomics.com/bensargent"),
                       (u"Bill Day", u"http://www.gocomics.com/billday"),
                       (u"Bill Schorr", u"http://www.gocomics.com/billschorr"),
                       (u"Bob Englehart", u"http://www.gocomics.com/bobenglehart"),
                       (u"Bob Gorrell",u"http://www.gocomics.com/bobgorrell"),
                       (u"Brian Fairrington", u"http://www.gocomics.com/brianfairrington"),
                       (u"Bruce Beattie", u"http://www.gocomics.com/brucebeattie"),
                       (u"Cam Cardow", u"http://www.gocomics.com/camcardow"),
                       (u"Chan Lowe",u"http://www.gocomics.com/chanlowe"),
                       (u"Chip Bok",u"http://www.gocomics.com/chipbok"),
                       (u"Chris Britt",u"http://www.gocomics.com/chrisbritt"),
                       (u"Chuck Asay",u"http://www.gocomics.com/chuckasay"),
                       (u"Clay Bennett",u"http://www.gocomics.com/claybennett"),
                       (u"Clay Jones",u"http://www.gocomics.com/clayjones"),
                       (u"Dan Wasserman",u"http://www.gocomics.com/danwasserman"),
                       (u"Dana Summers",u"http://www.gocomics.com/danasummers"),
                       (u"Daryl Cagle", u"http://www.gocomics.com/darylcagle"),
                       (u"David Fitzsimmons", u"http://www.gocomics.com/davidfitzsimmons"),
                       (u"Dick Locher",u"http://www.gocomics.com/dicklocher"),
                       (u"Don Wright",u"http://www.gocomics.com/donwright"),
                       (u"Donna Barstow",u"http://www.gocomics.com/donnabarstow"),
                       (u"Drew Litton", u"http://www.gocomics.com/drewlitton"),
                       (u"Drew Sheneman",u"http://www.gocomics.com/drewsheneman"),
                       (u"Ed Stein", u"http://www.gocomics.com/edstein"),
                       (u"Eric Allie", u"http://www.gocomics.com/ericallie"),
                       (u"Gary Markstein", u"http://www.gocomics.com/garymarkstein"),
                       (u"Gary McCoy", u"http://www.gocomics.com/garymccoy"),
                       (u"Gary Varvel", u"http://www.gocomics.com/garyvarvel"),
                       (u"Glenn McCoy",u"http://www.gocomics.com/glennmccoy"),
                       (u"Henry Payne", u"http://www.gocomics.com/henrypayne"),
                       (u"Jack Ohman",u"http://www.gocomics.com/jackohman"),
                       (u"JD Crowe", u"http://www.gocomics.com/jdcrowe"),
                       (u"Jeff Danziger",u"http://www.gocomics.com/jeffdanziger"),
                       (u"Jeff Parker", u"http://www.gocomics.com/jeffparker"),
                       (u"Jeff Stahler", u"http://www.gocomics.com/jeffstahler"),
                       (u"Jerry Holbert", u"http://www.gocomics.com/jerryholbert"),
                       (u"Jim Morin",u"http://www.gocomics.com/jimmorin"),
                       (u"Joel Pett",u"http://www.gocomics.com/joelpett"),
                       (u"John Cole", u"http://www.gocomics.com/johncole"),
                       (u"John Darkow", u"http://www.gocomics.com/johndarkow"),
                       (u"John Deering",u"http://www.gocomics.com/johndeering"),
                       (u"John Sherffius", u"http://www.gocomics.com/johnsherffius"),
                       (u"Ken Catalino",u"http://www.gocomics.com/kencatalino"),
                       (u"Kerry Waghorn",u"http://www.gocomics.com/facesinthenews"),
                       (u"Kevin Kallaugher",u"http://www.gocomics.com/kevinkallaugher"),
                       (u"Lalo Alcaraz",u"http://www.gocomics.com/laloalcaraz"),
                       (u"Larry Wright", u"http://www.gocomics.com/larrywright"),
                       (u"Lisa Benson", u"http://www.gocomics.com/lisabenson"),
                       (u"Marshall Ramsey", u"http://www.gocomics.com/marshallramsey"),
                       (u"Matt Bors", u"http://www.gocomics.com/mattbors"),
                       (u"Matt Davies",u"http://www.gocomics.com/mattdavies"),
                       (u"Michael Ramirez", u"http://www.gocomics.com/michaelramirez"),
                       (u"Mike Keefe", u"http://www.gocomics.com/mikekeefe"),
                       (u"Mike Luckovich", u"http://www.gocomics.com/mikeluckovich"),
                       (u"Mike Thompson", u"http://www.gocomics.com/mikethompson"),
                       (u"Monte Wolverton", u"http://www.gocomics.com/montewolverton"),
                       (u"Mr. Fish", u"http://www.gocomics.com/mrfish"),
                       (u"Nate Beeler", u"http://www.gocomics.com/natebeeler"),
                       (u"Nick Anderson", u"http://www.gocomics.com/nickanderson"),
                       (u"Pat Bagley", u"http://www.gocomics.com/patbagley"),
                       (u"Pat Oliphant",u"http://www.gocomics.com/patoliphant"),
                       (u"Paul Conrad",u"http://www.gocomics.com/paulconrad"),
                       (u"Paul Szep", u"http://www.gocomics.com/paulszep"),
                       (u"RJ Matson", u"http://www.gocomics.com/rjmatson"),
                       (u"Rob Rogers", u"http://www.gocomics.com/robrogers"),
                       (u"Robert Ariail", u"http://www.gocomics.com/robertariail"),
                       (u"Scott Stantis", u"http://www.gocomics.com/scottstantis"),
                       (u"Signe Wilkinson", u"http://www.gocomics.com/signewilkinson"),
                       (u"Small World",u"http://www.gocomics.com/smallworld"),
                       (u"Steve Benson", u"http://www.gocomics.com/stevebenson"),
                       (u"Steve Breen", u"http://www.gocomics.com/stevebreen"),
                       (u"Steve Kelley", u"http://www.gocomics.com/stevekelley"),
                       (u"Steve Sack", u"http://www.gocomics.com/stevesack"),
                       (u"Stuart Carlson",u"http://www.gocomics.com/stuartcarlson"),
                       (u"Ted Rall",u"http://www.gocomics.com/tedrall"),
                       (u"(Th)ink", u"http://www.gocomics.com/think"),
                       (u"Tom Toles",u"http://www.gocomics.com/tomtoles"),
                       (u"Tony Auth",u"http://www.gocomics.com/tonyauth"),
                       (u"Views of the World",u"http://www.gocomics.com/viewsoftheworld"),
                       (u"ViewsAfrica",u"http://www.gocomics.com/viewsafrica"),
                       (u"ViewsAmerica",u"http://www.gocomics.com/viewsamerica"),
                       (u"ViewsAsia",u"http://www.gocomics.com/viewsasia"),
                       (u"ViewsBusiness",u"http://www.gocomics.com/viewsbusiness"),
                       (u"ViewsEurope",u"http://www.gocomics.com/viewseurope"),
                       (u"ViewsLatinAmerica",u"http://www.gocomics.com/viewslatinamerica"),
                       (u"ViewsMidEast",u"http://www.gocomics.com/viewsmideast"),
                       (u"Walt Handelsman",u"http://www.gocomics.com/walthandelsman"),
                       (u"Wayne Stayskal",u"http://www.gocomics.com/waynestayskal"),
                       (u"Wit of the World",u"http://www.gocomics.com/witoftheworld"),
                             ]

        random.shuffle(feedis)
        feeds = feedis[0:20]

        for title, url in feeds:
            print 'Working on: ', title
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
        title = 'Temp'
        current_articles = []
        pages = range(1, self.num_comics_to_get+1)
        for page in pages:
            page_soup = self.index_to_soup(url)
            if page_soup:
                try:
                  strip_title = page_soup.find(name='div', attrs={'class':'top'}).h1.a.string
                except:
                  strip_title = 'Error - no Title found'
                try:
                  date_title = page_soup.find('ul', attrs={'class': 'feature-nav'}).li.string
                  if not date_title:
                      date_title = page_soup.find('ul', attrs={'class': 'feature-nav'}).li.string
                except:
                  date_title = 'Error - no Date found'
                title = strip_title + ' - ' + date_title
                for i in range(2):
                  try:
                    strip_url_date = page_soup.find(name='div', attrs={'class':'top'}).h1.a['href']
                    break #success - this is normal exit
                  except:
                    strip_url_date = None
                    continue #try to get strip_url_date again
                for i in range(2):
                  try:
                    prev_strip_url_date = page_soup.find('a', attrs={'class': 'prev'})['href']
                    break #success - this is normal exit
                  except:
                    prev_strip_url_date = None
                    continue #try to get prev_strip_url_date again
                if strip_url_date:
                  page_url = 'http://www.gocomics.com' + strip_url_date
                else:
                  continue
                if prev_strip_url_date:
                  prev_page_url = 'http://www.gocomics.com' + prev_strip_url_date
                else:
                  continue
            current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':''})
            url = prev_page_url
        current_articles.reverse()
        return current_articles

    def preprocess_html(self, soup):
        if soup.title:
            title_string = soup.title.string.strip()
            _cd = title_string.split(',',1)[1]
            comic_date = ' '.join(_cd.split(' ', 4)[0:-1])
        if soup.h1.span:
            artist = soup.h1.span.string
            soup.h1.span.string.replaceWith(comic_date + artist)
        feature_item = soup.find('p',attrs={'class':'feature_item'})
        if feature_item.a:
            a_tag = feature_item.a
            a_href = a_tag["href"]
            img_tag = a_tag.img
            img_tag["src"] = a_href
            img_tag["width"] = self.comic_size
            img_tag["height"] = None
        return self.adeify_images(soup)

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    img {max-width:100%; min-width:100%;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''
So I create a list (feedis) with all the URLs,
then shuffle it and slice it down to 2 elements (for this test).
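For reference, here is a minimal sketch of just the selection step, outside calibre. Instead of shuffling feedis in place and slicing it, it uses the standard library's random.sample, which returns a new list of distinct elements and leaves the original list untouched. The helper name pick_random_feeds and the three-entry sample list are illustrative assumptions, not part of the recipe.

```python
import random


def pick_random_feeds(all_feeds, count):
    """Return up to `count` distinct (title, url) pairs chosen at random.

    Hypothetical helper for illustration; all_feeds is not modified.
    """
    # random.sample raises ValueError if asked for more items than exist,
    # so clamp the requested count to the size of the list.
    count = min(count, len(all_feeds))
    return random.sample(all_feeds, count)


if __name__ == "__main__":
    feedis = [
        (u"9 to 5", u"http://www.gocomics.com/9to5"),
        (u"Skin Horse", u"http://www.gocomics.com/skinhorse"),
        (u"Chan Lowe", u"http://www.gocomics.com/chanlowe"),
    ]
    for title, url in pick_random_feeds(feedis, 2):
        print("Selected:", title, url)
```

Because the result is a fresh list, the recipe can loop over it while appending the finished (title, articles) pairs to a different list.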

And this is part of the log:

Code:
InputFormatPlugin: Recipe Input running
Working on:  Chan Lowe

Working on:  Skin Horse

Working on:  Chan Lowe
Python function terminated unexpectedly
  expected string or buffer (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 132, in main
  File "site.py", line 109, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 959, in run
  File "site-packages\calibre\customize\conversion.py", line 204, in __call__
  File "site-packages\calibre\web\feeds\input.py", line 105, in convert
  File "site-packages\calibre\web\feeds\news.py", line 824, in download
  File "site-packages\calibre\web\feeds\news.py", line 968, in build_index
  File "c:\users\-----\appdata\local\temp\calibre_0.8.27_tmp_8cn8y4\md5n3m_recipes\recipe0.py", line 380, in parse_index
    articles = self.make_links(url)
  File "c:\users\----\appdata\local\temp\calibre_0.8.27_tmp_8cn8y4\md5n3m_recipes\recipe0.py", line 390, in make_links
    page_soup = self.index_to_soup(url)
  File "site-packages\calibre\web\feeds\news.py", line 536, in index_to_soup
  File "re.py", line 137, in match
TypeError: expected string or buffer

I don't understand where to look (or what to fix) for that last line.
I suspect that maybe some loop is missing (to properly check the list).

Any help would be great.
Thanks.
Old 11-29-2011, 07:09 AM   #2
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
It's probably not wise to modify the list you are iterating over; on top of that, your feeds list would be badly formed, because you are adding tuples of two different forms. I couldn't test this, but is it any better if you use this code fragment?

Code:
        random.shuffle(feedis)
        feedis = feedis[0:20]
        feeds = []
        
        for title, url in feedis:
            print 'Working on: ', title
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds
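To see why the original loop failed, here is a small, self-contained reproduction. It appears to match the log above: with two shuffled comics, "Chan Lowe" comes round a second time because the loop reaches one of the (title, articles) tuples that were appended to feedis, and the articles list is then handed to index_to_soup as if it were a URL. fake_index_to_soup and fake_make_links are hypothetical stand-ins for the calibre methods, and the comic URLs are placeholders; the fixed version follows the shape of the fragment above.

```python
import random
import re


def fake_index_to_soup(url):
    # Hypothetical stand-in for index_to_soup: per the traceback above, the
    # real method calls re.match on its argument, which raises TypeError
    # when that argument is a list instead of a string.
    re.match(r'https?://', url)
    return '<html></html>'


def fake_make_links(url):
    # Hypothetical stand-in for make_links: one article dict per URL.
    fake_index_to_soup(url)
    return [{'title': 'strip', 'url': url, 'description': '', 'date': ''}]


def buggy_parse_index(feedis):
    # Appends (title, articles) to the list being iterated, so the loop later
    # reaches one of those tuples and passes a list where a URL is expected.
    for title, url in feedis:
        articles = fake_make_links(url)
        if articles:
            feedis.append((title, articles))
    return feedis


def fixed_parse_index(feedis, how_many=20):
    # Shuffle a copy, trim it, and collect results in a separate list
    # instead of the one being iterated.
    feedis = list(feedis)
    random.shuffle(feedis)
    feeds = []
    for title, url in feedis[:how_many]:
        articles = fake_make_links(url)
        if articles:
            feeds.append((title, articles))
    return feeds
```

Copying feedis before shuffling is a small extra: it keeps the caller's list intact, which matters if the same list is reused elsewhere in the recipe.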

Last edited by NotTaken; 11-29-2011 at 07:19 AM.
Old 11-29-2011, 11:36 AM   #3
ppriede
Junior Member
ppriede began at the beginning.
 
Posts: 6
Karma: 10
Join Date: May 2010
Location: Chile
Device: Kindle 3G, Kindle Fire (soon)
Thanks!

IT WORKS!
Thank you very much, NotTaken.

Here is the Random GoComics.com recipe, ILTR20RCATEOTD Edition (AKA the I Like To Read 20 Random Comics At The End Of The Day Edition).
Attached Files
File Type: txt Random GoComics.com.txt (30.5 KB, 239 views)
Old 11-29-2011, 12:10 PM   #4
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Thanks for sharing. One thing I did notice is that a few of the URLs aren't current, so the recipe ends up fetching the comic from the front page instead. A couple that I noticed:
Code:
(u"Larry Wright", u"http://www.gocomics.com/larrywright"),
(u"Grand Avenue", u"http://www.gocomics.com/grandavenue"),
I believe Grand Avenue should be:
Code:
 (u"Grand Avenue", u"http://www.gocomics.com/grand-avenue")
If the URLs change often, it might be better to parse a page of links to populate your feeds.
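As a rough illustration of that idea, here is a minimal sketch that pulls strip links out of a page of HTML with Python 3's standard html.parser. It assumes each strip is linked by a one-segment path such as /grand-avenue; the class name, the SLUG pattern and the sample markup are hypothetical, not taken from the real GoComics page. Inside a calibre recipe you would more likely use index_to_soup and findAll('a') instead, but the filtering logic is the same.

```python
import re
from html.parser import HTMLParser


class FeatureLinkParser(HTMLParser):
    """Collect (title, url) pairs from links like <a href="/grand-avenue">.

    Assumption: every strip is linked by a one-segment path on the site
    root. The real GoComics markup may differ, so adjust SLUG to match it.
    """

    SLUG = re.compile(r'^/[a-z0-9-]+/?$')

    def __init__(self, base='http://www.gocomics.com'):
        HTMLParser.__init__(self)
        self.base = base
        self.links = []      # ordered (title, absolute_url) pairs
        self._seen = set()   # URLs already collected, to skip duplicates
        self._href = None    # href of the <a> we are currently inside
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href') or ''
            if self.SLUG.match(href):
                self._href = href.rstrip('/')
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            # Collapse whitespace so "<b>Larry</b> Wright" becomes one title
            title = ' '.join(''.join(self._text).split())
            url = self.base + self._href
            if title and url not in self._seen:
                self._seen.add(url)
                self.links.append((title, url))
            self._href = None


def feeds_from_html(html):
    parser = FeatureLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

The output has the same (title, url) shape as feedis, so it could replace the hard-coded list and go straight into the shuffle-and-trim loop.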
Old 12-02-2011, 10:07 PM   #5
ppriede
Junior Member
ppriede began at the beginning.
 
Posts: 6
Karma: 10
Join Date: May 2010
Location: Chile
Device: Kindle 3G, Kindle Fire (soon)
Nice idea, NotTaken.

Well, I've now found a simple way to get the URLs of the updated comics from GoComics.com.

With Dapper:
http://open.dapper.net/transform.php...com%2Ffeatures

or with Feed43.com (it only loads 100 KB of the HTML, but it's something):
http://www.feed43.com/random_gocomics.xml

With Dapper, only the updated comics are shown; with Feed43, each item's content includes a "Status: " field that shows whether the comic is updated or not (either "updated" or empty).
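For the Feed43 route, that "Status: " marker could be used to drop the comics that haven't been updated before shuffling. A minimal sketch using only Python 3's xml.etree.ElementTree, assuming the feed is plain RSS 2.0 with the status text inside each item's description element; I haven't checked the real Feed43 output, and the function name and sample items are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed marker format: "Status: updated" (any whitespace after the colon)
UPDATED = re.compile(r'Status:\s*updated', re.IGNORECASE)


def updated_comics(rss_text):
    """Return (title, link) pairs for items whose description says updated.

    Assumes standard RSS 2.0 layout: <rss><channel><item> with <title>,
    <link> and <description> children.
    """
    root = ET.fromstring(rss_text)
    result = []
    for item in root.iter('item'):
        title = (item.findtext('title') or '').strip()
        link = (item.findtext('link') or '').strip()
        description = item.findtext('description') or ''
        if title and link and UPDATED.search(description):
            result.append((title, link))
    return result
```

The resulting list has the same (title, url) shape as feedis in NotTaken's earlier fragment, so it could be shuffled and trimmed the same way.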


BUT!
Now I'm really lost...

I know I have to download the RSS feed first, then shuffle it, then chop it (to 20 elements, for example), and then pass it to the GoComics recipe to clean it up.

I tried to read the tutorial on recipes, but I couldn't find a good example that explains how the fetching is done.

Is it possible to shuffle an RSS feed and then pass it to the other functions of the GoComics recipe (make_links(self, url), preprocess_html(self, soup))?

I think parse_index(self) is no longer needed... or is it?


Any information would be appreciated.
Thanks.
Old 12-03-2011, 09:39 PM   #6
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Could do something like this:

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2010 Starson17'
'''
www.gocomics.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
import mechanize, re, random
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag, NavigableString

class GoComicsRandom(BasicNewsRecipe):
    title               = 'Random GoComics.com'
    __author__          = 'ppriede - based on Starson17, with the help of NotTaken'
    __version__         = '1.07'
    __date__            = '29 November 2011'
    description         = u'200+ Comics - Customize for more days/comics: Defaults to 7 days, 25 comics - 20 general, 5 editorial.'
    category            = 'news, comics'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
    remove_javascript   = True
    cover_url           = 'http://assets.gocomics.com/images/logo-uclick-gocomics-stacked.png'
    remove_attributes = ['style']

    ####### USER PREFERENCES - COMICS, IMAGE SIZE AND NUMBER OF COMICS TO RETRIEVE ########
    # num_comics_to_get - I've tried up to 99 on Calvin&Hobbes
    num_comics_to_get = 1
    # comic_size 300 is small, 600 is medium, 900 is large, 1500 is extra-large
    comic_size = 900
    # CHOOSE COMIC STRIPS BELOW - REMOVE COMMENT '# ' FROM IN FRONT OF DESIRED STRIPS
    # Please do not overload their servers by selecting all comics and 1000 strips from each!

    conversion_options = {'linearize_tables'  : True
                        , 'comment'           : description
                        , 'tags'              : category
                        , 'language'          : language
                        }

    keep_only_tags     = [dict(name='div', attrs={'class':['feature','banner']}),
                          ]

    remove_tags = [dict(name='a', attrs={'class':['beginning','prev','cal','next','newest']}),
                   dict(name='div', attrs={'class':['tag-wrapper']}),
                   dict(name='a', attrs={'href':re.compile(r'.*mutable_[0-9]+', re.IGNORECASE)}),
                   dict(name='img', attrs={'src':re.compile(r'.*mutable_[0-9]+', re.IGNORECASE)}),
                   dict(name='ul', attrs={'class':['share-nav','feature-nav']}),
                   ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        cookies = mechanize.CookieJar()
        br = mechanize.build_opener(mechanize.HTTPCookieProcessor(cookies))
        br.addheaders = [('Referer','http://www.gocomics.com/')]
        return br

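    def parse_index(self):
        feedis = []
        # RSS feed (generated by Dapper) listing only the updated comics
        soup = self.index_to_soup("http://open.dapper.net/transform.php?dappName=GoComicscomupdated&transformer=RSS&extraArg_title=Title&extraArg_fixDates=1&applyToUrl=http%3A%2F%2Fwww.gocomics.com%2Ffeatures")
        for item in soup.findAll('item'):
            title = None
            url = None
            # index_to_soup parses the feed as HTML, where BeautifulSoup treats
            # <link> as an empty tag, so the URL is left as a bare text node
            # inside <item>. Take the first non-blank one.
            for contents in item:
                if isinstance(contents, NavigableString) and contents.strip():
                    url = contents.strip()
                    break
            try:
                title = item.title.contents[0]
            except (AttributeError, IndexError):
                continue
            if title and url:
                feedis.append((title, url))
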
        random.shuffle(feedis)
        feedis = feedis[0:20]
        feeds = []
        
        for title, url in feedis:
            print 'Working on: ', title
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
        title = 'Temp'
        current_articles = []
        pages = range(1, self.num_comics_to_get+1)
        for page in pages:
            page_soup = self.index_to_soup(url)
            if page_soup:
                try:
                  strip_title = page_soup.find(name='div', attrs={'class':'top'}).h1.a.string
                except:
                  strip_title = 'Error - no Title found'
                try:
                  date_title = page_soup.find('ul', attrs={'class': 'feature-nav'}).li.string
                except:
                  date_title = None
                if not date_title:
                  # .string is None when the <li> has more than one child;
                  # avoid a TypeError in the concatenation below
                  date_title = 'Error - no Date found'
                title = strip_title + ' - ' + date_title
                for i in range(2):
                  try:
                    strip_url_date = page_soup.find(name='div', attrs={'class':'top'}).h1.a['href']
                    break #success - this is normal exit
                  except:
                    strip_url_date = None
                    continue #try to get strip_url_date again
                for i in range(2):
                  try:
                    prev_strip_url_date = page_soup.find('a', attrs={'class': 'prev'})['href']
                    break #success - this is normal exit
                  except:
                    prev_strip_url_date = None
                    continue #try to get prev_strip_url_date again
                if strip_url_date:
                  page_url = 'http://www.gocomics.com' + strip_url_date
                else:
                  continue
                if prev_strip_url_date:
                  prev_page_url = 'http://www.gocomics.com' + prev_strip_url_date
                else:
                  continue
            current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':''})
            url = prev_page_url
        current_articles.reverse()
        return current_articles

    def preprocess_html(self, soup):
        comic_date = ''  # stays empty if the <title> has no date part
        if soup.title and soup.title.string:
            title_string = soup.title.string.strip()
            if ',' in title_string:
                _cd = title_string.split(',',1)[1]
                comic_date = ' '.join(_cd.split(' ', 4)[0:-1])
        if soup.h1 and soup.h1.span and soup.h1.span.string:
            artist = soup.h1.span.string
            soup.h1.span.string.replaceWith(comic_date + artist)
        feature_item = soup.find('p',attrs={'class':'feature_item'})
        if feature_item and feature_item.a and feature_item.a.img:
            # Use the link target as the image source and size it to comic_size
            a_tag = feature_item.a
            a_href = a_tag["href"]
            img_tag = a_tag.img
            img_tag["src"] = a_href
            img_tag["width"] = self.comic_size
            img_tag["height"] = None
        return self.adeify_images(soup)

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    img {max-width:100%; min-width:100%;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''
This lets you keep using Starson17's user preferences (num_comics_to_get, comic_size). Alternatively, you could just shuffle and cut down the feeds in parse_feeds.
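The parse_feeds alternative applies when a recipe lists RSS feeds in its feeds attribute instead of overriding parse_index; as far as I can tell from calibre's build_index, parse_feeds is only called when parse_index is not implemented. A minimal sketch of the override pattern, runnable outside calibre: StubRecipe is a hypothetical stand-in for BasicNewsRecipe, and num_random_feeds and the example.com feed URLs are made-up placeholders, not calibre settings.

```python
import random


class StubRecipe(object):
    """Hypothetical stand-in for calibre's BasicNewsRecipe.

    The real parse_feeds() downloads every (title, url) entry in `feeds`
    and returns a list of Feed objects; this stub just returns the titles.
    """

    feeds = []

    def parse_feeds(self):
        return [title for title, url in self.feeds]


class RandomComicsSketch(StubRecipe):
    # Hypothetical settings: a made-up attribute and placeholder feed URLs.
    num_random_feeds = 20
    feeds = [('Comic %d' % n, 'http://example.com/comic%d.rss' % n)
             for n in range(50)]

    def parse_feeds(self):
        # Let the base class build the full list, then keep a random subset.
        feeds = StubRecipe.parse_feeds(self)
        random.shuffle(feeds)
        return feeds[:self.num_random_feeds]
```

In a real recipe the base call would be BasicNewsRecipe.parse_feeds(self). One trade-off of this approach: every RSS feed is still fetched before the shuffle, and only the articles of the kept feeds are downloaded afterwards.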

Last edited by NotTaken; 12-03-2011 at 09:53 PM.

Tags
gocomics.com, random



Similar Threads
Thread Thread Starter Forum Replies Last Post
GoComics After Merger - New "Feeds" ListWith the ensuing merger of GoComics and Comic BRGriff Recipes 7 07-25-2012 05:23 PM
GoComics for Kobo Touch Odyseus Recipes 1 09-08-2011 06:03 PM
GoComics/Comics.com Merger BRGriff News 5 06-05-2011 12:16 PM
Fixed: GoComics.com Starson17 Recipes 1 01-02-2011 01:21 PM
Random page breaks and random subscripts? sark666 Kobo Reader 2 09-04-2010 02:25 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.