Old 08-13-2010, 02:01 PM   #2431
hrickh
Enthusiast
 
Posts: 49
Karma: 10
Join Date: Aug 2010
Device: Nokia N800, EeePC 4G Surf
Quote:
Originally Posted by kiklop74 View Post
Thanks.

Looks like I'm going to have to find a way to upgrade Calibre in any case. When I try to add the recipe, I get a "You must not use 8-bit bytestrings..." error which, from what I can tell, was caused by a bug that's been fixed in later versions of Calibre.

Again, thanks in any case.

R.
==
Old 08-13-2010, 02:05 PM   #2432
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Until you do that, try this modified version.
Attached Files
File Type: zip elpais_impreso_mod.zip (4.7 KB, 253 views)
Old 08-13-2010, 02:53 PM   #2433
hrickh
Enthusiast
 
Posts: 49
Karma: 10
Join Date: Aug 2010
Device: Nokia N800, EeePC 4G Surf
Quote:
Originally Posted by kiklop74 View Post
Until you do that, try this modified version.
Cool, thanks!

It added fine. Running now.

R.
==
Old 08-13-2010, 05:38 PM   #2434
miangue
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2010
Location: Colombia
Device: Sony PRS-300
Could someone please help me with a recipe for "El Espectador" (http://www.elespectador.com), a newspaper from Colombia, that I could use on my Sony Reader PRS-300?

THANK YOU !!!
Old 08-13-2010, 05:53 PM   #2435
cisaak
Member
 
Posts: 17
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
Recipe Help

Trying to make a recipe for a local newspaper. I want only three items: headline, byline, and story. I can include the story by using the keep_only_tags command with the div "blox-story-text."

The headline is in the <h1> of the div id="blox-story", and the byline is in the <p class="byline"> inside that same div. Keeping the entire "blox-story" div pulls in a lot of unwanted material. How can I target just the headline and byline?
Old 08-13-2010, 06:29 PM   #2436
jordanmills
Junior Member
 
Posts: 5
Karma: 10
Join Date: Aug 2010
Device: nook
Can anyone make a recipe for texasmonthly.com? It's kind of convoluted...

It might be subscriber-only, so I'll post some examples.

RSS feed:
http://feeds.feedburner.com/texasmonthlycurrent

example article url:
http://feedproxy.google.com/~r/texas...Q/webextra.php

The article URL is always in the form of "http://feedproxy.google.com/~r/texasmonthlycurrent/~3/" + randomkey + "/" + pagename

article url directs to:
http://www.texasmonthly.com/2010-08-...as+Monthly%29#

Everything after the ? can be ignored, so it's in the form of:
"http://www.texasmonthly.com/" + currentyear + "-" + currentmonth + "-01" + "/" + pagename

example print view url:
http://www.texasmonthly.com/cms/prin...sue=2010-08-01

That's always in the form of
"http://www.texasmonthly.com/cms/printthis.php?file=" + pagename + "&issue=" + currentyear + "-" + currentmonth + "-01"


The names and number of pages may vary, so they have to be pulled from the RSS feed in real time. I assume printthis.php can pull anything from the past, but we're only worried about current articles, so we can use the current year and month. Everything publishes on the first, so the day is always 01. The logic seems simple enough; I just don't know enough about the coding to implement it.
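
For illustration, a minimal sketch of that URL logic using calibre's print_version hook - the class name and feed title are placeholders, and it assumes the feed URL has already been resolved to the texasmonthly.com form (see the reply below about the redirect):
Code:
from datetime import date
from calibre.web.feeds.news import BasicNewsRecipe

class TexasMonthly(BasicNewsRecipe):  # placeholder name
    title = 'Texas Monthly'
    feeds = [('Current Issue', 'http://feeds.feedburner.com/texasmonthlycurrent')]

    def print_version(self, url):
        # Assumes url looks like http://www.texasmonthly.com/<year>-<month>-01/<pagename>
        pagename = url.split('?')[0].rstrip('/').split('/')[-1]
        today = date.today()
        return ('http://www.texasmonthly.com/cms/printthis.php?file=%s&issue=%04d-%02d-01'
                % (pagename, today.year, today.month))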

Last edited by jordanmills; 08-13-2010 at 06:31 PM.
Old 08-13-2010, 08:02 PM   #2437
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by cisaak View Post
Trying to make a recipe for a local newspaper. I want only three items: headline, byline, and story. I can include the story by using the keep_only_tags command with the div "blox-story-text."

The headline is in the <h1> of the div id="blox-story", and the byline is in the <p class="byline"> inside that same div. Keeping the entire "blox-story" div pulls in a lot of unwanted material. How can I target just the headline and byline?
You haven't given enough info, but you can try adding this to the keep_only_tags:
Code:
dict(name='h1'), dict(name='p', attrs={'class':'byline'})
(Order is important inside the keep_only.)
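
For context, here is how that might sit in a minimal recipe skeleton - the class name is hypothetical, and whether "blox-story-text" is an id or a class depends on the actual page:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class LocalPaper(BasicNewsRecipe):  # hypothetical name
    title = 'Local Paper'
    # Order matters: headline first, then byline, then the story text
    keep_only_tags = [
        dict(name='h1'),
        dict(name='p', attrs={'class': 'byline'}),
        dict(name='div', attrs={'id': 'blox-story-text'}),
    ]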

Last edited by Starson17; 08-13-2010 at 08:05 PM.
Old 08-13-2010, 08:58 PM   #2438
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by jordanmills View Post
Can anyone make a recipe for texasmonthly.com? It's kind of convoluted...
I won't do it for you, but there are two approaches:

1) treat it as an obfuscated link to the print version - read here
2) treat it as a normal feed, and keep_only or delete whatever you want to keep/delete.

I'd go for #2.

Edit: BTW, I did look at your links pretty closely. I even wrote a quick recipe. Using your RSS link as a normal feed works fine, despite the random key, but the print link (the article name) is obfuscated with a redirect. If you really want to use the print link, you'll need to code through the obfuscation with #1 to get the article name, then code up or scrape the year/month to build the link to the print version.

However, I seldom use print links anyway. It's just as easy, and often more fun, to use keep_only and remove tags to keep/remove what you want from the non-print page, which is why I suggested #2. A bare skeleton of that approach follows.
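
A sketch of approach #2 - the keep_only/remove selectors are placeholders to be filled in from a real article page:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class TexasMonthlyFeed(BasicNewsRecipe):  # skeleton only
    title = 'Texas Monthly'
    no_stylesheets = True
    feeds = [('Current Issue', 'http://feeds.feedburner.com/texasmonthlycurrent')]
    # Placeholder selectors - inspect an article page and substitute the tags
    # that wrap the story, then remove whatever clutter is left inside.
    keep_only_tags = [dict(name='div', attrs={'class': 'story'})]
    remove_tags = [dict(name='div', attrs={'class': 'sidebar'})]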

Last edited by Starson17; 08-14-2010 at 01:06 PM.
Old 08-14-2010, 01:49 AM   #2439
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Would anyone mind making a recipe for http://boortz.com/nealz_nuze/index.html, please? The Kindle version outright sucks. I subscribed to it, and for whatever reason it wraps a lot of the "emails he gets" inside a table, and the table will not pan, even though it says to move the controller around and it will. So I was wondering if some of the fantastic folks over here would be so kind as to make a recipe for me. THANK YOU, THANK YOU!!!
Old 08-14-2010, 10:13 AM   #2440
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
New Recipe: GoComics

200+ comics (defaults to 7 days for 25 comics - 20 general, 5 editorial). Size is adjustable. A companion to the comics.com recipe.
Spoiler:

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2010 Starson17'
'''
www.gocomics.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
import re, mechanize

class GoComics(BasicNewsRecipe):
    title               = 'GoComics'
    __author__          = 'Starson17' 
    __version__         = '1.02'
    __date__            = '14 August 2010'
    description         = u'200+ Comics - Customize for more days/comics: Defaults to 7 days, 25 comics - 20 general, 5 editorial.'
    category            = 'news, comics'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
    remove_javascript   = True
    cover_url           = 'http://paulbuckley14059.files.wordpress.com/2008/06/calvin-and-hobbes.jpg'

    ####### USER PREFERENCES - COMICS, IMAGE SIZE AND NUMBER OF COMICS TO RETRIEVE ########
    # num_comics_to_get - I've tried up to 99 on Calvin&Hobbes
    num_comics_to_get = 7
    # comic_size 300 is small, 600 is medium, 900 is large, 1500 is extra-large
    comic_size = 900
    # CHOOSE COMIC STRIPS BELOW - REMOVE COMMENT '# ' FROM IN FRONT OF DESIRED STRIPS 
    # Please do not overload their servers by selecting all comics and 1000 strips from each!
    
    conversion_options = {'linearize_tables'  : True
                        , 'comment'           : description
                        , 'tags'              : category
                        , 'language'          : language
                        }

    keep_only_tags     = [dict(name='div', attrs={'class':['feature','banner']}),
                          ]

    remove_tags = [dict(name='a', attrs={'class':['beginning','prev','cal','next','newest']}),
                   dict(name='div', attrs={'class':['tag-wrapper']}),
                   dict(name='ul', attrs={'class':['share-nav','feature-nav']}),
                   ]

    def get_browser(self):
        # Build a cookie-aware opener and send a gocomics Referer header
        cookies = mechanize.CookieJar()
        br = mechanize.build_opener(mechanize.HTTPCookieProcessor(cookies))
        br.addheaders = [('Referer','http://www.gocomics.com/')]
        return br
    
    def parse_index(self):
        feeds = []
        for title, url in [
                            ######## COMICS - GENERAL ########
                            (u"2 Cows and a Chicken", u"http://www.gocomics.com/2cowsandachicken"),
                            # (u"9 to 5", u"http://www.gocomics.com/9to5"),
                            # (u"The Academia Waltz", u"http://www.gocomics.com/academiawaltz"),
                            # (u"Adam@Home", u"http://www.gocomics.com/adamathome"),
                            # (u"Agnes", u"http://www.gocomics.com/agnes"),
                            # (u"Andy Capp", u"http://www.gocomics.com/andycapp"),
                            # (u"Animal Crackers", u"http://www.gocomics.com/animalcrackers"),
                            # (u"Annie", u"http://www.gocomics.com/annie"),
                            (u"The Argyle Sweater", u"http://www.gocomics.com/theargylesweater"),
                            # (u"Ask Shagg", u"http://www.gocomics.com/askshagg"),
                            (u"B.C.", u"http://www.gocomics.com/bc"),
                            # (u"Back in the Day", u"http://www.gocomics.com/backintheday"),
                            # (u"Bad Reporter", u"http://www.gocomics.com/badreporter"),
                            # (u"Baldo", u"http://www.gocomics.com/baldo"),
                            # (u"Ballard Street", u"http://www.gocomics.com/ballardstreet"),
                            # (u"Barkeater Lake", u"http://www.gocomics.com/barkeaterlake"),
                            # (u"The Barn", u"http://www.gocomics.com/thebarn"),
                            # (u"Basic Instructions", u"http://www.gocomics.com/basicinstructions"),
                            # (u"Bewley", u"http://www.gocomics.com/bewley"),
                            # (u"Big Top", u"http://www.gocomics.com/bigtop"),
                            # (u"Biographic", u"http://www.gocomics.com/biographic"),
                            (u"Birdbrains", u"http://www.gocomics.com/birdbrains"),
                            # (u"Bleeker: The Rechargeable Dog", u"http://www.gocomics.com/bleeker"),
                            # (u"Bliss", u"http://www.gocomics.com/bliss"),
                            (u"Bloom County", u"http://www.gocomics.com/bloomcounty"),
                            # (u"Bo Nanas", u"http://www.gocomics.com/bonanas"),
                            # (u"Bob the Squirrel", u"http://www.gocomics.com/bobthesquirrel"),
                            # (u"The Boiling Point", u"http://www.gocomics.com/theboilingpoint"),
                            # (u"Boomerangs", u"http://www.gocomics.com/boomerangs"),
                            # (u"The Boondocks", u"http://www.gocomics.com/boondocks"),
                            # (u"Bottomliners", u"http://www.gocomics.com/bottomliners"),
                            # (u"Bound and Gagged", u"http://www.gocomics.com/boundandgagged"),
                            # (u"Brainwaves", u"http://www.gocomics.com/brainwaves"),
                            # (u"Brenda Starr", u"http://www.gocomics.com/brendastarr"),
                            # (u"Brewster Rockit", u"http://www.gocomics.com/brewsterrockit"),
                            # (u"Broom Hilda", u"http://www.gocomics.com/broomhilda"),
                            (u"Calvin and Hobbes", u"http://www.gocomics.com/calvinandhobbes"),
                            # (u"Candorville", u"http://www.gocomics.com/candorville"),
                            # (u"Cathy", u"http://www.gocomics.com/cathy"),
                            # (u"C'est la Vie", u"http://www.gocomics.com/cestlavie"),
                            # (u"Chuckle Bros", u"http://www.gocomics.com/chucklebros"),
                            # (u"Citizen Dog", u"http://www.gocomics.com/citizendog"),
                            # (u"The City", u"http://www.gocomics.com/thecity"),
                            # (u"Cleats", u"http://www.gocomics.com/cleats"),
                            # (u"Close to Home", u"http://www.gocomics.com/closetohome"),
                            # (u"Compu-toon", u"http://www.gocomics.com/compu-toon"),
                            # (u"Cornered", u"http://www.gocomics.com/cornered"),
                            (u"Cul de Sac", u"http://www.gocomics.com/culdesac"),
                            # (u"Daddy's Home", u"http://www.gocomics.com/daddyshome"),
                            # (u"Deep Cover", u"http://www.gocomics.com/deepcover"),
                            # (u"Dick Tracy", u"http://www.gocomics.com/dicktracy"),
                            # (u"The Dinette Set", u"http://www.gocomics.com/dinetteset"),
                            # (u"Dog Eat Doug", u"http://www.gocomics.com/dogeatdoug"),
                            # (u"Domestic Abuse", u"http://www.gocomics.com/domesticabuse"),
                            # (u"Doodles", u"http://www.gocomics.com/doodles"),
                            (u"Doonesbury", u"http://www.gocomics.com/doonesbury"),
                            # (u"The Doozies", u"http://www.gocomics.com/thedoozies"),
                            # (u"The Duplex", u"http://www.gocomics.com/duplex"),
                            # (u"Eek!", u"http://www.gocomics.com/eek"),
                            # (u"The Elderberries", u"http://www.gocomics.com/theelderberries"),
                            # (u"Flight Deck", u"http://www.gocomics.com/flightdeck"),
                            # (u"Flo and Friends", u"http://www.gocomics.com/floandfriends"),
                            # (u"The Flying McCoys", u"http://www.gocomics.com/theflyingmccoys"),
                            (u"For Better or For Worse", u"http://www.gocomics.com/forbetterorforworse"),
                            # (u"For Heaven's Sake", u"http://www.gocomics.com/forheavenssake"),
                            # (u"Fort Knox", u"http://www.gocomics.com/fortknox"),
                            # (u"FoxTrot", u"http://www.gocomics.com/foxtrot"),
                            (u"FoxTrot Classics", u"http://www.gocomics.com/foxtrotclassics"),
                            # (u"Frank & Ernest", u"http://www.gocomics.com/frankandernest"),
                            # (u"Fred Basset", u"http://www.gocomics.com/fredbasset"),
                            # (u"Free Range", u"http://www.gocomics.com/freerange"),
                            # (u"Frog Applause", u"http://www.gocomics.com/frogapplause"),
                            # (u"The Fusco Brothers", u"http://www.gocomics.com/thefuscobrothers"),
                            (u"Garfield", u"http://www.gocomics.com/garfield"),
                            # (u"Garfield Minus Garfield", u"http://www.gocomics.com/garfieldminusgarfield"),
                            # (u"Gasoline Alley", u"http://www.gocomics.com/gasolinealley"),
                            # (u"Gil Thorp", u"http://www.gocomics.com/gilthorp"),
                            # (u"Ginger Meggs", u"http://www.gocomics.com/gingermeggs"),
                            # (u"Girls & Sports", u"http://www.gocomics.com/girlsandsports"),
                            # (u"Haiku Ewe", u"http://www.gocomics.com/haikuewe"),
                            # (u"Heart of the City", u"http://www.gocomics.com/heartofthecity"),
                            # (u"Heathcliff", u"http://www.gocomics.com/heathcliff"),
                            # (u"Herb and Jamaal", u"http://www.gocomics.com/herbandjamaal"),
                            # (u"Home and Away", u"http://www.gocomics.com/homeandaway"),
                            # (u"Housebroken", u"http://www.gocomics.com/housebroken"),
                            # (u"Hubert and Abby", u"http://www.gocomics.com/hubertandabby"),
                            # (u"Imagine This", u"http://www.gocomics.com/imaginethis"),
                            # (u"In the Bleachers", u"http://www.gocomics.com/inthebleachers"),
                            # (u"In the Sticks", u"http://www.gocomics.com/inthesticks"),
                            # (u"Ink Pen", u"http://www.gocomics.com/inkpen"),
                            # (u"It's All About You", u"http://www.gocomics.com/itsallaboutyou"),
                            # (u"Joe Vanilla", u"http://www.gocomics.com/joevanilla"),
                            # (u"La Cucaracha", u"http://www.gocomics.com/lacucaracha"),
                            # (u"Last Kiss", u"http://www.gocomics.com/lastkiss"),
                            # (u"Legend of Bill", u"http://www.gocomics.com/legendofbill"),
                            # (u"Liberty Meadows", u"http://www.gocomics.com/libertymeadows"),
                            (u"Lio", u"http://www.gocomics.com/lio"),
                            # (u"Little Dog Lost", u"http://www.gocomics.com/littledoglost"),
                            # (u"Little Otto", u"http://www.gocomics.com/littleotto"),
                            # (u"Loose Parts", u"http://www.gocomics.com/looseparts"),
                            # (u"Love Is...", u"http://www.gocomics.com/loveis"),
                            # (u"Maintaining", u"http://www.gocomics.com/maintaining"),
                            # (u"The Meaning of Lila", u"http://www.gocomics.com/meaningoflila"),
                            # (u"Middle-Aged White Guy", u"http://www.gocomics.com/middleagedwhiteguy"),
                            # (u"The Middletons", u"http://www.gocomics.com/themiddletons"),
                            # (u"Momma", u"http://www.gocomics.com/momma"),
                            # (u"Mutt & Jeff", u"http://www.gocomics.com/muttandjeff"),
                            # (u"Mythtickle", u"http://www.gocomics.com/mythtickle"),
                            # (u"Nest Heads", u"http://www.gocomics.com/nestheads"),
                            # (u"NEUROTICA", u"http://www.gocomics.com/neurotica"),
                            (u"New Adventures of Queen Victoria", u"http://www.gocomics.com/thenewadventuresofqueenvictoria"),
                            (u"Non Sequitur", u"http://www.gocomics.com/nonsequitur"),
                            # (u"The Norm", u"http://www.gocomics.com/thenorm"),
                            # (u"On A Claire Day", u"http://www.gocomics.com/onaclaireday"),
                            # (u"One Big Happy", u"http://www.gocomics.com/onebighappy"),
                            # (u"The Other Coast", u"http://www.gocomics.com/theothercoast"),
                            # (u"Out of the Gene Pool Re-Runs", u"http://www.gocomics.com/outofthegenepool"),
                            # (u"Overboard", u"http://www.gocomics.com/overboard"),
                            # (u"Pibgorn", u"http://www.gocomics.com/pibgorn"),
                            # (u"Pibgorn Sketches", u"http://www.gocomics.com/pibgornsketches"),
                            (u"Pickles", u"http://www.gocomics.com/pickles"),
                            # (u"Pinkerton", u"http://www.gocomics.com/pinkerton"),
                            # (u"Pluggers", u"http://www.gocomics.com/pluggers"),
                            (u"Pooch Cafe", u"http://www.gocomics.com/poochcafe"),
                            # (u"PreTeena", u"http://www.gocomics.com/preteena"),
                            # (u"The Quigmans", u"http://www.gocomics.com/thequigmans"),
                            # (u"Rabbits Against Magic", u"http://www.gocomics.com/rabbitsagainstmagic"),
                            (u"Real Life Adventures", u"http://www.gocomics.com/reallifeadventures"),
                            # (u"Red and Rover", u"http://www.gocomics.com/redandrover"),
                            # (u"Red Meat", u"http://www.gocomics.com/redmeat"),
                            # (u"Reynolds Unwrapped", u"http://www.gocomics.com/reynoldsunwrapped"),
                            # (u"Ronaldinho Gaucho", u"http://www.gocomics.com/ronaldinhogaucho"),
                            # (u"Rubes", u"http://www.gocomics.com/rubes"),
                            # (u"Scary Gary", u"http://www.gocomics.com/scarygary"),
                            (u"Shoe", u"http://www.gocomics.com/shoe"),
                            # (u"Shoecabbage", u"http://www.gocomics.com/shoecabbage"),
                            # (u"Skin Horse", u"http://www.gocomics.com/skinhorse"),
                            # (u"Slowpoke", u"http://www.gocomics.com/slowpoke"),
                            # (u"Speed Bump", u"http://www.gocomics.com/speedbump"),
                            # (u"State of the Union", u"http://www.gocomics.com/stateoftheunion"),
                            (u"Stone Soup", u"http://www.gocomics.com/stonesoup"),
                            # (u"Strange Brew", u"http://www.gocomics.com/strangebrew"),
                            # (u"Sylvia", u"http://www.gocomics.com/sylvia"),
                            # (u"Tank McNamara", u"http://www.gocomics.com/tankmcnamara"),
                            # (u"Tiny Sepuku", u"http://www.gocomics.com/tinysepuku"),
                            # (u"TOBY", u"http://www.gocomics.com/toby"),
                            # (u"Tom the Dancing Bug", u"http://www.gocomics.com/tomthedancingbug"),
                            # (u"Too Much Coffee Man", u"http://www.gocomics.com/toomuchcoffeeman"),
                            # (u"W.T. Duck", u"http://www.gocomics.com/wtduck"),
                            # (u"Watch Your Head", u"http://www.gocomics.com/watchyourhead"),
                            # (u"Wee Pals", u"http://www.gocomics.com/weepals"),
                            # (u"Winnie the Pooh", u"http://www.gocomics.com/winniethepooh"),
                            (u"Wizard of Id", u"http://www.gocomics.com/wizardofid"),
                            # (u"Working It Out", u"http://www.gocomics.com/workingitout"),
                            # (u"Yenny", u"http://www.gocomics.com/yenny"),
                            # (u"Zack Hill", u"http://www.gocomics.com/zackhill"),
                            (u"Ziggy", u"http://www.gocomics.com/ziggy"),
                            ######## COMICS - EDITORIAL ########
                            ("Lalo Alcaraz","http://www.gocomics.com/laloalcaraz"),
                            ("Nick Anderson","http://www.gocomics.com/nickanderson"),
                            ("Chuck Asay","http://www.gocomics.com/chuckasay"),
                            ("Tony Auth","http://www.gocomics.com/tonyauth"),
                            ("Donna Barstow","http://www.gocomics.com/donnabarstow"),
                            # ("Bruce Beattie","http://www.gocomics.com/brucebeattie"),
                            # ("Clay Bennett","http://www.gocomics.com/claybennett"),
                            # ("Lisa Benson","http://www.gocomics.com/lisabenson"),
                            # ("Steve Benson","http://www.gocomics.com/stevebenson"),
                            # ("Chip Bok","http://www.gocomics.com/chipbok"),
                            # ("Steve Breen","http://www.gocomics.com/stevebreen"),
                            # ("Chris Britt","http://www.gocomics.com/chrisbritt"),
                            # ("Stuart Carlson","http://www.gocomics.com/stuartcarlson"),
                            # ("Ken Catalino","http://www.gocomics.com/kencatalino"),
                            # ("Paul Conrad","http://www.gocomics.com/paulconrad"),
                            # ("Jeff Danziger","http://www.gocomics.com/jeffdanziger"),
                            # ("Matt Davies","http://www.gocomics.com/mattdavies"),
                            # ("John Deering","http://www.gocomics.com/johndeering"),
                            # ("Bob Gorrell","http://www.gocomics.com/bobgorrell"),
                            # ("Walt Handelsman","http://www.gocomics.com/walthandelsman"),
                            # ("Clay Jones","http://www.gocomics.com/clayjones"),
                            # ("Kevin Kallaugher","http://www.gocomics.com/kevinkallaugher"),
                            # ("Steve Kelley","http://www.gocomics.com/stevekelley"),
                            # ("Dick Locher","http://www.gocomics.com/dicklocher"),
                            # ("Chan Lowe","http://www.gocomics.com/chanlowe"),
                            # ("Mike Luckovich","http://www.gocomics.com/mikeluckovich"),
                            # ("Gary Markstein","http://www.gocomics.com/garymarkstein"),
                            # ("Glenn McCoy","http://www.gocomics.com/glennmccoy"),
                            # ("Jim Morin","http://www.gocomics.com/jimmorin"),
                            # ("Jack Ohman","http://www.gocomics.com/jackohman"),
                            # ("Pat Oliphant","http://www.gocomics.com/patoliphant"),
                            # ("Joel Pett","http://www.gocomics.com/joelpett"),
                            # ("Ted Rall","http://www.gocomics.com/tedrall"),
                            # ("Michael Ramirez","http://www.gocomics.com/michaelramirez"),
                            # ("Marshall Ramsey","http://www.gocomics.com/marshallramsey"),
                            # ("Steve Sack","http://www.gocomics.com/stevesack"),
                            # ("Ben Sargent","http://www.gocomics.com/bensargent"),
                            # ("Drew Sheneman","http://www.gocomics.com/drewsheneman"),
                            # ("John Sherffius","http://www.gocomics.com/johnsherffius"),
                            # ("Small World","http://www.gocomics.com/smallworld"),
                            # ("Scott Stantis","http://www.gocomics.com/scottstantis"),
                            # ("Wayne Stayskal","http://www.gocomics.com/waynestayskal"),
                            # ("Dana Summers","http://www.gocomics.com/danasummers"),
                            # ("Paul Szep","http://www.gocomics.com/paulszep"),
                            # ("Mike Thompson","http://www.gocomics.com/mikethompson"),
                            # ("Tom Toles","http://www.gocomics.com/tomtoles"),
                            # ("Gary Varvel","http://www.gocomics.com/garyvarvel"),
                            # ("ViewsAfrica","http://www.gocomics.com/viewsafrica"),
                            # ("ViewsAmerica","http://www.gocomics.com/viewsamerica"),
                            # ("ViewsAsia","http://www.gocomics.com/viewsasia"),
                            # ("ViewsBusiness","http://www.gocomics.com/viewsbusiness"),
                            # ("ViewsEurope","http://www.gocomics.com/viewseurope"),
                            # ("ViewsLatinAmerica","http://www.gocomics.com/viewslatinamerica"),
                            # ("ViewsMidEast","http://www.gocomics.com/viewsmideast"),
                            # ("Views of the World","http://www.gocomics.com/viewsoftheworld"),
                            # ("Kerry Waghorn","http://www.gocomics.com/facesinthenews"),
                            # ("Dan Wasserman","http://www.gocomics.com/danwasserman"),
                            # ("Signe Wilkinson","http://www.gocomics.com/signewilkinson"),
                            # ("Wit of the World","http://www.gocomics.com/witoftheworld"),
                            # ("Don Wright","http://www.gocomics.com/donwright"),
                             ]:
            articles = self.make_links(url)
            if articles:    
                feeds.append((title, articles))
        return feeds        
                            
    def make_links(self, url):
        title = 'Temp'
        current_articles = []
        for page in range(1, self.num_comics_to_get + 1):
            page_soup = self.index_to_soup(url)
            if not page_soup:
                continue
            try:
                strip_title = page_soup.h1.a.string
            except:
                strip_title = 'Error - no page_soup.h1.a.string'
            try:
                date_title = page_soup.find('ul', attrs={'class': 'feature-nav'}).li.string
            except:
                date_title = 'Error - no feature-nav date string'
            title = strip_title + ' - ' + date_title
            # Initialize both hrefs so a failed lookup cannot raise a
            # NameError in the checks below
            strip_url_date = None
            prev_strip_url_date = None
            try:
                strip_url_date = page_soup.h1.a['href']
            except:
                pass
            try:
                prev_strip_url_date = page_soup.find('a', attrs={'class': 'prev'})['href']
            except:
                pass
            if not strip_url_date or not prev_strip_url_date:
                continue  # give up on this strip date
            page_url = 'http://www.gocomics.com' + strip_url_date
            prev_page_url = 'http://www.gocomics.com' + prev_strip_url_date
            current_articles.append({'title': title, 'url': page_url, 'description': '', 'date': ''})
            url = prev_page_url  # step back one strip and repeat
        current_articles.reverse()
        return current_articles

    def preprocess_html(self, soup):
        comic_date = ''
        if soup.title:
            title_string = soup.title.string.strip()
            _cd = title_string.split(',', 1)[1]
            comic_date = ' '.join(_cd.split(' ', 4)[0:-1])
        if soup.h1 and soup.h1.span:
            artist = soup.h1.span.string
            soup.h1.span.string.replaceWith(comic_date + artist)
        feature_item = soup.find('p', attrs={'class': 'feature_item'})
        if feature_item and feature_item.a:
            a_tag = feature_item.a
            a_href = a_tag["href"]
            img_tag = a_tag.img
            # The link wraps the full-size strip; point the img at it and
            # scale to the user-selected width
            img_tag["src"] = a_href
            img_tag["width"] = self.comic_size
            img_tag["height"] = None
        return self.adeify_images(soup)
        
    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    img {max-width:100%; min-width:100%;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''
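For example, to fetch two weeks of medium-size strips, the user-preference lines near the top of the recipe would become (values are illustrative):
Code:
    num_comics_to_get = 14
    comic_size = 600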
Old 08-14-2010, 11:19 AM   #2441
cisaak
Member
 
Posts: 17
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
Quote:
Originally Posted by Starson17 View Post
You haven't given enough info, but you can try adding this to the keep_only_tags:
Code:
dict(name='h1'), dict(name='p', attrs={'class':'byline'})
(Order is important inside the keep_only.)

Thanks. This works, except that the h1 displays twice since it appears twice in the HTML. Any way to limit the output to one h1?
Old 08-14-2010, 12:48 PM   #2442
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by cisaak View Post
Thanks. This works, except that the h1 displays twice since it appears twice in the HTML. Any way to limit the output to one h1?
That's part of what I meant when I said you didn't give enough information. You often need to remove a few items from inside what was kept. Without looking at the site, I can't advise on the best way to get only the first one or remove the second.

Is there a class or id label inside the two h1 tags that differs between them?

Or, you could just give me a link to an article and I'll check it out. Alternatively, there are more powerful/complicated ways to keep only the first h1 tag.
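
One of those more powerful ways, as a sketch: override preprocess_html and drop every h1 after the first (findAll and extract are BeautifulSoup calls available on the soup a recipe is handed):
Code:
    def preprocess_html(self, soup):
        # Keep only the first h1; pull any later ones out of the tree
        for extra_h1 in soup.findAll('h1')[1:]:
            extra_h1.extract()
        return soup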

Last edited by Starson17; 08-14-2010 at 01:03 PM.
Old 08-14-2010, 01:24 PM   #2443
soothsayer
Member
 
Posts: 13
Karma: 34
Join Date: Jul 2010
Device: hanlin, astak the 2010 version plz.
I'm making a custom recipe for the NY Daily News - a basic recipe, but it's my first time using Python and I need some help with formatting.

Here is what I have now.

-------------------------------------------------------------------


Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1281804307(BasicNewsRecipe):
    title          = u'NY Daily News'
    __author__     = 'you'
    description    = 'News from NY Daily News'
    language       = 'en'
    publisher      = 'NY Daily News'
    category       = 'news, politics, sports, ny'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    cover_url             = ''  # blank for now - see the question below
    encoding              = 'utf-8'

    keep_only_tags = [
                       dict(name='div', attrs={'id':['art_story']})
                     ]
    remove_tags = [
                       dict(name='div', attrs={'class':['code_module']})
                  ]
    feeds = [(u'Top Stories', u'http://www.nydailynews.com/index_rss.xml'),
             (u'News', u'http://www.nydailynews.com/news/index_rss.xml'),
             (u'NY Crime', u'http://www.nydailynews.com/news/ny_crime/index_rss.xml'),
             (u'NY Local', u'http://www.nydailynews.com/ny_local/index_rss.xml'),
             (u'Politics', u'http://www.nydailynews.com/news/politics/index_rss.xml'),
             (u'Music', u'http://www.nydailynews.com/entertainment/music/index_rss.xml'),
             (u'Arts', u'http://www.nydailynews.com/entertainment/arts/index_rss.xml'),
             (u'Food and Dining', u'http://www.nydailynews.com/lifestyle/food/index_rss.xml'),
             (u'Lifestyle', u'http://www.nydailynews.com/lifestyle/index_rss.xml'),
             (u'Health/Well Being', u'http://www.nydailynews.com/lifestyle/health/index_rss.xml'),
             (u'Sports', u'http://www.nydailynews.com/sports/index_rss.xml'),
            ]


-------------------------------------------

As you can see, "cover_url" is blank. I'm not sure how to format the variables, because the URL will change depending on the date, and it's my first time using Python.

Here is the basic format for the NY Daily News cover page:
http://assets.nydailynews.com/img/20...tpage_0814.jpg

Can somebody show me an example template of how to do this?

Thanks.

I have another question: in the feeds section, what's that "u" for just in front of the title and URL? i.e., (u'NY Crime', u'http://www.nydailynews.com/news/ny_crime/index_rss.xml'),

BTW, I couldn't find a custom recipe for the NY Daily News here in this forum.

Here are all the feeds from the NY Daily News:
http://www.nydailynews.com/services/...ols/index.html

Last edited by soothsayer; 08-14-2010 at 02:35 PM. Reason: fixing indentation
Old 08-14-2010, 01:49 PM   #2444
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by soothsayer View Post
Here is what I have now.
When pasting a recipe here, you should use the CODE tags (hash mark) or you lose all indents. Indents are critical for Python code.

Quote:
As you can see, "cover_url" is blank. I'm not sure how to format the variables, because the URL will change depending on the date, and it's my first time using Python.

Here is the basic format for the NY Daily News cover page:
http://assets.nydailynews.com/img/20...tpage_0814.jpg

Can somebody show me an example template of how to do this?
Here is one method ("index" is just the URL to a page that has the cover in an img tag inside a span tag of class "cover", where the src of the img tag is the URL to the cover):
Code:
    def get_cover_url(self):
        cover_url = None
        soup = self.index_to_soup(self.index)
        cover_item = soup.find('span', attrs={'class':'cover'})
        if cover_item:
           cover_url = cover_item.img['src']
        return cover_url
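Note that "index" is not a built-in recipe attribute; the sketch assumes you define it yourself on the recipe, e.g.:
Code:
    index = 'http://www.nydailynews.com/'  # hypothetical page whose HTML contains the cover img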
Quote:
I have another question: in the feeds section, what's that "u" for just in front of the title and URL? i.e., (u'NY Crime', u'http://www.nydailynews.com/news/ny_crime/index_rss.xml'),
It means the string is "unicode."
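
A quick illustration in the Python 2 interpreter that calibre uses:
Code:
>>> type('NY Crime'), type(u'NY Crime')
(<type 'str'>, <type 'unicode'>)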
Old 08-14-2010, 03:21 PM   #2445
soothsayer
Member
 
Posts: 13
Karma: 34
Join Date: Jul 2010
Device: hanlin, astak the 2010 version plz.
Quote:
Originally Posted by Starson17 View Post
Here is one method ("index" is just the URL to a page that has the cover in an img tag inside a span tag of class "cover", where the src of the img tag is the URL to the cover):
Code:
    def get_cover_url(self):
        cover_url = None
        soup = self.index_to_soup(self.index)
        cover_item = soup.find('span', attrs={'class':'cover'})
        if cover_item:
           cover_url = cover_item.img['src']
        return cover_url
Is there any way to do this more simply? I was reading about datetime, but couldn't figure out how to use it beyond adding the line "import datetime" at the top of the script.

An example cover page URL is the following:
http://assets.nydailynews.com/img/20...tpage_0814.jpg

It contains the year "2010", month "08", and day "14".

The image URL was obtained from the Daily News front-page archive here:
http://www.nydailynews.com/news/galleries/august_2010_daily_news_front_pages/august_2010_daily_news_front_pages.html
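
A minimal sketch of a date-based get_cover_url using that approach; the '...' is kept from the truncated URL above, so the real path pattern still has to be filled in from the archive page:
Code:
    def get_cover_url(self):
        from datetime import date  # stdlib; can also go at the top of the file
        today = date.today()
        # The '...' marks the part of the URL that is truncated in this
        # thread - substitute the real pattern from the front-page archive.
        return 'http://assets.nydailynews.com/img/20...tpage_%02d%02d.jpg' % (
            today.month, today.day)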

Last edited by soothsayer; 08-14-2010 at 03:26 PM.