Old 03-13-2010, 04:02 PM   #1591
DoctorOhh
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by ziegl027
I downloaded the latest version of Calibre that was supposed to have the comic-cut off correction in it and tried again. The FIRST comic I had was fine. But all the rest were still clipped. I don't know if the recipe needs a tweak, or if I need to ask Kovid to fix a bug, or what!
While you're waiting for a solution, you can try switching your page setup (output and input profiles) to PRS-300 (under Preferences - Conversion). The only thing this should affect is image sizing. The images may be a little smaller than with the Sony Touch profile.

Update: It looks like I was wrong; both profiles are 600x800. But one profile might be better tuned than the other.
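
To make "image sizing" against a 600x800 profile concrete, here is a rough sketch of an aspect-preserving fit. This is only an illustration under my own assumptions: the function name fit_within is made up, and calibre's actual resizing may well work differently.

```python
def fit_within(width, height, max_width=600, max_height=800):
    """Scale (width, height) down to fit the screen, keeping the aspect ratio.

    Images that already fit are returned unchanged (never scaled up).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)


# A wide comic strip is limited by the 600-pixel screen width.
print(fit_within(900, 300))
```

A strip that is clipped instead of scaled suggests the reader or profile skipped a step like this one.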

Last edited by DoctorOhh; 03-13-2010 at 04:29 PM.
Old 03-13-2010, 04:26 PM   #1592
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by dwanthny View Post
While you're waiting for a solution, you can try switching your page setup (output and input profiles) to PRS-300 (under Preferences - Conversion). The only thing this should affect is image sizing. The images will be a little smaller than with the Sony Touch profile.
Thanks for helping her, dwanthny. That post was actually sent to me by PM, and this shows why I prefer to respond in the thread. Not only does it make it possible for others to help, it also makes the answer part of the permanent record, so others can find it later by searching.
Old 03-13-2010, 05:11 PM   #1593
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
nrc.nl recipe:
Attached Files
File Type: zip nrc.nl.zip (3.0 KB, 227 views)
Old 03-14-2010, 03:06 AM   #1594
Simon_W
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2010
Device: Bebook
Quote:
Originally Posted by kiklop74 View Post
nrc.nl recipe:
Thanks Kiklop74, this works great and NRC is the best newspaper in The Netherlands so this is very useful!
Old 03-14-2010, 10:49 AM   #1595
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Ekips View Post


I'm still trying to stumble my way through a custom script to fetch the news from the Sun website. I've sorted it so it changes the web page into the print page and leaves out the slideshows, but it fetches all sorts of rubbish after the story, including all the 'connect to us' stuff.

How would I just get the headline and the main body of the article?

Here's the source code from a basic printpage from the sun.co.uk

What tags do I need to keep and what ones should I drop?

Thanks.
Give us the recipe you're using that's causing trouble. I see you posted one above, but there you said it wasn't retrieving anything, while in this post it's retrieving but leaving junk. The code from the site is hard to work with. Just post your best recipe along with the problem you're having and I'll take a look.
Old 03-14-2010, 10:56 AM   #1596
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kiklop74 View Post
nrc.nl recipe:
Kiklop:
What does this do?
Code:
    def preprocess_html(self, soup):
        return self.adeify_images(soup)
Is it related to what Kovid said about image size and ADE:
Quote:
Adobe Digital Editions stupidly does not rescale images that don't fit on the screen.
Thanks.
Old 03-14-2010, 11:38 AM   #1597
lorenzov
Member
Posts: 23
Karma: 12
Join Date: Jan 2010
Location: Edinburgh, UK
Device: SONY PRS600, Apple iPhone 3G
Il Corriere della Sera international versions fixed

Someone on their side changed the permalinks in the feed, so these need to be rebuilt!

The problem affected the English and Chinese versions, but not the Italian one.

(icons added)
Attached Files
File Type: zip corriereInt_v2.zip (4.9 KB, 217 views)

Last edited by lorenzov; 03-14-2010 at 12:05 PM.
Old 03-14-2010, 12:29 PM   #1598
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Starson17 View Post
Here is the gocomics.com recipe I promised.
A note about my comics.com and gocomics.com recipes. I've noticed that reading multiple days is sort of painful as the most recent strip is listed first in the index, and (oddly enough) readers tend to show first things first. So you read the current strip before reading what happened in previous strips that often lead up to what's happening in the current strip.

If you read a strip daily, the current organization is probably best, but if you are exploring new strips and want to start at an earlier point, it's painful. You have to go to a later page, then slowly work back to front to get to each 'next' page.

A simple fix for this is to reverse the article order by adding this line to the recipe:

Code:
        current_articles.reverse()
Put it right before this line (same level of indent for both):
Code:
        return current_articles
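
To see the effect without running a full recipe, here is a minimal stand-alone sketch. It uses a made-up FAKE_SITE dictionary in place of the real pages (an assumption purely for illustration; nothing here touches gocomics.com or calibre), walks the 'prev' links newest-first, then reverses the list before returning it:

```python
# Hypothetical stand-in for the site: each strip URL maps to
# (title, URL of the previous strip, or None if there is no earlier one).
FAKE_SITE = {
    '/strip/3': ('Strip - March 14', '/strip/2'),
    '/strip/2': ('Strip - March 13', '/strip/1'),
    '/strip/1': ('Strip - March 12', None),
}


def make_links(start_url, num_to_get):
    """Collect up to num_to_get strips by following 'prev' links, oldest first."""
    current_articles = []
    url = start_url
    for _ in range(num_to_get):
        if url is None:  # ran out of earlier strips
            break
        title, prev_url = FAKE_SITE[url]
        current_articles.append({'title': title, 'url': url})
        url = prev_url
    current_articles.reverse()  # newest-first becomes oldest-first
    return current_articles


for article in make_links('/strip/3', 3):
    print(article['title'])
```

Without the reverse() call the same loop would list March 14 first, which is the order the original recipe produces.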
Here is the gocomics code with this fix:
Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2010 Starson17'
'''
www.gocomics.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag, NavigableString
import urllib, re, mechanize

class GoComics(BasicNewsRecipe):
    title               = 'GoComics Reversed'
    __author__          = 'Starson17' 
    __version__         = '1.01'
    __date__            = '13 March 2010'
    description         = '200+ Comics - Customize for more days/comics: Defaults to 7 days, 16 comics - 10 general, 6 editorial.'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
    remove_javascript   = True
    cover_url           = 'http://paulbuckley14059.files.wordpress.com/2008/06/calvin-and-hobbes.jpg'

    ####### USER PREFERENCES - COMICS, IMAGE SIZE AND NUMBER OF COMICS TO RETRIEVE ########
    # num_comics_to_get - I've tried up to 99 on Calvin&Hobbes
    num_comics_to_get = 7
    # comic_size 300 is small, 600 is medium, 900 is large, 1500 is extra-large
    comic_size = 1200
    # CHOOSE COMIC STRIPS BELOW - REMOVE COMMENT '# ' FROM IN FRONT OF DESIRED STRIPS 
    # Please do not overload their servers by selecting all comics and 1000 strips from each!
    
    keep_only_tags     = [dict(name='div', attrs={'class':['feature','banner']}),
                          ]

    remove_tags = [dict(name='a', attrs={'class':['beginning','prev','cal','next','newest']}),
                   dict(name='div', attrs={'class':['tag-wrapper']}),
                   dict(name='ul', attrs={'class':['share-nav','feature-nav']}),
                   ]
    
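    def get_browser(self):
        # Wrap open_novisit so requests made through it carry a gocomics.com Referer header.
        br = BasicNewsRecipe.get_browser(self)
        orig_open_novisit = br.open_novisit
        def my_open_no_visit(url, **kwargs):
            req = mechanize.Request(
                    url,
                    headers = {
                        'Referer':'http://www.gocomics.com/',
                        })
            # kwargs are accepted but not forwarded to the original method
            return orig_open_novisit(req)
        br.open_novisit = my_open_no_visit
        return br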
         
    def parse_index(self):
        feeds = []
        for title, url in [
                            ######## COMICS - GENERAL ########
                            # (u"2 Cows and a Chicken", u"http://www.gocomics.com/2cowsandachicken"),
                            # (u"9 to 5", u"http://www.gocomics.com/9to5"),
                            # (u"The Academia Waltz", u"http://www.gocomics.com/academiawaltz"),
                            # (u"Adam@Home", u"http://www.gocomics.com/adamathome"),
                            # (u"Agnes", u"http://www.gocomics.com/agnes"),
                            # (u"Andy Capp", u"http://www.gocomics.com/andycapp"),
                            # (u"Animal Crackers", u"http://www.gocomics.com/animalcrackers"),
                            # (u"Annie", u"http://www.gocomics.com/annie"),
                            # (u"The Argyle Sweater", u"http://www.gocomics.com/theargylesweater"),
                            # (u"Ask Shagg", u"http://www.gocomics.com/askshagg"),
                            (u"B.C.", u"http://www.gocomics.com/bc"),
                            # (u"Back in the Day", u"http://www.gocomics.com/backintheday"),
                            # (u"Bad Reporter", u"http://www.gocomics.com/badreporter"),
                            # (u"Baldo", u"http://www.gocomics.com/baldo"),
                            # (u"Ballard Street", u"http://www.gocomics.com/ballardstreet"),
                            # (u"Barkeater Lake", u"http://www.gocomics.com/barkeaterlake"),
                            # (u"The Barn", u"http://www.gocomics.com/thebarn"),
                            # (u"Basic Instructions", u"http://www.gocomics.com/basicinstructions"),
                            # (u"Bewley", u"http://www.gocomics.com/bewley"),
                            # (u"Big Top", u"http://www.gocomics.com/bigtop"),
                            # (u"Biographic", u"http://www.gocomics.com/biographic"),
                            # (u"Birdbrains", u"http://www.gocomics.com/birdbrains"),
                            # (u"Bleeker: The Rechargeable Dog", u"http://www.gocomics.com/bleeker"),
                            # (u"Bliss", u"http://www.gocomics.com/bliss"),
                            (u"Bloom County", u"http://www.gocomics.com/bloomcounty"),
                            # (u"Bo Nanas", u"http://www.gocomics.com/bonanas"),
                            # (u"Bob the Squirrel", u"http://www.gocomics.com/bobthesquirrel"),
                            # (u"The Boiling Point", u"http://www.gocomics.com/theboilingpoint"),
                            # (u"Boomerangs", u"http://www.gocomics.com/boomerangs"),
                            # (u"The Boondocks", u"http://www.gocomics.com/boondocks"),
                            # (u"Bottomliners", u"http://www.gocomics.com/bottomliners"),
                            # (u"Bound and Gagged", u"http://www.gocomics.com/boundandgagged"),
                            # (u"Brainwaves", u"http://www.gocomics.com/brainwaves"),
                            # (u"Brenda Starr", u"http://www.gocomics.com/brendastarr"),
                            # (u"Brewster Rockit", u"http://www.gocomics.com/brewsterrockit"),
                            # (u"Broom Hilda", u"http://www.gocomics.com/broomhilda"),
                            (u"Calvin and Hobbes", u"http://www.gocomics.com/calvinandhobbes"),
                            # (u"Candorville", u"http://www.gocomics.com/candorville"),
                            # (u"Cathy", u"http://www.gocomics.com/cathy"),
                            # (u"C'est la Vie", u"http://www.gocomics.com/cestlavie"),
                            # (u"Chuckle Bros", u"http://www.gocomics.com/chucklebros"),
                            # (u"Citizen Dog", u"http://www.gocomics.com/citizendog"),
                            # (u"The City", u"http://www.gocomics.com/thecity"),
                            # (u"Cleats", u"http://www.gocomics.com/cleats"),
                            # (u"Close to Home", u"http://www.gocomics.com/closetohome"),
                            # (u"Compu-toon", u"http://www.gocomics.com/compu-toon"),
                            # (u"Cornered", u"http://www.gocomics.com/cornered"),
                            # (u"Cul de Sac", u"http://www.gocomics.com/culdesac"),
                            # (u"Daddy's Home", u"http://www.gocomics.com/daddyshome"),
                            # (u"Deep Cover", u"http://www.gocomics.com/deepcover"),
                            # (u"Dick Tracy", u"http://www.gocomics.com/dicktracy"),
                            # (u"The Dinette Set", u"http://www.gocomics.com/dinetteset"),
                            # (u"Dog Eat Doug", u"http://www.gocomics.com/dogeatdoug"),
                            # (u"Domestic Abuse", u"http://www.gocomics.com/domesticabuse"),
                            # (u"Doodles", u"http://www.gocomics.com/doodles"),
                            # (u"Doonesbury", u"http://www.gocomics.com/doonesbury"),
                            # (u"The Doozies", u"http://www.gocomics.com/thedoozies"),
                            # (u"The Duplex", u"http://www.gocomics.com/duplex"),
                            # (u"Eek!", u"http://www.gocomics.com/eek"),
                            # (u"The Elderberries", u"http://www.gocomics.com/theelderberries"),
                            # (u"Flight Deck", u"http://www.gocomics.com/flightdeck"),
                            # (u"Flo and Friends", u"http://www.gocomics.com/floandfriends"),
                            # (u"The Flying McCoys", u"http://www.gocomics.com/theflyingmccoys"),
                            (u"For Better or For Worse", u"http://www.gocomics.com/forbetterorforworse"),
                            # (u"For Heaven's Sake", u"http://www.gocomics.com/forheavenssake"),
                            # (u"Fort Knox", u"http://www.gocomics.com/fortknox"),
                            # (u"FoxTrot", u"http://www.gocomics.com/foxtrot"),
                            (u"FoxTrot Classics", u"http://www.gocomics.com/foxtrotclassics"),
                            # (u"Frank & Ernest", u"http://www.gocomics.com/frankandernest"),
                            # (u"Fred Basset", u"http://www.gocomics.com/fredbasset"),
                            # (u"Free Range", u"http://www.gocomics.com/freerange"),
                            # (u"Frog Applause", u"http://www.gocomics.com/frogapplause"),
                            # (u"The Fusco Brothers", u"http://www.gocomics.com/thefuscobrothers"),
                            (u"Garfield", u"http://www.gocomics.com/garfield"),
                            # (u"Garfield Minus Garfield", u"http://www.gocomics.com/garfieldminusgarfield"),
                            # (u"Gasoline Alley", u"http://www.gocomics.com/gasolinealley"),
                            # (u"Gil Thorp", u"http://www.gocomics.com/gilthorp"),
                            # (u"Ginger Meggs", u"http://www.gocomics.com/gingermeggs"),
                            # (u"Girls & Sports", u"http://www.gocomics.com/girlsandsports"),
                            # (u"Haiku Ewe", u"http://www.gocomics.com/haikuewe"),
                            # (u"Heart of the City", u"http://www.gocomics.com/heartofthecity"),
                            # (u"Heathcliff", u"http://www.gocomics.com/heathcliff"),
                            # (u"Herb and Jamaal", u"http://www.gocomics.com/herbandjamaal"),
                            # (u"Home and Away", u"http://www.gocomics.com/homeandaway"),
                            # (u"Housebroken", u"http://www.gocomics.com/housebroken"),
                            # (u"Hubert and Abby", u"http://www.gocomics.com/hubertandabby"),
                            # (u"Imagine This", u"http://www.gocomics.com/imaginethis"),
                            # (u"In the Bleachers", u"http://www.gocomics.com/inthebleachers"),
                            # (u"In the Sticks", u"http://www.gocomics.com/inthesticks"),
                            # (u"Ink Pen", u"http://www.gocomics.com/inkpen"),
                            # (u"It's All About You", u"http://www.gocomics.com/itsallaboutyou"),
                            # (u"Joe Vanilla", u"http://www.gocomics.com/joevanilla"),
                            # (u"La Cucaracha", u"http://www.gocomics.com/lacucaracha"),
                            # (u"Last Kiss", u"http://www.gocomics.com/lastkiss"),
                            # (u"Legend of Bill", u"http://www.gocomics.com/legendofbill"),
                            # (u"Liberty Meadows", u"http://www.gocomics.com/libertymeadows"),
                            # (u"Lio", u"http://www.gocomics.com/lio"),
                            # (u"Little Dog Lost", u"http://www.gocomics.com/littledoglost"),
                            # (u"Little Otto", u"http://www.gocomics.com/littleotto"),
                            # (u"Loose Parts", u"http://www.gocomics.com/looseparts"),
                            # (u"Love Is...", u"http://www.gocomics.com/loveis"),
                            # (u"Maintaining", u"http://www.gocomics.com/maintaining"),
                            # (u"The Meaning of Lila", u"http://www.gocomics.com/meaningoflila"),
                            # (u"Middle-Aged White Guy", u"http://www.gocomics.com/middleagedwhiteguy"),
                            # (u"The Middletons", u"http://www.gocomics.com/themiddletons"),
                            # (u"Momma", u"http://www.gocomics.com/momma"),
                            # (u"Mutt & Jeff", u"http://www.gocomics.com/muttandjeff"),
                            # (u"Mythtickle", u"http://www.gocomics.com/mythtickle"),
                            # (u"Nest Heads", u"http://www.gocomics.com/nestheads"),
                            # (u"NEUROTICA", u"http://www.gocomics.com/neurotica"),
                            # (u"New Adventures of Queen Victoria", u"http://www.gocomics.com/thenewadventuresofqueenvictoria"),
                            (u"Non Sequitur", u"http://www.gocomics.com/nonsequitur"),
                            # (u"The Norm", u"http://www.gocomics.com/thenorm"),
                            # (u"On A Claire Day", u"http://www.gocomics.com/onaclaireday"),
                            # (u"One Big Happy", u"http://www.gocomics.com/onebighappy"),
                            # (u"The Other Coast", u"http://www.gocomics.com/theothercoast"),
                            # (u"Out of the Gene Pool Re-Runs", u"http://www.gocomics.com/outofthegenepool"),
                            # (u"Overboard", u"http://www.gocomics.com/overboard"),
                            # (u"Pibgorn", u"http://www.gocomics.com/pibgorn"),
                            # (u"Pibgorn Sketches", u"http://www.gocomics.com/pibgornsketches"),
                            (u"Pickles", u"http://www.gocomics.com/pickles"),
                            # (u"Pinkerton", u"http://www.gocomics.com/pinkerton"),
                            # (u"Pluggers", u"http://www.gocomics.com/pluggers"),
                            # (u"Pooch Cafe", u"http://www.gocomics.com/poochcafe"),
                            # (u"PreTeena", u"http://www.gocomics.com/preteena"),
                            # (u"The Quigmans", u"http://www.gocomics.com/thequigmans"),
                            # (u"Rabbits Against Magic", u"http://www.gocomics.com/rabbitsagainstmagic"),
                            # (u"Real Life Adventures", u"http://www.gocomics.com/reallifeadventures"),
                            # (u"Red and Rover", u"http://www.gocomics.com/redandrover"),
                            # (u"Red Meat", u"http://www.gocomics.com/redmeat"),
                            # (u"Reynolds Unwrapped", u"http://www.gocomics.com/reynoldsunwrapped"),
                            # (u"Ronaldinho Gaucho", u"http://www.gocomics.com/ronaldinhogaucho"),
                            # (u"Rubes", u"http://www.gocomics.com/rubes"),
                            # (u"Scary Gary", u"http://www.gocomics.com/scarygary"),
                            (u"Shoe", u"http://www.gocomics.com/shoe"),
                            # (u"Shoecabbage", u"http://www.gocomics.com/shoecabbage"),
                            # (u"Skin Horse", u"http://www.gocomics.com/skinhorse"),
                            # (u"Slowpoke", u"http://www.gocomics.com/slowpoke"),
                            # (u"Speed Bump", u"http://www.gocomics.com/speedbump"),
                            # (u"State of the Union", u"http://www.gocomics.com/stateoftheunion"),
                            # (u"Stone Soup", u"http://www.gocomics.com/stonesoup"),
                            # (u"Strange Brew", u"http://www.gocomics.com/strangebrew"),
                            # (u"Sylvia", u"http://www.gocomics.com/sylvia"),
                            # (u"Tank McNamara", u"http://www.gocomics.com/tankmcnamara"),
                            # (u"Tiny Sepuku", u"http://www.gocomics.com/tinysepuku"),
                            # (u"TOBY", u"http://www.gocomics.com/toby"),
                            # (u"Tom the Dancing Bug", u"http://www.gocomics.com/tomthedancingbug"),
                            # (u"Too Much Coffee Man", u"http://www.gocomics.com/toomuchcoffeeman"),
                            # (u"W.T. Duck", u"http://www.gocomics.com/wtduck"),
                            # (u"Watch Your Head", u"http://www.gocomics.com/watchyourhead"),
                            # (u"Wee Pals", u"http://www.gocomics.com/weepals"),
                            # (u"Winnie the Pooh", u"http://www.gocomics.com/winniethepooh"),
                            (u"Wizard of Id", u"http://www.gocomics.com/wizardofid"),
                            # (u"Working It Out", u"http://www.gocomics.com/workingitout"),
                            # (u"Yenny", u"http://www.gocomics.com/yenny"),
                            # (u"Zack Hill", u"http://www.gocomics.com/zackhill"),
                            # (u"Ziggy", u"http://www.gocomics.com/ziggy"),
                            ######## COMICS - EDITORIAL ########
                            # ("Lalo Alcaraz","http://www.gocomics.com/laloalcaraz"),
                            # ("Nick Anderson","http://www.gocomics.com/nickanderson"),
                            # ("Chuck Asay","http://www.gocomics.com/chuckasay"),
                            # ("Tony Auth","http://www.gocomics.com/tonyauth"),
                            # ("Donna Barstow","http://www.gocomics.com/donnabarstow"),
                            # ("Bruce Beattie","http://www.gocomics.com/brucebeattie"),
                            # ("Clay Bennett","http://www.gocomics.com/claybennett"),
                            # ("Lisa Benson","http://www.gocomics.com/lisabenson"),
                            # ("Steve Benson","http://www.gocomics.com/stevebenson"),
                            # ("Chip Bok","http://www.gocomics.com/chipbok"),
                            # ("Steve Breen","http://www.gocomics.com/stevebreen"),
                            # ("Chris Britt","http://www.gocomics.com/chrisbritt"),
                            # ("Stuart Carlson","http://www.gocomics.com/stuartcarlson"),
                            # ("Ken Catalino","http://www.gocomics.com/kencatalino"),
                            # ("Paul Conrad","http://www.gocomics.com/paulconrad"),
                            # ("Jeff Danziger","http://www.gocomics.com/jeffdanziger"),
                            # ("Matt Davies","http://www.gocomics.com/mattdavies"),
                            # ("John Deering","http://www.gocomics.com/johndeering"),
                            # ("Bob Gorrell","http://www.gocomics.com/bobgorrell"),
                            # ("Walt Handelsman","http://www.gocomics.com/walthandelsman"),
                            # ("Clay Jones","http://www.gocomics.com/clayjones"),
                            # ("Kevin Kallaugher","http://www.gocomics.com/kevinkallaugher"),
                            # ("Steve Kelley","http://www.gocomics.com/stevekelley"),
                            # ("Dick Locher","http://www.gocomics.com/dicklocher"),
                            # ("Chan Lowe","http://www.gocomics.com/chanlowe"),
                            ("Mike Luckovich","http://www.gocomics.com/mikeluckovich"),
                            # ("Gary Markstein","http://www.gocomics.com/garymarkstein"),
                            # ("Glenn McCoy","http://www.gocomics.com/glennmccoy"),
                            # ("Jim Morin","http://www.gocomics.com/jimmorin"),
                            # ("Jack Ohman","http://www.gocomics.com/jackohman"),
                            ("Pat Oliphant","http://www.gocomics.com/patoliphant"),
                            # ("Joel Pett","http://www.gocomics.com/joelpett"),
                            ("Ted Rall","http://www.gocomics.com/tedrall"),
                            # ("Michael Ramirez","http://www.gocomics.com/michaelramirez"),
                            # ("Marshall Ramsey","http://www.gocomics.com/marshallramsey"),
                            # ("Steve Sack","http://www.gocomics.com/stevesack"),
                            # ("Ben Sargent","http://www.gocomics.com/bensargent"),
                            # ("Drew Sheneman","http://www.gocomics.com/drewsheneman"),
                            # ("John Sherffius","http://www.gocomics.com/johnsherffius"),
                            ("Small World","http://www.gocomics.com/smallworld"),
                            # ("Scott Stantis","http://www.gocomics.com/scottstantis"),
                            # ("Wayne Stayskal","http://www.gocomics.com/waynestayskal"),
                            # ("Dana Summers","http://www.gocomics.com/danasummers"),
                            # ("Paul Szep","http://www.gocomics.com/paulszep"),
                            # ("Mike Thompson","http://www.gocomics.com/mikethompson"),
                            ("Tom Toles","http://www.gocomics.com/tomtoles"),
                            # ("Gary Varvel","http://www.gocomics.com/garyvarvel"),
                            # ("ViewsAfrica","http://www.gocomics.com/viewsafrica"),
                            # ("ViewsAmerica","http://www.gocomics.com/viewsamerica"),
                            # ("ViewsAsia","http://www.gocomics.com/viewsasia"),
                            # ("ViewsBusiness","http://www.gocomics.com/viewsbusiness"),
                            ("ViewsEurope","http://www.gocomics.com/viewseurope"),
                            # ("ViewsLatinAmerica","http://www.gocomics.com/viewslatinamerica"),
                            # ("ViewsMidEast","http://www.gocomics.com/viewsmideast"),
                            # ("Views of the World","http://www.gocomics.com/viewsoftheworld"),
                            # ("Kerry Waghorn","http://www.gocomics.com/facesinthenews"),
                            # ("Dan Wasserman","http://www.gocomics.com/danwasserman"),
                            # ("Signe Wilkinson","http://www.gocomics.com/signewilkinson"),
                            # ("Wit of the World","http://www.gocomics.com/witoftheworld"),
                            # ("Don Wright","http://www.gocomics.com/donwright"),
                             ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds
        
    def make_links(self, url):
        # Walk back through the 'prev' links, collecting one article per strip.
        current_articles = []
        for page in range(self.num_comics_to_get):
            page_soup = self.index_to_soup(url)
            if not page_soup:
                break
            strip_title = page_soup.h1.a.string
            date_title = page_soup.find('ul', attrs={'class': 'feature-nav'}).li.string
            title = strip_title + ' - ' + date_title
            page_url = 'http://www.gocomics.com' + page_soup.h1.a['href']
            current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':''})
            # Stop when there is no earlier strip to follow.
            prev_link = page_soup.find('a', attrs={'class': 'prev'})
            if prev_link is None:
                break
            url = 'http://www.gocomics.com' + prev_link['href']
        # Oldest strip first, so the strips read in chronological order.
        current_articles.reverse()
        return current_articles

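    def preprocess_html(self, soup):
        # Default to an empty date so the heading code below never sees
        # an undefined name when the page has no usable <title>.
        comic_date = ''
        if soup.title and soup.title.string:
            title_string = soup.title.string.strip()
            # The date is taken from the text after the first comma in the title.
            parts = title_string.split(',',1)
            if len(parts) > 1:
                comic_date = ' '.join(parts[1].split(' ', 4)[0:-1])
        if soup.h1 is not None and soup.h1.span and soup.h1.span.string:
            artist = soup.h1.span.string
            soup.h1.span.string.replaceWith(comic_date + artist)
        feature_item = soup.find('p',attrs={'class':'feature_item'})
        if feature_item is not None and feature_item.a:
            # Point the inline image at the target of its enclosing link.
            a_tag = feature_item.a
            img_tag = a_tag.img
            img_tag["src"] = a_tag["href"]
            img_tag["width"] = self.comic_size
            img_tag["height"] = None
        return soup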
        
    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''

Last edited by Starson17; 03-14-2010 at 12:32 PM.
Old 03-14-2010, 12:49 PM   #1599
Ekips
Member
Posts: 15
Karma: 10
Join Date: Mar 2010
Device: PW2, K3gb(x2), K3w, K4, k5(x3) PRS-505s, Stanza for ipod
Quote:
Originally Posted by Starson17
Give us the recipe you're using that's causing trouble. I see you have one above, but you said there it wasn't retrieving, but in this post, it's retrieving, but leaving junk. The code from the site is hard to work with. Just post your best recipe with the problem you're having and I'll take a look.
Thanks, that would be great.

This is what I've got so far. I've managed to fetch the main story while leaving out the 'share this article on Twitter/Facebook' links and the slideshows.

I'm still trying to clean it up: it brings back a few blank pages and a few 'You need flashplayer 8 or higher' bits that I can't get rid of.

Code:
class AdvancedUserRecipe1268409464(BasicNewsRecipe):
    title          = u'The Sun'
    __author__            = 'Chaz Ralph'
    description           = 'News from United Kingdom' 
    oldest_article = 3
    max_articles_per_feed = 100
    no_stylesheets = True
    extra_css      = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt  }'

    keep_only_tags    = [ 
                           dict(name='div', attrs={'class':'medium-centered'})
                          ,dict(name='div', attrs={'class':'article'})
                          ,dict(name='div', attrs={'class':'clear-left'}) 
                        ]

    remove_tags    = [dict(name='div', attrs={'class':'slideshow'})
                              ,dict(name='div', attrs={'class':'float-left'})
                              ,dict(name='div', attrs={'class':'ltbx-slideshow ltbx-btn-ss'})]


    feeds          = [(u'News', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article312900.ece')
                        ,(u'Sport', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247732.ece')
                        ,(u'Football', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247739.ece')
                        ,(u'Gizmo', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247829.ece')
                        ,(u'Bizarre', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247767.ece')]

    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=News', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=Royals', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=Gizmo', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=Boxing', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=Cricket', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=Football', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=Rugby+Union', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=Tv', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=Bizarre', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=Usa', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=Film', '?print=yes')
    def print_version(self, url):
          return url.replace('?OTC-RSS&ATTR=HomePage', '?print=yes')
This is my first ever attempt at Python, so excuse the roughness.
Old 03-14-2010, 02:22 PM   #1600
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Ekips
This is my first ever attempt at Python, so excuse the roughness.
I'm a beginner, too. Kovid's been riding herd on my efforts, but I'll see if I can help you.

Your recipe looks pretty good. One fix: when a class defines several methods with the same name, Python keeps only the last one, so only your final print_version (the HomePage one) actually runs. You might want to change the def print_version to this:
Code:
    def print_version(self, url):
          url = url.replace('?OTC-RSS&ATTR=News', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Royals', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Gizmo', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Boxing', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Cricket', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Football', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Rugby+Union', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Tv', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Bizarre', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Usa', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Film', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=HomePage', '?print=yes')
          return url
Each replace() returns a new string rather than changing url in place, so assign the result back to url each time. That way you can do the replacements sequentially in the body and return url at the end, instead of doing a single replacement in the return line.
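Python strings are immutable, so str.replace() always hands back a new string and a chain of replacements only takes effect when each result is kept. As a rough sketch, the whole list of section suffixes could also be handled with one regular expression. The function name to_print_url and the pattern are assumptions made for illustration, based only on the suffixes listed above; they are not part of the recipe.

```python
import re

# Assumed pattern: '?OTC-RSS&ATTR=' followed by a section name such as
# 'News' or 'Rugby+Union' (anything up to the next '&' or '#').
RSS_SUFFIX = re.compile(r'\?OTC-RSS&ATTR=[^&#]+')


def to_print_url(url):
    # re.sub returns a new string; the url passed in is left untouched.
    return RSS_SUFFIX.sub('?print=yes', url)


# A bare call to replace() discards its result, so the original is unchanged.
original = 'http://www.thesun.co.uk/story.ece?OTC-RSS&ATTR=News'
original.replace('?OTC-RSS&ATTR=News', '?print=yes')
still_original = original

converted = to_print_url(original)
```

Inside a recipe, print_version(self, url) could simply return to_print_url(url), which also covers any section name that is not in the hand-written list.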


I ran the recipe in test mode, so I only pulled two feeds with two articles each. I didn't see any references to Flash. I did see some text "Advertisement" and some "Add a Comment" links that were left. Can you tell me exactly what feed/article you want help on?

Add this to your remove_tags to kill the "Add a Comment" link:
Code:
,dict(name='a', attrs={'class':'add_a_comment'})
Do you know the best way to find these?

Use Firefox,
install the Firebug add-on,
open the page you're having trouble with,
find the item you want to remove on the original page (CTRL-F),
right click that item and select "Inspect Element"

Firebug then shows you the element's tag name and its id or class.
Put that into your remove_tags list.

The "Add a Comment" junk was in an <a> tag with id='addComment' and class= 'add_a_comment'. You could pull it with reference to either the id or the class.
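For illustration, either of the two entries below would target that link, so one of them in remove_tags is enough. The variable names by_class and by_id exist only for this sketch; the id and class values are the ones reported above.

```python
# Match the <a> element by its class attribute ...
by_class = dict(name='a', attrs={'class': 'add_a_comment'})

# ... or match the same element by its id attribute.
by_id = dict(name='a', attrs={'id': 'addComment'})

# Only one of the two is needed in the recipe's remove_tags list.
remove_tags = [
    dict(name='div', attrs={'class': 'slideshow'}),
    by_id,
]
```

An id is normally unique on a page, so matching on it is less likely to remove unrelated elements that happen to share a class.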

Also, you can condense your 3 removes into one. Here is the line:
Code:
dict(name='div', attrs={'class':['slideshow','float-left','ltbx-slideshow ltbx-btn-ss']})
The 3 keeps can be condensed the same way.
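Applied to the keep_only_tags list from the recipe, the condensed form might look like the sketch below. It assumes the list of class names is matched the same way for keeps as for removes, and it has not been run against the site.

```python
# One entry with a list of class names instead of three separate entries.
keep_only_tags = [
    dict(name='div', attrs={'class': ['medium-centered', 'article', 'clear-left']})
]
```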

Last comment - I usually add "remove_javascript = True" unless there's some reason not to use it.

Last edited by Starson17; 03-14-2010 at 02:24 PM.
Old 03-14-2010, 05:43 PM   #1602
Dereks
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
Hi, can someone help me with this feed:
http://feed43.com/6441846012758810.xml
I made it myself for this page of a news site: http://news.finance.ua/ru/~/2/ (it's in Russian).
The problem is that the default recipe only fetches the articles that are on this page right now and ignores the older ones.
Thanks for the help.
Old 03-14-2010, 06:01 PM   #1603
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by Starson17
Kiklop:
What does this do?
Code:
    def preprocess_html(self, soup):
        return self.adeify_images(soup)
Is it related to what Kovid said about image size and ADE:

Thanks.
No, the problem is of another kind. You can see the details in issue 2256. This method is a must for EPUB output wherever you may have images inside <span> or <a> tags.

This is also documented on the ADEQuirks wiki page.
Old 03-14-2010, 06:47 PM   #1604
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kiklop74
No, the problem is of another kind. You can see the details in issue 2256. This method is a must for EPUB output wherever you may have images inside <span> or <a> tags.

This is also documented on the ADEQuirks wiki page.
Thanks. It looks like it's a Sony/ADE issue, and adeify_images() just cleans up the soup so they don't choke. Are there any negative side effects? Basically, I'm just asking if you add it by default to all recipes?
Old 03-14-2010, 06:55 PM   #1605
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by Starson17
Thanks. It looks like it's a Sony/ADE issue, and adeify_images() just cleans up the soup so they don't choke. Are there any negative side effects? Basically, I'm just asking if you add it by default to all recipes?
It is an ADE issue. There are no side effects, but I use that method only when it is needed. There is no need to add yet another piece of code "just in case". Less code means a simpler and faster recipe.