Old 01-02-2012, 10:47 AM   #1
falcons75scp
Junior Member
Posts: 4
Karma: 10
Join Date: Jan 2012
Location: Tulsa, OK
Device: Nook (Simple Touch & Color)
Tulsa World "Built-In" Recipe

Greetings all from a newbie to this forum. I'll start my first post by saying that I DID search the forum for this specific problem, but really had no luck.

I've been using several "built-in" recipes (for most of the last year) to keep up with the news while I travel. For the most part, I've been very pleased with the results. Some especially good recipes that I've used include the St Louis Post Dispatch, The Washington Post, USA Today, Wired Magazine, BBC & CNN.

I've also been pleased to have a "built-in" recipe available for my local paper (Tulsa World) which allows me to follow local stories on my reader while I am away from home. However, despite the fact that the recipe produces three sections (News, Business, & Opinion) - these three sections appear to be identical, as near as I can tell. If at all possible, I'd like to see a recipe that produces all the normal sections of my paper including Local, Scene & Sports and the columnists regularly featured in our paper.

I've tried to create a custom recipe, with only marginal success. I do see more articles listed in each section of my custom recipe than in the built-in recipe. Generally, however, my custom recipe produces only a line or two from each story in the appropriate section menu, and almost nothing of the article appears when I follow the link displayed in the section menu. The other common problem is links that display only the reader comments on a story, without the story itself. As you can imagine, this can be pretty frustrating.

If it helps any, I am a print subscriber to the Tulsa World - and have a registered email address that allows full online access to the paper. This works great on a computer, but I'd prefer to read my news on my e-reader. It appears that some recipes have provisions for allowing full access (e.g. New York Times, Wall Street Journal) by entering log-in credentials, but I don't see that feature for the Tulsa World recipe - maybe I don't know where to look.

Can anyone help either improve the built-in recipe for the Tulsa World in an upcoming Calibre update or come up with a good custom recipe?

Thanks in advance,

Steve
Old 01-02-2012, 11:37 AM   #2
kovidgoyal
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I lack the time for a more detailed response, but here are some quick tips:

1) When creating a custom recipe for a site that already has a builtin recipe, it's a good idea to customize the builtin recipe rather than starting from scratch. You can do this by clicking the "Customize builtin recipe" button in the add your own news sources dialog.

2) A quick look at the builtin Tulsa World recipe shows me that it uses RSS feeds. You should be able to add more sections by adding the RSS feeds for those sections to the builtin recipe.
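
As a rough illustration of tip 2: a recipe's sections come from its `feeds` attribute, a list of (title, URL) pairs, so adding a section means adding a pair. The sketch below is plain Python rather than a full recipe; the helper `add_feeds` is hypothetical and the example.com URLs are placeholders, not real Tulsa World feeds.

```python
def add_feeds(feeds, extra):
    """Return a new feed list with `extra` appended, skipping URLs already present."""
    seen = {url for _title, url in feeds}
    merged = list(feeds)
    for title, url in extra:
        if url not in seen:
            merged.append((title, url))
            seen.add(url)
    return merged


# Placeholder feeds; a real recipe would list the newspaper's own RSS URLs.
builtin_feeds = [
    ('News', 'http://example.com/rss?group=1'),
    ('Business', 'http://example.com/rss?group=5'),
]
extra_feeds = [
    ('Sports', 'http://example.com/rss?group=2'),
    ('News', 'http://example.com/rss?group=1'),  # duplicate URL, skipped
]

feeds = add_feeds(builtin_feeds, extra_feeds)
print(feeds)
```

The merged list would then be pasted into the customized recipe as its `feeds` value.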
Old 01-03-2012, 12:39 AM   #3
falcons75scp
Wow, the Boss himself! Thanks for the reply. In my two (or three) attempts to make a recipe, I can't remember whether I tried to edit the "built-in" script. I know for sure that I used the paper's web site RSS feed links. I guess that I need to keep trying. It's too bad work interferes with fun stuff. On the other hand, if I wasn't working, I wouldn't be traveling so much and I wouldn't need to try to make this work better.
Old 01-09-2012, 12:05 AM   #4
bburky
Junior Member
Posts: 2
Karma: 12
Join Date: Jan 2012
Device: Kindle 4
The URL structure of the RSS feeds changed slightly for Tulsa World. Instead of http://www.tulsaworld.com/site/rss.aspx?group=1 it is now http://www.tulsaworld.com/site/rss/rss.aspx?group=1

The http://www.tulsaworld.com/site/rss/ page lists all the RSS feeds for Tulsa World.
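
The change described above amounts to one extra `/rss` path segment. A small sketch of updating an old-style feed URL in plain Python; the helper name `update_feed_url` is made up for illustration.

```python
OLD_PATH = '/site/rss.aspx'
NEW_PATH = '/site/rss/rss.aspx'


def update_feed_url(url):
    """Rewrite an old-style Tulsa World feed URL to the new structure.

    URLs that already use the new path do not contain OLD_PATH,
    so they pass through unchanged.
    """
    return url.replace(OLD_PATH, NEW_PATH, 1)


old_url = 'http://www.tulsaworld.com/site/rss.aspx?group=1'
new_url = update_feed_url(old_url)
print(new_url)
```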

Here's a new version of the recipe with the new RSS feeds included. I've also included the subcategory feeds, commented out; uncomment whichever ones you want.

Code:
__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
tulsaworld.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class TulsaWorld(BasicNewsRecipe):
    title                 = 'Tulsa World'
    __author__            = 'Darko Miletic'
    description           = 'Find breaking news, local news, Oklahoma weather, sports, business, entertainment, lifestyle, opinion, government, movies, books, jobs, education, blogs, video & multimedia.'
    publisher             = 'World Publishing Co.'
    category              = 'Tulsa World, tulsa world, daily newspaper, breaking news, stories, articles, news, local, weather, coverage, editorial, government, education, community, sports, business, entertainment, lifestyle, opinion, multimedia, media, blogs, consumer, OU, OSU, TU, ORU, football, basketball, school, schools, sudoku, movie reviews, stocks, classified ads, classifieds, books, job, jobs, careers, real estate, home, homes, Oklahoma, northeastern, reviews, auto, autos, archives, forecasts, Sooners, Cowboys, Hurricane, Golden Eagles, NFL, NBA, MLB, pro football, scores, college basketball, college football, college baseball, sports columns, fashion and style, associated press, regional news coverage, health, obituaries, politics, political news, Jenks, Union, Owasso, Tulsa, Booker T. Washington, Trojans, Rams, Hornets, video, photography, photos, images, games, search, the picker, predictions, satellite, family, food, teens, polls, births, celebrations, death notices, divorces, marriages, obituaries, audio, podcasts.'
    oldest_article        = 2
    max_articles_per_feed = 200
    no_stylesheets        = True
    encoding              = 'utf8'
    use_embedded_content  = False
    language              = 'en'
    country               = 'US'
    remove_empty_feeds    = True
    masthead_url          = 'http://www.tulsaworld.com/images/TW_logo-blue-footer.jpg'
    extra_css             = ' body{font-family: Arial,Verdana,sans-serif } img{margin-bottom: 0.4em} .articleHeadline{font-size: xx-large; font-weight: bold} .articleKicker{font-size: x-large; font-weight: bold} .articleByline,.articleDate{font-size: small} .leadp{font-size: 1.1em} '

    conversion_options = {
                          'comment'          : description
                        , 'tags'             : category
                        , 'publisher'        : publisher
                        , 'language'         : language
                        , 'linearize_tables' : True
                        }
    keep_only_tags = [dict(name='div',attrs={'id':['ctl00_body1_ArticleControl_divArticleText','ctl00_BodyContent_ArticleControl_divArticleText']})]

    feeds = [
        # The first feed of each category is an aggregation of the subcategories that follow
        # For example, "News" contains "Local", "State", "Legal", etc.

        ### News
        (u'News', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=1'),
        #(u'Local', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=11'),
        #(u'State', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=12'),
        #(u'Legal', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=14'),
        #(u'Consumer Awareness', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=15'),
        #(u'Government', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=16'),
        #(u'Health & Fitness', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=17'),
        #(u'Religion', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=18'),
        #(u'Education', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=19'),
        #(u'Jay Cronley', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=206'),
        #(u'SemGroup', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=351'),
        #(u'Inhofe', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=447'),
        #(u'Coburn', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=448'),
        #(u'Sullivan', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=449'),

        ### Sports
        (u'Sports', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=2'),
        #(u'OU', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=92'),
        #(u'OSU', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=93'),
        #(u'TU', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=94'),
        #(u'ORU', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=95'),
        #(u'Dave Sittler', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=202'),
        #(u'John Klein', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=203'),
        #(u'The Picker', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=204'),
        #(u'High School Football', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=227'),
        #(u'Boys Basketball', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=230'),
        #(u'College Football', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=231'),
        #(u'College Basketball', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=234'),

        ### Scene
        (u'Scene', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=4'),
        #(u'Food', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=39'),
        #(u'Home & Garden', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=41'),
        #(u'People', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=42'),
        #(u'Style', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=43'),
        #(u'Celebrations', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=59'),
        #(u'Scott Cherry', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=207'),
        #(u'Jason Ashley Wright', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=208'),
        #(u'Column - Walker', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=209'),
        #(u'Garden', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=517'),

        ### Business
        (u'Business', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=5'),
        #(u'Tech', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=52'),

        ### Transitions
        #(u'Transitions', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=6'),
        #(u'Births', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=55'),
        #(u'Obits: Death Notices', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=56'),
        #(u'Divorces', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=57'),
        #(u'Obits: Obituaries (News Obits)', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=58'),
        #(u'Marriages', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=60'),

        ### Opinion
        (u'Opinion', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=7'),
        #(u'Letters to the Editor', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=62'),
        #(u'Political Cartoon', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=63'),
        #(u'Janet Pearson', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=211'),
        #(u'Column - Jones', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=213'),
        #(u'Julie Delcour', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=214'),
        #(u'David Averill', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=215'),

        ### Community
        (u'Community', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=9'),


        ### Blog Feeds
        # No combined category feeds
        # These are untested. They will likely not work at all
        # Also some of these links appear to not work correctly

        # NOT WORKING:

        ### Sports Blogs
        #(u'Mike Strain', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=8&blog=1'),
        #(u'John Klein', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=15&blog=1'),
        #(u'Dave Sittler', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=13&blog=1'),
        #(u'Jimmie Tramel', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=9&blog=1'),
        #(u'Bill Haisten', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=16&blog=1'),
        #(u'The Picker', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=14&blog=1'),
        #(u'OSU Cowboys', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=11&blog=1'),
        #(u'OU Sooners', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=12&blog=1'),
        #(u'TU Golden Hurricane', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=10&blog=1'),
        #(u'ORU Golden Eagles', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=17&blog=1'),
        #(u'High School', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=26&blog=1'),

        ### Lifestyle Blogs
        #(u'Scott Cherry', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=21&blog=1'),
        #(u'Natalie Mikles', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=22&blog=1'),
        #(u'Michael Smith', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=5&blog=1'),
        #(u'Jason Ashley Wright', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=3&blog=1'),
        #(u'Jennifer Chancellor', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=29&blog=1'),

        ### Opinion Blogs
        #(u'Mike Jones', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=31&blog=1'),
        #(u'Wayne Greene', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=30&blog=1')
    ]

    def get_article_url(self, article):
        return article.get('link',  None).rpartition('&rss')[0]

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        return self.adeify_images(soup)


Also, there are various blogs by columnists with RSS feeds listed. I included them, but I believe they will not work. Those pages are significantly different from the rest of the site, and I'm not sure the article text will be extracted from them. You're welcome to check though.

I only slightly tested this. It should work fine though.
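
For anyone who would rather not comment and uncomment lines, here is one possible alternative, sketched in plain Python: build the (title, URL) pairs from group numbers. The group numbers are copied from the recipe above; the names `FEED_URL`, `WANTED` and `build_feeds` are made up for this sketch, and the resulting list would still have to be assigned to `feeds` in the recipe.

```python
FEED_URL = 'http://www.tulsaworld.com/site/rss/rss.aspx?group={group}'

# Group numbers copied from the recipe above; pick the sections you want.
WANTED = [
    ('Local', 11),
    ('State', 12),
    ('Sports', 2),
    ('Opinion', 7),
]


def build_feeds(wanted):
    """Turn (title, group) pairs into the (title, url) pairs a `feeds` list holds."""
    return [(title, FEED_URL.format(group=group)) for title, group in wanted]


feeds = build_feeds(WANTED)
for title, url in feeds:
    print(title, url)
```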
Old 01-09-2012, 05:46 PM   #5
falcons75scp
bburky,

So far, I've only modified the built-in recipe slightly using the first feed from each section that you included in your recipe. Already, I can see that this is a BIG improvement over the built-in recipe and FAR more effective than my previous attempts to build a recipe on my own.

Although it took me two tries to get a modification working, it did work. Thank you so much for pointing me in the right direction.

Steve
Old 01-09-2012, 06:16 PM   #6
bburky
No problem.

The only modification I made to the built-in recipe was updating the list of feeds. So if you're using that, that's actually all the changes I made.

There are still a few problems though. A few articles are seemingly empty, and sometimes HTML source code fragments are pulled in as text in photo captions. I have not done anything to fix that.
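
One thing that might be worth checking (an assumption, not a confirmed cause of the empty articles): the recipe's `get_article_url` uses `rpartition('&rss')[0]`, and in Python that expression is an empty string whenever the link has no `&rss` suffix. Below is a standalone sketch of a more defensive version; `clean_link` is a hypothetical helper and the sample links are made up.

```python
def clean_link(link):
    """Drop a trailing '&rss...' suffix, but keep links that lack it unchanged."""
    if not link:
        return None
    head, sep, _tail = link.rpartition('&rss')
    # sep is empty when '&rss' was not found; in that case head is '' too
    return head if sep else link


# Made-up sample links, for illustration only
with_suffix = 'http://www.tulsaworld.com/article.aspx?id=1&rss_lnk=1'
without_suffix = 'http://www.tulsaworld.com/article.aspx?id=2'

# The original expression loses the URL when the suffix is missing
print(repr(without_suffix.rpartition('&rss')[0]))
print(clean_link(with_suffix))
print(clean_link(without_suffix))
```

If this turns out to matter, the same logic could replace the body of `get_article_url` in the recipe.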
Old 01-11-2012, 09:51 PM   #7
falcons75scp
Yes, I see several links with the first line or two of the corresponding story in the section menus, but I also get a blank page when I follow the link. Even so, it is a fairly dramatic improvement over the "built-in" recipe. Thanks again.
All times are GMT -4.