Old 09-05-2010, 12:14 PM   #2641
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by somedayson View Post
Getting even closer.

I can read all the articles now, but there's stuff before and after them that I'm picking up off the web site. I can't figure out how to

1. Get it to the print only page

2. Get rid of the stuff at the beginning (really disruptive for reading) and the end (not as bad but would love to remove it)


Thanks for any assistance anyone can provide. I certainly wouldn't mind a little .rar pack with the answer in it either!

Grateful either way,
Matt
You stated you are getting the print-only page, but I don't think you were actually getting the printer-friendly version for some reason. Anyway, what you need to do is something like this. I haven't fully tested it, but it should work.

Also, in the future please wrap your code in spoiler and code tags. It makes it easier for all of us here.

Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'FW'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'FW'
    publisher = 'Tony'
    category = 'whateveryouwant'
    oldest_article = 1
    max_articles_per_feed = 100
    no_stylesheets = True
    
    
      
      
    remove_tags = [dict(name='div', attrs={'id':['sidebar1']})]
    feeds = [(u'Opinion', u'http://journalgazette.net/apps/pbcs.dll/section?Category=EDIT&template=blogrss&mime=xml'),
             (u'Local News', u'http://journalgazette.net/apps/pbcs.dll/section?Category=LOCAL&template=blogrss&mime=xml'),
             (u'Sports', u'http://journalgazette.net/apps/pbcs.dll/section?Category=SPORTS&template=blogrss&mime=xml'),
             (u'Features', u'http://journalgazette.net/apps/pbcs.dll/section?Category=FEAT&template=blogrss&mime=xml'),
             (u'Business', u'http://journalgazette.net/apps/pbcs.dll/section?Category=BIZ&template=blogrss&mime=xml'),
             (u'Ice Chips', u'http://journalgazette.net/apps/pbcs.dll/section?Category=BLOGS11&template=blogrss&mime=xml'),
             (u'Entertainment', u'http://journalgazette.net/apps/pbcs.dll/section?Category=ENT&template=blogrss&mime=xml'),
             (u'Food', u'http://journalgazette.net/apps/pbcs.dll/section?Category=FOOD&template=blogrss&mime=xml')
            ]




    def print_version(self, url):
        # Convert an article URL into its printer-friendly version.
        # original version is: http://www.journalgazette.net/article/20100905/EDIT10/309059959/1021/EDIT
        # print version should be: http://www.journalgazette.net/apps/pbcs.dll/article?AID=/20100905/EDIT10/309059959/-1/EDIT01&template=printart
        # results of the split:
        # [u'http:', u'', u'www.journalgazette.net', u'article', u'20100905', u'EDIT10', u'309059959', u'1021', u'EDIT']
        parts = url.split('/')
        self.log('THE SPLIT IS: ', parts)
        host = parts[2]        # www.journalgazette.net
        date = parts[4]        # 20100905
        section = parts[5]     # EDIT10
        article_id = parts[6]  # 309059959

        # The trailing '/-1/EDIT01' is taken from the sample print URL above.
        print_url = 'http://' + host + '/apps/pbcs.dll/article?AID=/' + date + '/' + section + '/' + article_id + '/-1/EDIT01&template=printart'
        self.log('THIS URL WILL PRINT: ', print_url)  # test output to see which URL is returned
        return print_url
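
A side note on the URL conversion above: the same logic can be written as a small standalone function, which makes it easy to try outside calibre. This is only a sketch under assumptions. `to_print_url` and its `suffix` parameter are made-up names, the `/-1/EDIT01` tail is copied from the one sample print URL in the post and may not hold for every section, and the import is the Python 3 `urllib.parse` module.

```python
from urllib.parse import urlsplit


def to_print_url(url, suffix='/-1/EDIT01'):
    """Build a printer-friendly URL from a journalgazette.net article URL.

    Illustrative helper, not part of calibre. The default suffix is an
    assumption copied from the single sample print URL in the post.
    """
    parts = urlsplit(url)
    # The path is expected to look like /article/<date>/<section>/<id>/...
    segments = [s for s in parts.path.split('/') if s]
    if len(segments) < 4 or segments[0] != 'article':
        # Unexpected layout: hand back the original URL unchanged.
        return url
    date, section, article_id = segments[1:4]
    return '%s://%s/apps/pbcs.dll/article?AID=/%s/%s/%s%s&template=printart' % (
        parts.scheme, parts.netloc, date, section, article_id, suffix)


print(to_print_url(
    'http://www.journalgazette.net/article/20100905/EDIT10/309059959/1021/EDIT'))
```

Unlike indexing straight into the split list, this version falls back to the original URL when the path is shorter than expected, so a feed item with an odd link would not raise an IndexError.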
Old 09-05-2010, 01:05 PM   #2642
cynvision
Member
cynvision began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Sep 2010
Device: nook
Well, it looks like I couldn't find the RSS links for trying over at that site. That's why my approach was to scrape the page itself. Where are they? The site avoids using the orange RSS buttons.
Old 09-05-2010, 01:38 PM   #2643
poloman
Enthusiast
poloman began at the beginning.
 
Posts: 25
Karma: 10
Join Date: Nov 2008
Device: PRS505, Kindle 3G
@TonytheBookworm - thanks for the post - I've deliberately not looked at the spoiler you posted, so will try to come up with a solution myself - I too am a C# developer and this looks like a nice challenge - thanks again!
Old 09-05-2010, 01:44 PM   #2644
TonytheBookworm
Quote:
Originally Posted by poloman View Post
@TonytheBookworm - thanks for the post - I've deliberately not looked at the spoiler you posted, so will try to come up with a solution myself - I too am a C# developer and this looks like a nice challenge - thanks again!
I like the fact that once I get the hang of it, it makes reading more things possible which is rewarding to me. I can only imagine how Kovid felt when he completed Calibre and was able to use it to manage his books and make news feeds for himself.
Old 09-05-2010, 01:54 PM   #2645
TonytheBookworm
Quote:
Originally Posted by cynvision View Post
Well, it looks like I couldn't find the RSS links for trying over at that site. That's why my approach was to scrape the page itself. Where are they? The site avoids using the orange RSS buttons.
I'm still not sure where he found the RSS feeds, but I noticed them in the post, so I just used them.
Old 09-05-2010, 03:09 PM   #2646
TonytheBookworm
I know oldest_article lets you determine how far back in the feed you wish to go. If it is not specified, I assume it defaults to 1. Is there a way to turn it off completely, so that it doesn't matter what the date is? I have a feed that posts the top 25, yet it is not updated all the time, and some of the content can be a year old.
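
For what it's worth, the calibre recipe documentation lists the default for oldest_article as 7 days rather than 1. I'm not aware of a switch that disables the date check outright; the usual approach seems to be setting oldest_article to a large number, as the BuckMasters recipe in this thread does with 365. The snippet below is only a simplified model of an age cutoff to show why that works. `is_recent` is a made-up helper, not calibre's actual code.

```python
from datetime import datetime, timedelta


def is_recent(published, oldest_article, now):
    """Return True if an article is no older than `oldest_article` days.

    Simplified model of an age cutoff; hypothetical helper, not calibre code.
    """
    return now - published <= timedelta(days=oldest_article)


now = datetime(2010, 9, 5)
year_old = datetime(2009, 9, 10)  # roughly a year before `now`

print(is_recent(year_old, 1, now))    # falls outside a 1-day window
print(is_recent(year_old, 365, now))  # falls inside a 365-day window
```

In a recipe, that would amount to a single class attribute such as `oldest_article = 365`, with the number picked to cover the oldest item you expect in the feed.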
Old 09-05-2010, 03:12 PM   #2647
erikah
Junior Member
erikah began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Sep 2010
Device: Kobo eReader
Hi,
I would greatly appreciate it if anyone can create a recipe for the Walrus: http://www.walrusmagazine.com/
Thanks!
Old 09-05-2010, 07:08 PM   #2648
TonytheBookworm
Starson17,
I know how you enjoy the food recipe recipes so here is one you might enjoy. You might want to modify the formatting a little to get rid of the two || (I can't figure out how to do it, even with a findAll). Also, the little thumbnail gets put next to the start of the words, where a <br> after the image would be better (another thing I'm not sure how to do).

here is what i have though enjoy:
Spoiler:

BUCKMASTERS RECIPES
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'BuckMasters In The Kitchen'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Learn how to cook all those outdoor varmints'
    publisher = 'BuckMasters.com'
    category = 'food,cooking,recipes'
    oldest_article = 365
    max_articles_per_feed = 100
    conversion_options = {'linearize_tables' : True}
    #no_stylesheets = True
    #extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'
    masthead_url = 'http://www.buckmasters.com/Portals/_default/Skins/BM_10/images/header_bg.jpg'
    keep_only_tags    = [
                         dict(name='table', attrs={'class':['containermaster_black']})
                        ]
    remove_tags = [dict(attrs={'class':['MenuTopSelected','MenuTop']})]
    remove_tags_after = [dict(name='div', attrs={'align':['left']})]
    feeds          = [
                      ('Recipes', 'http://www.buckmasters.com/DesktopModules/DnnForge%20-%20NewsArticles/RSS.aspx?TabID=292&ModuleID=658&MaxCount=25'),
                      
                    ]
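
One possible way to drop the stray || is a regular-expression pass. calibre recipes accept a `preprocess_regexps` attribute, which is a list of (compiled pattern, replacement function) pairs applied to the downloaded HTML. The sketch below shows the idea in plain Python; it assumes the separators appear as literal `||` text in the page source, which has not been checked, and `apply_regexps` and the sample string are made up for illustration.

```python
import re

# Same shape as calibre's `preprocess_regexps` recipe attribute:
# a list of (compiled pattern, replacement function) pairs.
preprocess_regexps = [
    (re.compile(r'\s*\|\|\s*'), lambda match: ' '),
]


def apply_regexps(html, rules):
    """Apply each (pattern, function) pair in order; illustrative helper only."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


sample = '<p>Venison Chili || Buckmasters</p>'  # made-up sample HTML
print(apply_regexps(sample, preprocess_regexps))
```

If the || turns out to be generated by a menu element rather than plain text, removing that element from the soup is probably the cleaner fix.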
Old 09-05-2010, 09:17 PM   #2649
TonytheBookworm
Quote:
Originally Posted by erikah View Post
Hi,
I would greatly appreciate it if anyone can create a recipe for the Walrus: http://www.walrusmagazine.com/
Thanks!
I didn't do the blogs or the podcast, only the magazine feed.

Here you go:
Attached Files
File Type: rar walrusmag.rar (1.4 KB, 236 views)
Old 09-05-2010, 10:12 PM   #2650
erikah
Quote:
Originally Posted by TonytheBookworm View Post
I didn't do the blogs or the podcast, only the magazine feed.

Here you go:
Thanks!
And thanks Starson17 for your original post as well.
Old 09-05-2010, 11:01 PM   #2651
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm View Post
I know how you enjoy the food recipe recipes so here is one you might enjoy.
The food recipes are for the wife - I'll pass it along. Thanks!

Quote:
You might want to modify the formatting a little to get rid of the two || (I can't figure out how to do it, even with a findAll). Also, the little thumbnail gets put next to the start of the words, where a <br> after the image would be better (another thing I'm not sure how to do).
This version deals with both issues:
Spoiler:

BUCKMASTERS RECIPES
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag
import re

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'BuckMasters In The Kitchen'
    language = 'en'
    __author__ = 'TonytheBookworm & Starson17'
    description = 'Learn how to cook all those outdoor varmints'
    publisher = 'BuckMasters.com'
    category = 'food,cooking,recipes'
    oldest_article = 365
    max_articles_per_feed = 100
    conversion_options = {'linearize_tables' : True}
    masthead_url = 'http://www.buckmasters.com/Portals/_default/Skins/BM_10/images/header_bg.jpg'
    keep_only_tags    = [
                         dict(name='table', attrs={'class':['containermaster_black']})
                        ]
    remove_tags_after = [dict(name='div', attrs={'align':['left']})]
    feeds          = [
                      ('Recipes', 'http://www.buckmasters.com/DesktopModules/DnnForge%20-%20NewsArticles/RSS.aspx?TabID=292&ModuleID=658&MaxCount=25'),
                    ]

    def preprocess_html(self, soup):
        item = soup.find('a', attrs={'class':['MenuTopSelected']})
        if item:
            item.parent.extract()
        for img_tag in soup.findAll('img'):
            parent_tag = img_tag.parent
            if parent_tag.name == 'a':
                new_tag = Tag(soup,'p')
                new_tag.insert(0,img_tag)
                parent_tag.replaceWith(new_tag)
            elif parent_tag.name == 'p':
                if not self.tag_to_string(parent_tag) == '':
                    new_div = Tag(soup,'div')
                    new_tag = Tag(soup,'p')
                    new_tag.insert(0,img_tag)
                    parent_tag.replaceWith(new_div)
                    new_div.insert(0,new_tag)
                    new_div.insert(1,parent_tag)
        return soup
Old 09-05-2010, 11:33 PM   #2652
TonytheBookworm
Quote:
Originally Posted by Starson17 View Post
The food recipes are for the wife - I'll pass it along. Thanks!



This version deals with both issues:

Thanks for the modifications. By the way, the guy who does the stuff for Buckmasters Kitchen is the guy who made Shotgun Red. Do you remember Ralph Emery on Nashville Now, and also that show (I forget the name of it) that had Ernest in it giving hunting and fishing tips with the puppet Shotgun Red? Anyway, that's the guy, Steve Hall. It is funny to watch his wife while he cooks. She can't act and looks like a walking zombie.
Old 09-06-2010, 01:18 AM   #2653
bhandarisaurabh
Enthusiast
bhandarisaurabh began at the beginning.
 
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
I would greatly appreciate it if anyone could create a recipe for Down To Earth magazine without using feeds:
http://downtoearth.org.in/archives/
Old 09-06-2010, 02:49 AM   #2654
cynvision
I tried that Journal Gazette based on your scripting. The attached is looking good in my Nook, if anyone is interested. The other request for the news-sentinel.com is still being a pain. I set up with the RSS feed but do not get articles. Is this because it's calling a .dll in the article URL? I'm sort of ready to give up there.

edit: oh wait. It may be that pesky class vs. id that these news sites get confused. I'm having it look for an id...

edit2: nope. error is mismatching xml tags???
Attached Files
File Type: zip journalgazette.zip (1.8 KB, 254 views)

Last edited by cynvision; 09-06-2010 at 03:36 AM. Reason: still not working
Old 09-06-2010, 04:06 AM   #2655
poloman
TonytheBookworm - sorry for sounding stupid, but in the code for TheDailyMash where you have print statements - where does it print to? I can't see an output anywhere when Calibre is running - is it piped to a file, or does it flash by in the current job status screen?


PS - I used this to get rid of the links at the end of the articles. I can probably bin the 'object' and other tags, but it works and I'm (slowly) learning!

remove_tags = [
    dict(name=['object', 'link', 'script', 'span', 'iframe', 'hr']),
    dict(name='a', attrs={'alt': ['Digg!', 'StumbleUpon!', 'Reddit!', 'Facebook!']}),
    dict(name='a', attrs={'title': ['Digg!', 'StumbleUpon!', 'Reddit!', 'Facebook!']}),
]
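
To make it clearer what each entry in a list like this selects, here is a rough plain-Python model of the matching. It is a simplification based on my understanding that a tag name or attribute value given as a list matches any one of the listed strings; `matches` and `should_remove` are hypothetical helpers, not calibre or BeautifulSoup functions.

```python
def matches(tag_name, tag_attrs, spec):
    """Return True if a tag matches one remove_tags-style spec.

    Simplified model: `name` and each attrs value may be a single string
    or a list of accepted strings. Hypothetical helper, not a calibre API.
    """
    def accepts(wanted, actual):
        if isinstance(wanted, (list, tuple)):
            return actual in wanted
        return actual == wanted

    name = spec.get('name')
    if name is not None and not accepts(name, tag_name):
        return False
    for key, wanted in spec.get('attrs', {}).items():
        if not accepts(wanted, tag_attrs.get(key)):
            return False
    return True


remove_tags = [
    dict(name=['object', 'link', 'script', 'span', 'iframe', 'hr']),
    dict(name='a', attrs={'title': ['Digg!', 'StumbleUpon!', 'Reddit!', 'Facebook!']}),
]


def should_remove(tag_name, tag_attrs):
    """True if any spec in remove_tags matches the tag."""
    return any(matches(tag_name, tag_attrs, spec) for spec in remove_tags)


print(should_remove('span', {}))                # matched by the name list
print(should_remove('a', {'title': 'Digg!'}))   # matched by name plus title
print(should_remove('a', {'href': '/story'}))   # ordinary link, no match
```

Note that the first entry removes every span on the page, which is worth double-checking if article text ever goes missing.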

Last edited by poloman; 09-06-2010 at 06:00 AM.