Custom recipes (archive, read-only) - Page 177

TonytheBookworm · 09-05-2010, 01:14 PM

Quote:

Originally Posted by somedayson

Getting even closer.

I can read all the articles now, but there's stuff before and after them that I'm picking up off the web site. I can't figure out how to

1. Get it to the print only page

2. Get rid of the stuff at the beginning (really disruptive for reading) and the end (not as bad, but I would love to remove it)

Thanks for any assistance anyone can provide. I certainly wouldn't mind a little .rar pack with the answer in it either!

Grateful either way,
Matt

You said you were getting the print-only page, but I don't think you were actually getting the printer-friendly version for some reason. Anyway, what you need is something like this. I haven't fully tested it, but it should work.

Also, in the future please wrap your code in spoiler and code tags. It makes it easier for all of us here.

Spoiler:

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'FW'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'FW'
    publisher = 'Tony'
    category = 'whateveryouwant'
    oldest_article = 1
    max_articles_per_feed = 100
    no_stylesheets = True

    remove_tags = [dict(name='div', attrs={'id':['sidebar1']})]
    feeds = [(u'Opinion', u'http://journalgazette.net/apps/pbcs.dll/section?Category=EDIT&template=blogrss&mime=xml'),
             (u'Local News', u'http://journalgazette.net/apps/pbcs.dll/section?Category=LOCAL&template=blogrss&mime=xml'),
             (u'Sports', u'http://journalgazette.net/apps/pbcs.dll/section?Category=SPORTS&template=blogrss&mime=xml'),
             (u'Features', u'http://journalgazette.net/apps/pbcs.dll/section?Category=FEAT&template=blogrss&mime=xml'),
             (u'Business', u'http://journalgazette.net/apps/pbcs.dll/section?Category=BIZ&template=blogrss&mime=xml'),
             (u'Ice Chips', u'http://journalgazette.net/apps/pbcs.dll/section?Category=BLOGS11&template=blogrss&mime=xml'),
             (u'Entertainment', u'http://journalgazette.net/apps/pbcs.dll/section?Category=ENT&template=blogrss&mime=xml'),
             (u'Food', u'http://journalgazette.net/apps/pbcs.dll/section?Category=FOOD&template=blogrss&mime=xml')
            ]




    def print_version(self, url):
        # Need to convert the article URL to its print version.
        # Original version is: http://www.journalgazette.net/article/20100905/EDIT10/309059959/1021/EDIT
        # Print version should be: http://www.journalgazette.net/apps/pbcs.dll/article?AID=/20100905/EDIT10/309059959/-1/EDIT01&template=printart
        # Result of the split:
        # [u'http:', u'', u'www.journalgazette.net', u'article', u'20100905', u'EDIT10', u'309059959', u'1021', u'EDIT']
        parts = url.split('/')
        print('THE SPLIT IS: %r' % (parts,))  # debug output
        host = parts[2]        # www.journalgazette.net
        date = parts[4]        # 20100905
        section = parts[5]     # EDIT10
        article_id = parts[6]  # 309059959

        # Note: the trailing 'EDIT01' is hard-coded from the example above.
        print_url = 'http://' + host + '/apps/pbcs.dll/article?AID=/' + date + '/' + section + '/' + article_id + '/-1/EDIT01&template=printart'
        print('THIS URL WILL PRINT: %s' % print_url)  # test string to see which URL is returned
        return print_url
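
The URL rewrite in print_version can also be tried outside calibre as a plain function, which makes it quicker to check against a few article links. This is only a sketch: to_print_url is a hypothetical helper name, it assumes the URL layout shown in the recipe's comments, and the trailing EDIT01 is copied from that one example, so it may not hold for other sections.

```python
def to_print_url(url):
    """Rewrite a Journal Gazette article URL to its printer-friendly form.

    Sketch only: assumes the layout
    http://<host>/article/<date>/<section>/<article id>/...
    """
    parts = url.split('/')
    # parts: ['http:', '', host, 'article', date, section, article_id, ...]
    host, date, section, article_id = parts[2], parts[4], parts[5], parts[6]
    # 'EDIT01' is hard-coded as in the recipe (assumption: unverified for other sections)
    return ('http://' + host + '/apps/pbcs.dll/article?AID=/' + date + '/'
            + section + '/' + article_id + '/-1/EDIT01&template=printart')


if __name__ == '__main__':
    print(to_print_url('http://www.journalgazette.net/article/20100905/EDIT10/309059959/1021/EDIT'))
```

If the result for a given link does not open the printer-friendly page in a browser, the hard-coded suffix is the first thing to look at.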

cynvision · 09-05-2010, 02:05 PM

Well, I couldn't find the RSS links when I tried that site. That's why my approach was to scrape the page itself. Where are they? The site avoids using the orange RSS buttons.

poloman · 09-05-2010, 02:38 PM

@TonytheBookworm - thanks for the post - I've deliberately not looked at the spoiler you posted, so will try to come up with a solution myself - I too am a C# developer and this looks like a nice challenge - thanks again!

TonytheBookworm · 09-05-2010, 02:44 PM

Quote:

Originally Posted by poloman

@TonytheBookworm - thanks for the post - I've deliberately not looked at the spoiler you posted, so will try to come up with a solution myself - I too am a C# developer and this looks like a nice challenge - thanks again!

I like the fact that once I get the hang of it, it makes reading more things possible which is rewarding to me. I can only imagine how Kovid felt when he completed Calibre and was able to use it to manage his books and make news feeds for himself.

TonytheBookworm · 09-05-2010, 02:54 PM

Quote:

Originally Posted by cynvision

Well, I couldn't find the RSS links when I tried that site. That's why my approach was to scrape the page itself. Where are they? The site avoids using the orange RSS buttons.

I'm still not sure where he found the RSS feeds, but I noticed them in the post so I just used them.

TonytheBookworm · 09-05-2010, 04:09 PM

I know oldest_article lets you determine how far back you wish to go in the feed. However, if it's not specified I assume it defaults to 1. Is there a way to turn it off completely, so that the date doesn't matter? I have a feed that posts the top 25, yet it is not updated all the time and some of the content can be a year old.
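
One possible approach, as a sketch rather than a tested answer: oldest_article is measured in days (the calibre documentation gives the default as 7, not 1), so setting it to a very large number means the age check effectively never drops an article. The class name, title, and the value 3650 below are placeholders, and the fallback stub is there only so the snippet can be run outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so this sketch runs outside calibre; not part of a real recipe
    class BasicNewsRecipe(object):
        pass


class Top25Recipe(BasicNewsRecipe):  # hypothetical recipe name
    title = 'Top 25'
    # Articles older than this many days are skipped. A very large value
    # (roughly ten years here) effectively turns the date filter off.
    oldest_article = 3650
    max_articles_per_feed = 25
```

The feeds list and any cleanup rules would go in the class as usual.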

erikah · 09-05-2010, 04:12 PM

Hi,
I would greatly appreciate it if anyone can create a recipe for the Walrus: http://www.walrusmagazine.com/
Thanks!

TonytheBookworm · 09-05-2010, 08:08 PM

Starson17,
I know how you enjoy the food recipe recipes, so here is one you might enjoy. You might wanna modify the formatting a little to get rid of the two || (I can't figure out how to do it, even with a findall). Also, the little thumbnail gets put next to the start of the words, where a <br> after the image would be better (another thing I'm not sure how to do).

Here is what I have, though. Enjoy:

Spoiler:

TonytheBookworm · 09-05-2010, 10:17 PM

Quote:

Originally Posted by erikah

Hi,
I would greatly appreciate it if anyone can create a recipe for the Walrus: http://www.walrusmagazine.com/
Thanks!

I didn't do the blogs or the podcast, only the magazine feed.

Here you go:

erikah · 09-05-2010, 11:12 PM

Quote:

Originally Posted by TonytheBookworm

I didn't do the blogs or the podcast, only the magazine feed.

Here you go:

Thanks!
And thanks Starson17 for your original post as well.

Starson17 · 09-06-2010, 12:01 AM

Quote:

Originally Posted by TonytheBookworm

I know how you enjoy the food recipe recipes so here is one you might enjoy.

The food recipes are for the wife - I'll pass it along. Thanks!

Quote:

You might wanna modify the formatting a little to get rid of the two || (I can't figure out how to do it, even with a findall). Also, the little thumbnail gets put next to the start of the words, where a <br> after the image would be better (another thing I'm not sure how to do).

This version deals with both issues:

Spoiler:

TonytheBookworm · 09-06-2010, 12:33 AM

Quote:

Originally Posted by Starson17

The food recipes are for the wife - I'll pass it along. Thanks!

This version deals with both issues:

Thanks for the modifications. Hey, by the way, the guy who does the stuff for Buckmasters Kitchen is the guy that made Shotgun Red. You remember Ralph Emery on Nashville Now, and also that show (I forget the name of it) that had Ernest in it giving hunting and fishing tips with the puppet Shotgun Red? Anyway, that's the guy, Steve Hall. It is funny to watch his wife while he cooks. She can't act and looks like a walking zombie.

bhandarisaurabh · 09-06-2010, 02:18 AM

I would greatly appreciate it if anyone could create a recipe for Down To Earth magazine without using feeds:
http://downtoearth.org.in/archives/

cynvision · 09-06-2010, 03:49 AM

I tried the Journal Gazette recipe based on your script. The attached is looking good on my Nook, if anyone is interested. The other request, for news-sentinel.com, is still being a pain. I set it up with the RSS feed but do not get any articles. Is this because it's calling a .dll in the article URL? I'm sort of ready to give up there.

edit: oh wait. It may be that pesky class vs. id that these news sites get confused. I'm having it look for an id...

edit2: nope. error is mismatching xml tags???

poloman · 09-06-2010, 05:06 AM

TonytheBookworm - sorry for sounding stupid, but in the code for TheDailyMash where you have print statements - where does it print to? I can't see an output anywhere when Calibre is running - is it piped to a file, or does it flash by in the current job status screen?

ps - i used this to get rid of the links at the end of the articles - can probably bin the 'object' and other tags, but it works and I'm (slowly) learning!

remove_tags = [
    dict(name=['object', 'link', 'script', 'span', 'iframe', 'hr']),
    dict(name='a', attrs={'alt': ['Digg!', 'StumbleUpon!', 'Reddit!', 'Facebook!']}),
    dict(name='a', attrs={'title': ['Digg!', 'StumbleUpon!', 'Reddit!', 'Facebook!']}),
]
