Old 09-07-2010, 12:05 PM   #2671
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by Starson17 View Post
Everyone is so friendly here, I know it's a temptation to stray off topic, but you probably should go to another thread with this. The recipe thread is tough to wade through as it is with all the lengthy recipes. <GRIN>
Yeah, sorry about that. If a mod can move the post, that would be great. Again, sorry.
Old 09-07-2010, 12:31 PM   #2672
poloman
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2008
Device: PRS505, Kindle 3G
Fair point, well made. I'll PM you, Tony. Doh, I thought I'd be able to delete this post. Sorry about this; if someone could delete it, please do!

Edit: bringing it back on topic. I (lazily) added a simple feed for Slashdot (http://rss.slashdot.org/Slashdot/slashdot) because I didn't want all the comments. The prospect of getting banned deterred me from using the built-in recipe, and it also takes a long time to run.

However, when the simple feed's output appears on the Kindle, the sections view shows the article summary fine (i.e. the article title and the beginning of the article), but when I click through to read it, the article and header are not there, just the comments.

Is there a simple solution, or a recipe that solves this? I tried making one that keeps only the article section, but didn't have much luck. <Annoyingly, I seem to have deleted it, but I have this one, which shows the general idea.>

Spoiler:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class SlashDotRecipe(BasicNewsRecipe):
    title = 'SlashDot'  # v1
    language = 'en'
    __author__ = 'db'
    description = 'SlashDot Articles'
    publisher = 'Web'
    category = ''
    oldest_article = 7
    conversion_options = {'linearize_tables': True}
    max_articles_per_feed = 100
    no_stylesheets = True

    #masthead_url = 'http://www.gtdtimes.com/images/GTDTimes_header.png'

    feeds = [
        ('SlashDot', 'http://rss.slashdot.org/Slashdot/slashdot'),
    ]

    keep_only_tags = [
        dict(name='div', attrs={'class': 'body'})
    ]

    remove_tags = [
        dict(name='div', attrs={'class': 'article-foot'})
    ]

    def get_article_url(self, article):
        return article.get('feedburner_origlink', None)
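A possible refinement of get_article_url, offered as a sketch rather than a tested fix for the comments-only problem: the version above returns None whenever a feed entry has no feedburner_origlink field. The function below falls back to the entry's ordinary link field in that case. It assumes feed entries behave like dictionaries; the name pick_article_url, the 'link' key and the sample entries are made up for illustration.

```python
def pick_article_url(article):
    """Prefer the FeedBurner original link, else fall back to the plain link."""
    # 'feedburner_origlink' is the key used in the recipe above; 'link' is an
    # assumption about what the feed entry provides.
    return article.get('feedburner_origlink') or article.get('link')


# Hypothetical feed entries, modelled as plain dictionaries.
with_origlink = {
    'feedburner_origlink': 'http://example.com/story/1',
    'link': 'http://feeds.example.com/~r/1',
}
without_origlink = {'link': 'http://example.com/story/2'}

print(pick_article_url(with_origlink))     # the original link
print(pick_article_url(without_origlink))  # falls back to 'link'
```

Inside a recipe, the same expression would go in the body of get_article_url(self, article).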


Last edited by poloman; 09-09-2010 at 03:41 AM.
Old 09-09-2010, 11:44 AM   #2673
TonytheBookworm
Quote:
Originally Posted by poloman View Post
Edit: bringing it back on topic. I (lazily) added a simple feed for Slashdot (http://rss.slashdot.org/Slashdot/slashdot) because I didn't want all the comments. The prospect of getting banned deterred me from using the built-in recipe, and it also takes a long time to run.

However, when the simple feed's output appears on the Kindle, the sections view shows the article summary fine (i.e. the article title and the beginning of the article), but when I click through to read it, the article and header are not there, just the comments.

Is there a simple solution, or a recipe that solves this? I tried making one that keeps only the article section, but didn't have much luck. <Annoyingly, I seem to have deleted it, but I have this one, which shows the general idea.>

Spoiler:
Code:
    keep_only_tags = [
        dict(name='div', attrs={'class':'body'})
    ]
Change it to look like this:
Code:
    keep_only_tags = [
        dict(name='a', attrs={'class':'datitle'}),
        dict(name='span', attrs={'class':'date'}),
        dict(name='div', attrs={'class':'body'})
    ]
You will get the title and then the date right next to each other. In that case you would probably wanna do a preprocess_html and insert a <br> somehow or another. I haven't mastered the inserting part yet.
Old 09-09-2010, 12:46 PM   #2674
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm View Post
In that case you would probably wanna do a preprocess_html and insert a <br> somehow or another. I haven't mastered the inserting part yet.
Have you ever seen that puzzle with 3 posts and disks of increasing size? Playing with new tags, inserting tags, moving tags, etc. is like that puzzle.

You might want to review my mods to your Buckmaster recipe here:
https://www.mobileread.com/forums/sho...postcount=2651
It adds <p></p> surrounding images (instead of inserting a <br>).

Beautiful Soup lets you create tags easily. It lets you replace one tag with another (replaceWith). It lets you insert a tag into another tag, but it requires you to insert by location number. The problem is that you either have to calculate the correct location number, or you have to replace a tag with a newly created tag. The first is a pain. The second is tricky if you want to keep the contents of the tag being replaced.

One approach can be seen here:
Code:
        # Needs this at the top of the recipe:
        #   from calibre.ebooks.BeautifulSoup import Tag
        for img_tag in soup.findAll('img'):
            parent_tag = img_tag.parent
            if parent_tag.name == 'a':
                new_tag = Tag(soup,'p')
                new_tag.insert(0,img_tag)
                #at this point img_tag has been extracted from soup
                #and put into new_tag - parent_tag remains in soup
                parent_tag.replaceWith(new_tag)
Old 09-09-2010, 11:40 PM   #2675
TonytheBookworm
Quote:
Originally Posted by Starson17 View Post
Have you ever seen that puzzle with 3 posts and disks of increasing size? Playing with new tags, inserting tags, moving tags, etc. is like that puzzle.

You might want to review my mods to your Buckmaster recipe here:
https://www.mobileread.com/forums/sho...postcount=2651
It adds <p></p> surrounding images (instead of inserting a <br>).
Yeah, even though I understand what you did in this instance, tag replacements are still something I'm a little confused about. It's just going to take time to fully understand them. As for the puzzle, I never played that one, but I know what you're talking about. Anyway, take care, man. I also wanna thank you for those links you posted; they are very informative.

Oh, one other thing. Is this the place where we should post recipes (as in ones we consider working and complete)? Or do we submit them on the bug tracker, or somewhere else?
Old 09-10-2010, 07:41 AM   #2676
Starson17
Quote:
Originally Posted by TonytheBookworm View Post
Oh, one other thing. Is this the place where we should post recipes (as in ones we consider working and complete)? Or do we submit them on the bug tracker, or somewhere else?
Kovid seems to watch both places, but I tend to think of the bug tracker as the best place to indicate you think the recipe is ready for inclusion. This is the best place to make it available for others.
Old 09-10-2010, 01:11 PM   #2677
somedayson
Member
Posts: 13
Karma: 10
Join Date: Sep 2010
Device: K3
Thanks for all the previous help...got an awesome customized news reader on my new kindle thanks to Calibre and your help.

It's been great to go into the recipes and add and remove sections of rss feeds from newspapers....awesome!

One last one I can't seem to figure out...it's a hockey team's feeds. Maybe not of much interest to many, but perhaps posting the recipe could help someone else like it did for me as I read through all 176 pages at the time (crazy, right?)

Here's the feed if someone would be willing to take a shot or help point me in the right direction. When I click on the two RSS links, they bring up some weird stuff I haven't seen before--sometimes, and then other times I get the articles.

Web page: http://blackhawks.nhl.com/club/feedinfo.htm

RSS #1: http://blackhawks.nhl.com/rss/top-stories.xml
RSS #2: http://blackhawks.nhl.com/rss/news.xml


I really appreciate everyone's help in learning this system!

Thanks,
Matt
Old 09-10-2010, 01:32 PM   #2678
somedayson
Some more info about the above request.

I'm just not sure how to pull up the print version. Here's the example:

http://blackhawks.nhl.com/club/news....rss-blackhawks
(regular page)

http://blackhawks.nhl.com/club/newsprint.htm?id=533848
(print version)

You can see that I need to insert "print" right after "news" in the first URL and drop everything after the "&" in that same URL.

Here's the best rss to work from:

http://blackhawks.nhl.com/club/newsi...location=/news


Thanks again for any help anyone can provide,
Matt
Old 09-10-2010, 02:17 PM   #2679
Starson17
Quote:
Originally Posted by somedayson View Post
You can see that I need to insert "print" right after "news" in the first URL and drop everything after the "&" in that same URL.
This part is easy:
Code:
    def print_version(self, url):
        main1, replace1, end1 = url.partition('news.htm?')
        url = main1 + 'newsprint.htm?' + end1
        main2, middle2, end2 = url.partition('&')
        return main2
The partition and rpartition string methods are made for splitting up URLs. The url.partition('news.htm?') call splits the URL around the quoted section, the next line puts 'newsprint.htm?' in its place, and url.partition('&') just splits off the front part, which gets returned.
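To see the two partition calls at work outside of calibre, here is a small standalone sketch. It is the same logic written as a plain function (no self), and the sample URL, including its cmpid parameter, is made up for illustration.

```python
def print_version(url):
    """Turn a news.htm article URL into its newsprint.htm equivalent."""
    # partition() returns (before, separator, after); if the separator is
    # missing, 'before' is the whole string and the other two are empty.
    main1, _sep1, end1 = url.partition('news.htm?')
    url = main1 + 'newsprint.htm?' + end1
    # Keep only what precedes the first '&', dropping extra query parameters.
    main2, _sep2, _end2 = url.partition('&')
    return main2


# Hypothetical article URL; the 'cmpid' parameter is made up.
sample = 'http://example.com/club/news.htm?id=533848&cmpid=rss'
print(print_version(sample))
# http://example.com/club/newsprint.htm?id=533848
```

One thing to watch: if a URL does not contain 'news.htm?' at all, the first partition leaves the whole URL in main1, so 'newsprint.htm?' simply gets appended to the end. A guard for that case may be worth adding if the feed mixes in other kinds of links.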

Last edited by Starson17; 09-10-2010 at 02:31 PM. Reason: Typo in code
Old 09-10-2010, 03:17 PM   #2680
somedayson
Thanks Starson...I'm now getting about 10 "pages" on my kindle of the headlines and all the links beyond those. I've got Firefox and Firebug, and am trying lots of "keep only" and "remove only" tags but I can't quite find what the article content is labeled.

I really appreciate the spirit of teaching and learning that happens here.
Old 09-10-2010, 03:31 PM   #2681
somedayson
Here's my latest attempt...still can't exclude the junk above and below the articles. I tried what was suggested a few pages earlier in this thread, but don't quite have it.

Code:
class AdvancedUserRecipe1284145178(BasicNewsRecipe):
    title          = u'Blackhawks Headlines'
    oldest_article = 7
    max_articles_per_feed = 100

    feeds          = [(u'Blackhawks Recent Headlines', u'http://blackhawks.nhl.com/rss/news.xml')]

def print_version(self, url):
        main1, replace1, end1 = url.partition('news.htm?')
        url = main1 + 'newsprint.htm?' + end1
        main2, middle2, end2 = url.partition('&')
        return main2

        keep_only_tags [dict(name='div', attrs={'class':'newsBody'})]
After about three hours on this total, I'd just love the answer if someone is willing to throw me a bone. I know I'm close...
Old 09-10-2010, 04:08 PM   #2682
Starson17
Quote:
Originally Posted by somedayson View Post
Here's my latest attempt...still can't exclude the junk above and below the articles. I tried what was suggested a few pages earlier in this thread, but don't quite have it.

Spoiler:
Code:
class AdvancedUserRecipe1284145178(BasicNewsRecipe):
    title          = u'Blackhawks Headlines'
    oldest_article = 7
    max_articles_per_feed = 100

    feeds          = [(u'Blackhawks Recent Headlines', u'http://blackhawks.nhl.com/rss/news.xml')]

def print_version(self, url):
        main1, replace1, end1 = url.partition('news.htm?')
        url = main1 + 'newsprint.htm?' + end1
        main2, middle2, end2 = url.partition('&')
        return main2

        keep_only_tags [dict(name='div', attrs={'class':'newsBody'})]


After about three hours on this total, I'd just love the answer if someone is willing to throw me a bone. I know I'm close...

Your print_version isn't running. It needs to be indented so that it is part of the class. You don't need the keep_only_tags. Try this:

Spoiler:
Code:
class AdvancedUserRecipe1284145178(BasicNewsRecipe):
    title          = u'Blackhawks Headlines'
    __author__          = 'Starson17'
    oldest_article = 7
    max_articles_per_feed = 100
    remove_javascript = True
    remove_empty_feeds  = True

    feeds          = [(u'Blackhawks Recent Headlines', u'http://blackhawks.nhl.com/rss/news.xml')]

    def print_version(self, url):
        main1, replace1, end1 = url.partition('news.htm?')
        url = main1 + 'newsprint.htm?' + end1
        main2, middle2, end2 = url.partition('&')
        return main2

    extra_css = '''
                    .headline{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
                    #newsBody{font-family:Helvetica,Arial,sans-serif;font-size:small;text-indent:2em;}
                '''


It should be close. (I threw in some basic formatting.)
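The indentation point can be seen in a few lines of plain Python. This is only a sketch: BaseRecipe is a hypothetical stand-in for BasicNewsRecipe and does not reproduce calibre's actual default behaviour, and the URL is made up.

```python
class BaseRecipe:
    """Hypothetical stand-in for BasicNewsRecipe, for illustration only."""

    def print_version(self, url):
        # Stand-in default: hand the URL back unchanged.
        return url


class Unindented(BaseRecipe):
    title = 'Blackhawks Headlines'


# Defined at module level: a plain function, not a method of Unindented.
def print_version(self, url):
    return url.replace('news.htm?', 'newsprint.htm?')


class Indented(BaseRecipe):
    title = 'Blackhawks Headlines'

    # Indented under the class: this overrides BaseRecipe.print_version.
    def print_version(self, url):
        return url.replace('news.htm?', 'newsprint.htm?')


url = 'http://example.com/club/news.htm?id=1'  # made-up URL
print(Unindented().print_version(url))  # unchanged: the override was never attached
print(Indented().print_version(url))    # rewritten to newsprint.htm
```

Python raises no error for the unindented version, which is why the recipe still runs but the print pages are never fetched.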

Last edited by Starson17; 09-10-2010 at 04:12 PM.
Old 09-10-2010, 10:17 PM   #2683
bhandarisaurabh
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
Quote:
Originally Posted by TonytheBookworm View Post
Print edition? As in subscribed? Or as in what's on the page as you see it? Or the RSS link that is over on the right-hand side?

If you're calling the "print edition" what you see currently on the screen when you go to that link, I don't see the point in doing it, because each month/week that the issue changes you are going to have to change the feed reference from 148 to Nth. Or am I missing your question completely?
Okay, I get your point, but can you make a recipe for the print edition of the magazine?
http://downtoearth.org.in/archives/
It is a humble request
Old 09-11-2010, 01:04 AM   #2684
TonytheBookworm
Starson17,
If I wanted an if statement that checked whether the parent was <div id='MainContent'>, how would I go about doing it?
Would it be:
Code:
mydaddy = item.parent
if mydaddy.name == 'MainContent':
  .......
I've seen examples of how you and others do a parent match for <a> and <p> and so forth, but not for an actual div id tag.

thanks
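A note on the parent check, as a sketch rather than a calibre-tested answer: the name of that tag is 'div', while 'MainContent' is the value of its id attribute, so the two have to be compared separately. In Beautiful Soup the corresponding pieces are the tag's name and its get('id') lookup. The example below shows the same idea with the standard library's xml.etree.ElementTree as a stand-in (it is not calibre's bundled Beautiful Soup); the markup and the helper name in_main_content are made up.

```python
import xml.etree.ElementTree as ET

# Made-up markup: two divs that differ only in their id attribute.
html = (
    "<body>"
    "<div id='MainContent'><p>keep me</p></div>"
    "<div id='Sidebar'><p>skip me</p></div>"
    "</body>"
)
root = ET.fromstring(html)

# ElementTree elements have no .parent, so build a child -> parent lookup.
parent_of = {child: parent for parent in root.iter() for child in parent}


def in_main_content(item):
    """True if item's parent is <div id='MainContent'>."""
    parent = parent_of.get(item)
    # Compare the tag name and the id attribute separately.
    return (
        parent is not None
        and parent.tag == 'div'
        and parent.get('id') == 'MainContent'
    )


matches = [p.text for p in root.iter('p') if in_main_content(p)]
print(matches)
```

Only the paragraph inside the MainContent div passes the check; comparing the tag name against 'MainContent' would match nothing.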

Last edited by TonytheBookworm; 09-11-2010 at 01:59 AM. Reason: Changed question
Old 09-11-2010, 02:37 AM   #2685
TonytheBookworm
Quote:
Originally Posted by bhandarisaurabh View Post
Okay, I get your point, but can you make a recipe for the print edition of the magazine?
http://downtoearth.org.in/archives/
It is a humble request
Here you go. I only did 2010. Each year appears to have different formatting, but a year's worth of stuff should be enough for now.
Attached Files
File Type: rar down to earth.rar (1.2 KB, 258 views)

Last edited by TonytheBookworm; 09-11-2010 at 02:40 AM.
All times are GMT -4. The time now is 09:47 AM.

