MobileRead Forums > E-Book Software > Calibre > Recipes

02-07-2009, 02:39 PM   #181
kovidgoyal
creator of calibre
Posts: 45,397
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by XanthanGum
Hi,

I've had similar problems with the recipes I've tried to create for other publications. If I knew more about how to fine-tune recipes, I'd have more luck, and I'd like to share them with everyone here.

What about a workshop for creating recipes? Those of you who understand the intricacies of how they work could lead a workshop for us beginners. We could start out with simple examples and then gradually build on them so that we could solve the problem mentioned above.

Kovid? Dominic? Would such a workshop be possible? Is there anyone with time that could lead such a "class"?

The more recipes generated the better. I think it adds value to Kovid's fantastic program.

I would love to see such a workshop.

Tschuess (German for Bye)...

Xanthan Gum
I'm guessing the documentation is insufficient for what you need?
02-07-2009, 05:23 PM   #182
kilikini
Enthusiast
Posts: 43
Karma: 376
Join Date: Jan 2009
Location: California, USA
Device: K3, KFire, iPad, iPhone
Just want to say thanks for the Honolulu Advertiser and Star-Bulletin recipes; they work great!

Much appreciated.
02-07-2009, 06:28 PM   #183
tbaac
Member
Posts: 11
Karma: 10
Join Date: Feb 2009
Device: Sony PRS505
Calibre looks to be a fantastic program, Kovidgoyal. Thank you.

kiklop74: Thank you for the recipe for New Statesman. Unfortunately, I'm having difficulties with it. Other (built-in) recipes seem to work, but I cannot get your Python script to run. When I click "Download" to start the download, nothing happens.

I tried pasting the contents of the .py file, and I tried using "Load recipe from file". I see the code loaded into the edit box, but it doesn't seem to do anything.

Any idea what I might be doing wrong? Thank you.

Edit: After reading the "Tips for developing new recipes" section of the manual, I tried running each of the recommended commands from the command line (with the newstatesman.py filename), and it worked perfectly. So I don't quite understand why it won't work within the Calibre GUI. Hmm.

02-07-2009, 07:11 PM   #184
tbaac
Member
Posts: 11
Karma: 10
Join Date: Feb 2009
Device: Sony PRS505
Okay, I'm not sure what was going wrong. I tried it from the command line and found that closing and reopening Calibre sometimes seemed to help. It works really well now, thank you.

I changed some feeds and ended up with this:

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
'''
newstatesman.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class NewStatesman(BasicNewsRecipe):
    title                 = 'New Statesman'
    __author__            = 'Darko Miletic'
    description           = "Britain's award-winning current affairs magazine"
    publisher             = 'New Statesman'
    category              = 'news, UK, World'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'cp1252'
    remove_javascript     = True
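    # NOTE: this spacer image is hosted on starbulletin.com and looks like a
    # leftover from another recipe; remove it or point it at a real cover.
    cover_url             = 'http://media.starbulletin.com/designimages/spacer.gif'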

    html2lrf_options = [
                          '--comment'       , description
                        , '--base-font-size', '10'
                        , '--category'      , category
                        , '--publisher'     , publisher
                        ]
    
    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'
    
    keep_only_tags = [dict(name='div', attrs={'class':'content-main'})]

    remove_tags = [
                    dict(name=['object','link','form','ul'])
                   ,dict(name='ul', attrs={'class':'post-article'})
                   ,dict(name='div' , attrs={'class':['tag-nav-container','article-base']})
                   ,dict(name='div' , attrs={'id':['reader-comments']})                    
                  ]
                        
    feeds = [
              (u'Politics', u'http://www.newstatesman.com/feeds/topics/politics.rss'),
              (u'Arts & Culture', u'http://www.newstatesman.com/feeds/topics/arts-and-culture.rss'),
              (u'Books', u'http://www.newstatesman.com/feeds/topics/books.rss'),
              (u'Life & Society', u'http://www.newstatesman.com/feeds/topics/life-and-society.rss'),
              (u'World Affairs', u'http://www.newstatesman.com/feeds/topics/world-affairs.rss'),
              (u'Columns - Martin Bright', u'http://www.newstatesman.com/feeds/writers/martin_bright.rss'),
              (u'Columns - Kira Cochrane', u'http://www.newstatesman.com/feeds/writers/kira_cochrane.rss'),
              # NOTE: this URL duplicates the World Affairs feed above; the real
              # Hunter Davies feed URL still needs to be looked up.
              (u'Columns - Hunter Davies', u'http://www.newstatesman.com/feeds/topics/world-affairs.rss'),
              (u'Columns - Noreena Hertz', u'http://www.newstatesman.com/feeds/writers/noreena_hertz.rss'),
              (u'Columns - Lindsey Hilsum', u'http://www.newstatesman.com/feeds/writers/lindsey_hilsum.rss'),
              (u'Columns - Darcus Howe', u'http://www.newstatesman.com/feeds/writers/darcus_howe.rss'),
              (u'Columns - Emma John', u'http://www.newstatesman.com/feeds/writers/emma_john.rss'),
              (u'Columns - Sadakat Kadri', u'http://www.newstatesman.com/feeds/writers/sadakat_kadri.rss'),
              (u'Columns - Mark Lynas', u'http://www.newstatesman.com/feeds/writers/mark_lynas.rss'),
              (u'Columns - Kevin Maguire', u'http://www.newstatesman.com/feeds/writers/kevin_maguire.rss'),
              (u'Columns - Rageh Omaar', u'http://www.newstatesman.com/feeds/writers/rageh_omaar.rss'),
              (u'Columns - John Pilger', u'http://www.newstatesman.com/feeds/writers/john_pilger.rss'),
              (u'Columns - Ziauddin Sardar', u'http://www.newstatesman.com/feeds/writers/ziauddin_sardar.rss'),
              (u'Columns - Clive Stafford-Smith', u'http://www.newstatesman.com/feeds/writers/clive_stafford_smith.rss'),
              (u'Columns - Michela Wrong', u'http://www.newstatesman.com/feeds/writers/michela_wrong.rss'),
            ]

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        mtag = '\n<meta http-equiv="Content-Language" content="en"/>\n'
        soup.head.insert(0,mtag)
        return soup
02-09-2009, 11:29 AM   #185
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
How To Fine Tune Recipes

Quote:
Originally Posted by kovidgoyal
I'm guessing the documentation is insufficient for what you need?
Kovid,

I did look at the FAQ and the samples provided there a while back, but I think the New York Times example was a bit too complex for me, at least at the time. I will go back, though, and study the examples in more depth. I also plan to print out more of the recipes and compare them with one another and with the associated websites to figure out what each one is doing.

I guess what I need to know is:

- When you guys come up with a well-working recipe for a site such as the New York Times or New Statesman, are you looking at the source HTML code from the site? How do you know what tags to remove, for example?

- How do you fetch an entire article from a news site? What code segment does that? For example, I downloaded Ars Technica today to read at lunch. While reading the Ars Technica articles, I noticed that only a summary of each article was presented, and you're told to click on a link to read the rest. I'd like to edit the recipe to see if I could get the rest of those articles. What code in Darko Miletic's New Statesman recipe forces the fetching of entire articles? Would the same code solve the Ars Technica problem, or would it have to be changed in some way?

Instead of a workshop, would you or Darko (?) have time to answer such questions as mine above? I understand object-oriented programming languages like Java and C++, and know several of the older procedural languages, so I think I could grasp what I need to know to write more recipes if given some of the basics.

Thanks...

Xanthan Gum
02-09-2009, 12:51 PM   #186
kovidgoyal
creator of calibre
Posts: 45,397
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Yes, the tags to remove are deduced from the source HTML.

The simplest way to get the full text of articles is through the website's "Print version", if it has one. You need to figure out how to map the URLs in the RSS feeds to the corresponding print-version URLs, then encode that logic in the print_version method, which takes a URL and should return the print-version URL.
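A minimal sketch of such a print_version override, assuming a hypothetical site (www.example.com) whose printer-friendly page is the article path prefixed with /print/, with the query string dropped. The URL rule, class name and feed are made up for illustration; a real recipe needs whatever mapping the target site actually uses. The import fallback only exists so the snippet can run outside calibre, and the code is written for Python 3.

```python
from urllib.parse import urlsplit, urlunsplit

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # only so the sketch can be run outside calibre
    BasicNewsRecipe = object


class ExamplePrintRecipe(BasicNewsRecipe):
    title = 'Example News'
    use_embedded_content = False
    feeds = [(u'Top Stories', u'http://www.example.com/feeds/top.rss')]

    def print_version(self, url):
        # HYPOTHETICAL mapping for www.example.com:
        #   http://www.example.com/news/story.html?ref=rss
        #   -> http://www.example.com/print/news/story.html
        parts = urlsplit(url)
        # Keep scheme and host, prefix the path, drop query and fragment.
        return urlunsplit((parts.scheme, parts.netloc, '/print' + parts.path, '', ''))
```

With this rule, a feed link such as http://www.example.com/news/story.html?ref=rss is fetched as http://www.example.com/print/news/story.html. Doing the mapping with urlsplit rather than string replacement keeps it from accidentally matching "/print" text elsewhere in the URL.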
02-09-2009, 01:03 PM   #187
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by XanthanGum
I guess what I need to know is:

- When you guys come up with a well-working recipe for a site such as the New York Times or New Statesman, are you looking at the source HTML code from the site?
Yes. The best way to browse the HTML quickly is to use Firefox with the Firebug plugin.

Quote:
Originally Posted by XanthanGum
How do you know what tags to remove, for example?
That is something that comes with time.
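One way to make "deduce the tags from the HTML" concrete, besides browsing with Firebug, is to save an article page and tally the div and ul containers it uses by class and id. The sketch below uses only Python's standard html.parser; the sample markup is invented, reusing class names from the New Statesman recipe posted earlier in this thread, and is not the real page.

```python
from collections import Counter
from html.parser import HTMLParser


class ContainerSurvey(HTMLParser):
    """Tally div/ul elements by their class and id attributes."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in ('div', 'ul'):
            return
        attrs = dict(attrs)
        for key in ('id', 'class'):
            value = attrs.get(key)
            if value:  # skip missing or empty attributes
                self.counts[(tag, key, value)] += 1


def survey(html):
    parser = ContainerSurvey()
    parser.feed(html)
    parser.close()
    return parser.counts


# Invented sample page, for illustration only.
sample = """
<div class="content-main"><p>Article text</p>
  <ul class="post-article"><li>Share</li></ul>
  <div id="reader-comments">Comments</div>
</div>
<div class="tag-nav-container">Tags</div>
"""

for (tag, key, value), n in sorted(survey(sample).items()):
    print(f"{tag} {key}={value!r} x{n}")
```

Containers that hold the article text become keep_only_tags entries (for example dict(name='div', attrs={'class':'content-main'})), and recurring clutter such as comment blocks or share bars becomes remove_tags entries.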

Quote:
Originally Posted by XanthanGum
- How do you fetch an entire article from a news site? What code segment does that?
Setting use_embedded_content to False does this.

Code:
use_embedded_content  = False
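For context, here is a minimal sketch of where that line sits in a complete recipe. With use_embedded_content = False, calibre does not use the text embedded in the RSS entry (often only a summary) and downloads the page each feed item links to instead. The site name, feed URL and the article-body class are placeholders, not taken from any real recipe; the import fallback only lets the snippet run outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # only so the sketch can be run outside calibre
    BasicNewsRecipe = object


class MinimalRecipe(BasicNewsRecipe):
    title = 'Example Site'
    oldest_article = 7            # days
    max_articles_per_feed = 100
    no_stylesheets = True
    # Ignore the (often truncated) text inside each RSS entry and
    # download the linked article page instead.
    use_embedded_content = False
    # Placeholder container; find the real one in the page source.
    keep_only_tags = [dict(name='div', attrs={'class': 'article-body'})]
    feeds = [(u'Top Stories', u'http://www.example.com/feeds/top.rss')]
```

This only fetches the single page each feed item points to; an article split over several pages needs extra handling, which is what the Ars Technica discussion in this thread runs into.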
Quote:
Originally Posted by XanthanGum
Would the same code solve the Ars Technica problem or would it have to be changed in some way?
Yes, it would.

What you actually need to read is the documentation for BasicNewsRecipe, and the code itself, which is generally well commented.

The rest you can deduce from the multitude of existing recipes. You should start with simpler ones. The New York Times recipe is one of the more complex ones and is not recommended for beginners.
02-10-2009, 12:34 PM   #188
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
New recipe for the Montenegrin newspaper "Pobjeda" (in Serbian).

It supports both LRF and EPUB formats.
Attached file: pobjeda.zip (1.7 KB)
02-10-2009, 07:20 PM   #189
malkie13
Member
Posts: 14
Karma: 10
Join Date: Feb 2009
Device: PRS-505
Request: Fanfiction.net

No clue how I'd go about making this work.

Currently I use the online version of FLAG (Fanfiction.net Lightweight Automated Grabber) from http://flag.erayd.net/ to grab stories (multiple chapters at a time) from Fanfiction.net and then manually import them into Calibre.

https://www.mobileread.com/forums/showthread.php?t=26055 has information on, and downloads of, the FLAG program.

What would be ideal, however, is a custom recipe based on FLAG, with an input for the story ID, that could then fetch the whole story (stories are split into multiple chapters across several pages). Unfortunately, I can't code my way out of a paper sack and haven't the foggiest idea how to do this sort of thing.
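For a source without useful RSS feeds, such as the multi-chapter stories described above, a recipe can override BasicNewsRecipe.parse_index and build the article list itself. A minimal sketch, assuming chapter URLs follow the pattern https://www.fanfiction.net/s/<story id>/<chapter>/ (an assumption to verify against the site), with story_id and chapter_count hard-coded as hypothetical values rather than read from user input or scraped from the first chapter. It does not reproduce FLAG, and a working recipe would also need keep_only_tags for the story-text container.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # only so the sketch can be run outside calibre
    BasicNewsRecipe = object


class MultiChapterStory(BasicNewsRecipe):
    title = 'Multi-chapter story'
    no_stylesheets = True
    # HYPOTHETICAL, hard-coded values: edit these for the story you want.
    story_id = '1234567'
    chapter_count = 3

    def chapter_url(self, number):
        # ASSUMED URL layout: https://www.fanfiction.net/s/<story id>/<chapter>/
        return 'https://www.fanfiction.net/s/%s/%d/' % (self.story_id, number)

    def parse_index(self):
        # Used instead of RSS feeds: return a list of
        # (section title, list of article dicts), here one dict per chapter.
        articles = []
        for number in range(1, self.chapter_count + 1):
            articles.append({
                'title': 'Chapter %d' % number,
                'url': self.chapter_url(number),
                'description': '',
                'date': '',
            })
        return [(self.title, articles)]
```

Each chapter then becomes one "article" in the generated e-book, in reading order.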
02-10-2009, 08:08 PM   #190
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
I noticed one minor error in the new release of calibre: the "Politika Online" recipe should also go in the Serbian language category.
02-10-2009, 09:19 PM   #191
kovidgoyal
creator of calibre
Posts: 45,397
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by kiklop74
I noticed one minor error in the new release of calibre: the "Politika Online" recipe should also go in the Serbian language category.
Fixed.
02-11-2009, 01:40 PM   #192
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Print Versions

Quote:
Originally Posted by kovidgoyal
Yes, the tags to remove are deduced from the source HTML.

The simplest way to get the full text of articles is through the website's "Print version", if it has one. You need to figure out how to map the URLs in the RSS feeds to the corresponding print-version URLs, then encode that logic in the print_version method, which takes a URL and should return the print-version URL.
Kovid,

I understand how that works. I remember seeing the BBC example in the FAQ or tutorial. It made sense.

But many sites, like Ars Technica, don't offer that print option; you're forced to advance to the next page to read the rest of the article (when reading with a browser).

I tried kiklop74's suggestion by inserting the line:

use_embedded_content = False

in the recipe. But...it doesn't fetch the rest of the Ars Technica articles.

Any suggestions? (Kovid, Darko)

Xanthan Gum
02-11-2009, 01:47 PM   #193
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
use_embedded_content = False

Quote:
Originally Posted by kiklop74
Yes. The best way to browse the HTML quickly is to use Firefox with the Firebug plugin.

That is something that comes with time.

Setting use_embedded_content to False does this.

Code:
use_embedded_content  = False

Yes, it would.

What you actually need to read is the documentation for BasicNewsRecipe, and the code itself, which is generally well commented.

The rest you can deduce from the multitude of existing recipes. You should start with simpler ones. The New York Times recipe is one of the more complex ones and is not recommended for beginners.

kiklop74,

Thanks for responding (you and Kovid). Firefox is the browser I use most of the time, and I use Opera for some browsing. I don't think I have the Firebug plugin installed, so I'll get it.

When you say "Yes it would.", do you mean that the one line:

Code:
use_embedded_content  = False
will do the trick in the Ars Technica recipe, or that something extra would have to be added along with that line of code?

As I posted up above in response to Kovid's remarks about the print option, using just the

Code:
use_embedded_content  = False
line made no difference in the Ars Technica recipe.

I will, for sure, look over the documentation for the BasicNewsRecipe and print out a number of the recipes for comparison.

Xanthan Gum
02-11-2009, 02:22 PM   #194
kovidgoyal
creator of calibre
Posts: 45,397
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by XanthanGum
Kovid,

I understand how that works. I remember seeing the BBC example in the FAQ or tutorial. It made sense.

But many sites, like Ars Technica, don't offer that print option; you're forced to advance to the next page to read the rest of the article (when reading with a browser).

I tried kiklop74's suggestion by inserting the line:

use_embedded_content = False

in the recipe. But...it doesn't fetch the rest of the Ars Technica articles.

Any suggestions? (Kovid, Darko)

Xanthan Gum

Look at the Newsweek recipe; it does this, i.e. it follows the "next" links.
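To illustrate the "follow the next links" idea, here is a standalone, stdlib-only sketch; it is not the Newsweek recipe's code. It assumes the continuation link's text starts with "Next" (case-insensitive), and fetch is an injected callable returning a page's HTML so the logic can be tested offline; inside a recipe you would pass something that downloads the page, then merge the article bodies of the collected pages. BasicNewsRecipe also has recursions and match_regexps attributes that make calibre follow matching links from article pages, which may be enough in simple cases; check the BasicNewsRecipe documentation for details.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> whose text starts with 'next'."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> currently open, if any
        self._text = []
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            label = ''.join(self._text).strip().lower()
            if self.next_href is None and label.startswith('next'):
                self.next_href = self._href
            self._href = None


def find_next_link(html, base_url):
    finder = NextLinkFinder()
    finder.feed(html)
    finder.close()
    # Resolve relative links against the current page's URL.
    return urljoin(base_url, finder.next_href) if finder.next_href else None


def collect_pages(start_url, fetch, max_pages=20):
    """Follow 'Next' links from start_url; fetch(url) returns the page HTML."""
    pages, url, seen = [], start_url, set()
    while url and url not in seen and len(pages) < max_pages:
        seen.add(url)          # guards against next-link loops
        html = fetch(url)
        pages.append(html)
        url = find_next_link(html, url)
    return pages
```

The seen set and max_pages cap keep a badly formed site from sending the loop around in circles or downloading an unbounded number of pages.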
02-11-2009, 05:36 PM   #195
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
The original Ars Technica recipe did have a problem with article length. Here is a completely rewritten recipe that works well. Tested with both LRF and EPUB.
Attached file: ars_technica.zip (1.1 KB)
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.