#181 | |
creator of calibre
Posts: 45,397
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
|
|
#182 |
Enthusiast
Posts: 43
Karma: 376
Join Date: Jan 2009
Location: California, USA
Device: K3, KFire, iPad, iPhone
|
Just want to say thanks for the Honolulu Advertiser and Star Bulletin recipes; they work great!
Much appreciated. |
#183 |
Member
Posts: 11
Karma: 10
Join Date: Feb 2009
Device: Sony PRS505
|
Calibre looks to be a fantastic program, Kovidgoyal. Thank you.
kiklop74: Thank you for the recipe for New Statesman. Unfortunately, I'm having difficulties with it. Other (built-in) recipes seem to work, but I cannot get your Python script to run. When I click "Download" to start the download, nothing happens. I tried pasting the contents of the .py file, and I tried using "Load recipe from file". I see the code loaded into the edit box, but it does not seem to do anything. Any idea what I might be doing wrong? Thank you.

Edit: Having read the "Tips for developing new recipes" section of the manual, I tried running each of the recommended commands from the command line (with the newstatesman.py filename) and it worked perfectly. So I don't quite understand why it won't work within the Calibre GUI. Hmmm.

Last edited by tbaac; 02-07-2009 at 06:52 PM. Reason: Read something in the manual........ |
#184 |
Member
Posts: 11
Karma: 10
Join Date: Feb 2009
Device: Sony PRS505
|
Okay, I'm not sure what was going wrong. I tried it from the command line and found that it sometimes seemed to help if I closed Calibre and reopened it. It works really well now, thank you.
I changed some feeds and ended up with this: Code:
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
'''
newstatesman.com
'''

from calibre.web.feeds.news import BasicNewsRecipe


class NewStatesman(BasicNewsRecipe):
    title = 'New Statesman'
    __author__ = 'Darko Miletic'
    description = "Britain's award-winning current affairs magazine"
    publisher = 'New Statesman'
    category = 'news, UK, World'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    # Download the page each feed item links to instead of the feed summary
    use_embedded_content = False
    encoding = 'cp1252'
    remove_javascript = True
    cover_url = 'http://media.starbulletin.com/designimages/spacer.gif'

    html2lrf_options = [
        '--comment', description,
        '--base-font-size', '10',
        '--category', category,
        '--publisher', publisher,
    ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'

    # Keep only the main article container
    keep_only_tags = [dict(name='div', attrs={'class': 'content-main'})]

    # Strip navigation, forms, comments and other non-article elements
    remove_tags = [
        dict(name=['object', 'link', 'form', 'ul']),
        dict(name='ul', attrs={'class': 'post-article'}),
        dict(name='div', attrs={'class': ['tag-nav-container', 'article-base']}),
        dict(name='div', attrs={'id': ['reader-comments']}),
    ]

    feeds = [
        (u'Politics', u'http://www.newstatesman.com/feeds/topics/politics.rss'),
        (u'Arts & Culture', u'http://www.newstatesman.com/feeds/topics/arts-and-culture.rss'),
        (u'Books', u'http://www.newstatesman.com/feeds/topics/books.rss'),
        (u'Life & Society', u'http://www.newstatesman.com/feeds/topics/life-and-society.rss'),
        (u'World Affairs', u'http://www.newstatesman.com/feeds/topics/world-affairs.rss'),
        (u'Columns - Martin Bright', u'http://www.newstatesman.com/feeds/writers/martin_bright.rss'),
        (u'Columns - Kira Cochrane', u'http://www.newstatesman.com/feeds/writers/kira_cochrane.rss'),
        # NOTE: this entry uses the same URL as the World Affairs feed above
        (u'Columns - Hunter Davies', u'http://www.newstatesman.com/feeds/topics/world-affairs.rss'),
        (u'Columns - Noreena Hertz', u'http://www.newstatesman.com/feeds/writers/noreena_hertz.rss'),
        (u'Columns - Lindsey Hilsum', u'http://www.newstatesman.com/feeds/writers/lindsey_hilsum.rss'),
        (u'Columns - Darcus Howe', u'http://www.newstatesman.com/feeds/writers/darcus_howe.rss'),
        (u'Columns - Emma John', u'http://www.newstatesman.com/feeds/writers/emma_john.rss'),
        (u'Columns - Sadakat Kadri', u'http://www.newstatesman.com/feeds/writers/sadakat_kadri.rss'),
        (u'Columns - Mark Lynas', u'http://www.newstatesman.com/feeds/writers/mark_lynas.rss'),
        (u'Columns - Kevin Maguire', u'http://www.newstatesman.com/feeds/writers/kevin_maguire.rss'),
        (u'Columns - Rageh Omaar', u'http://www.newstatesman.com/feeds/writers/rageh_omaar.rss'),
        (u'Columns - John Pilger', u'http://www.newstatesman.com/feeds/writers/john_pilger.rss'),
        (u'Columns - Ziauddin Sardar', u'http://www.newstatesman.com/feeds/writers/ziauddin_sardar.rss'),
        (u'Columns - Clive Stafford-Smith', u'http://www.newstatesman.com/feeds/writers/clive_stafford_smith.rss'),
        (u'Columns - Michela Wrong', u'http://www.newstatesman.com/feeds/writers/michela_wrong.rss'),
    ]

    def preprocess_html(self, soup):
        # Drop inline styles so the output uses the reader's own formatting
        for item in soup.findAll(style=True):
            del item['style']
        # Declare the content language in the document head
        mtag = '\n<meta http-equiv="Content-Language" content="en"/>\n'
        soup.head.insert(0, mtag)
        return soup
#185 | |
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
|
How To Fine Tune Recipes
Quote:
I did look at the FAQ and the samples provided there a while back. But I think the New York Times example was a bit too complex for me, at least at the time. I will go back, though, and study the examples in more depth. I also plan to print out more of the recipes to compare them to one another and the associated Web sites to try to figure out what each is doing.

I guess what I need to know is:

- When you guys come up with a well-working recipe for a site such as the New York Times or New Statesman, are you looking at the source HTML code from the site? How do you know what tags to remove, for example?
- How do you fetch an entire article from a news site? What code segment does that?

For example, I downloaded Ars Technica today to read while at lunch. While reading the Ars Technica articles, I noticed that only a summary for each article is presented. You're told to click on a link to read the rest. I'd like to edit the recipe to see if I could get the rest of those articles. What code in Darko Miletic's New Statesman recipe forces the fetching of entire articles? Would the same code solve the Ars Technica problem or would it have to be changed in some way?

Instead of a workshop, would you or Darko (?) have time to answer such questions as mine above? I understand object-oriented programming languages like Java and C++, and know several of the older procedural languages, so I think I could grasp what I need to know to write more recipes if given some of the basics.

Thanks...
Xanthan Gum |
|
#186 |
creator of calibre
Posts: 45,397
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yes, the tags to remove are deduced from the source HTML.
The simplest way to get the full text of the articles is to use the website's "Print version", if it has one. In that case, you need to figure out how to map the URLs in the RSS feeds to the corresponding print-version URLs. Then encode that logic in the print_version method, which takes a URL and should return the URL of the print version. |
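A minimal sketch of the print_version approach described above. The site, the feed URL and the "?print=1" rule are all hypothetical; a real site needs its own mapping, worked out by comparing an article URL from the feed with the URL behind the site's "Print" link. The fallback class is only there so the snippet can be run outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be run and tested outside calibre.
    class BasicNewsRecipe(object):
        pass


def to_print_url(url):
    """Map an article URL to its print-version URL.

    ASSUMPTION: the hypothetical site serves the print page at the same
    address with a 'print=1' query parameter added.
    """
    separator = '&' if '?' in url else '?'
    return url + separator + 'print=1'


class ExampleSite(BasicNewsRecipe):
    title = 'Example Site'  # hypothetical site
    feeds = [(u'News', u'http://www.example.com/feeds/news.rss')]  # hypothetical feed

    def print_version(self, url):
        # Receives an article URL taken from the feed and returns the
        # URL that should be downloaded instead.
        return to_print_url(url)
```

Keeping the mapping in a plain function makes it easy to check against a few real URLs in an interpreter before trying a full download.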
#187 | |||
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Quote:
That is something that comes with time.
Quote:
Code:
use_embedded_content = False
Quote:
What you need to read is the documentation for BasicNewsRecipe, and you should look at the actual code yourself, which is in general well commented. The rest you can deduce from the multitude of existing recipes. You should start with the simpler ones. The New York Times recipe is one of the more complex ones and is not recommended for beginners. |
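To show where that setting sits, here is a minimal sketch of a recipe with use_embedded_content = False, which tells calibre that the feed does not carry the full article, so the page each feed item links to is downloaded instead. The site, feed URL and tag names are hypothetical, and the fallback class is only there so the snippet runs outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be run and tested outside calibre.
    class BasicNewsRecipe(object):
        pass


class FullArticleExample(BasicNewsRecipe):
    title = 'Full Article Example'  # hypothetical site
    oldest_article = 7
    max_articles_per_feed = 100

    # Do not rely on the text embedded in the feed; fetch the linked page.
    use_embedded_content = False

    # ASSUMPTION: the article body lives in <div class="article-body">.
    keep_only_tags = [dict(name='div', attrs={'class': 'article-body'})]
    remove_tags = [dict(name=['object', 'form'])]

    feeds = [(u'News', u'http://www.example.com/feeds/news.rss')]  # hypothetical feed
```

Note that this only fetches the single page the feed links to; an article split across several pages needs additional handling.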
|||
#188 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
New recipe for the Montenegrin newspaper "Pobjeda" (in Serbian)
Supports both LRF and EPUB format. |
#189 |
Member
Posts: 14
Karma: 10
Join Date: Feb 2009
Device: PRS-505
|
Request: Fanfiction.net
No clue how I'd go about making this work.
Currently I use the online version of FLAG (Fanfiction.net Lightweight Automated Grabber) from http://flag.erayd.net/ to grab stories (multiple chapters at a go) from Fanfiction.net and then manually import them into Calibre. https://www.mobileread.com/forums/showthread.php?t=26055 has info and downloads for the FLAG program.

What would be ideal, however, would be a custom recipe, based on FLAG, that would have an input for the Story ID and could then fetch the whole thing (as the stories are split into multiple "chapters" across several pages). Unfortunately, I can't code my way out of a paper sack, and haven't the foggiest idea how to do this sort of thing. |
#190 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
I noticed one minor error in the new release of calibre. The recipe "Politika Online" should also go in the Serbian language category.
|
#191 |
creator of calibre
Posts: 45,397
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
#192 | |
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
|
Print Versions
Quote:
I understand how that works. I remember seeing the BBC example in the FAQ or tutorial. It made sense. But many sites, like Ars Technica, don't offer that print option; you're forced to advance to the next page to read the rest of the article (when reading with a browser).

I tried kiklop74's suggestion by inserting the line use_embedded_content = False in the recipe. But it doesn't fetch the rest of the Ars Technica articles.

Any suggestions? (Kovid, Darko)

Xanthan Gum |
|
#193 | |
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
|
use_embedded_content = False
Quote:
kiklop74,

Thanks for responding (you and Kovid). Firefox is the browser I use most of the time; I use Opera for some browsing. I don't think I have the Firebug plugin installed, so I will get that.

When you state "Yes it would.", do you mean that the one line:
Code:
use_embedded_content = False
As I posted up above in response to Kovid's remarks about the print option, using just the
Code:
use_embedded_content = False
I will, for sure, look over the documentation for BasicNewsRecipe and print out a number of the recipes for comparison.

Xanthan Gum |
|
#194 | |
creator of calibre
Posts: 45,397
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
Look at the Newsweek recipe; it does this, i.e., it follows the "next" links. |
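One way to follow "next page" links, sketched with the recursions and match_regexps attributes of BasicNewsRecipe. This is not a copy of the Newsweek recipe, which may do it differently; the site, feed URL and URL pattern below are hypothetical, and the fallback class is only there so the snippet runs outside calibre.

```python
import re

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be run and tested outside calibre.
    class BasicNewsRecipe(object):
        pass


# ASSUMPTION: later pages of an article live at .../articles/<slug>/page/<n>
NEXT_PAGE_PATTERN = r'http://www\.example\.com/articles/\S+/page/\d+'


class PagedArticleExample(BasicNewsRecipe):
    title = 'Paged Article Example'  # hypothetical site
    use_embedded_content = False

    # Follow links found on each article page, one level deep...
    recursions = 1
    # ...but only links whose URL matches one of these patterns.
    match_regexps = [NEXT_PAGE_PATTERN]

    feeds = [(u'News', u'http://www.example.com/feeds/news.rss')]  # hypothetical feed


# Quick check of the pattern against sample URLs before a full download.
is_next_page = lambda url: re.match(NEXT_PAGE_PATTERN, url) is not None
```

With one level of recursion, only links present on the first page are followed, so if each page links only to the one after it, a larger recursions value would probably be needed.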
|
#195 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
The original Ars Technica recipe did have a problem with article length. Here is a completely rewritten recipe that works well. Tested with both LRF and EPUB.
|