#1096
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
@lorenzov
Kovid created a wiki page, http://bugs.calibre-ebook.com/wiki/RecipeTips, where useful tips for writing recipes can be collected. Right now it is almost empty. I would like to help you make this page.
Last edited by kovidgoyal; 01-07-2010 at 10:33 PM.
#1097
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2010
Device: htc hero
Can't fetch URLs from feed in ebook-convert
I am trying to prepare a recipe for gazeta.pl and am testing it on one of their feeds:
http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml
I prepared a very simple custom recipe that should use the printable versions of the articles. However, when I test the recipe with ebook-convert, the articles are not downloaded. ebook-convert reports that it cannot fetch them, but the URLs generated by print_version() open without any problem in a browser. Here is part of the report from running ebook-convert with -vv:
Code:
```
Downloading
Fetching http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html
Could not fetch link http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html
Traceback (most recent call last):
  File "site-packages\calibre\web\fetch\simple.py", line 401, in process_links
  File "site-packages\calibre\web\fetch\simple.py", line 208, in fetch_url
FetchError: Not Found
http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html saved to
Downloading
Fetching http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html
Failed to download article: Korzystasz z Windows i Adobe Readera? Szykuj się na łatanie... from http://technologie.gazeta.pl/technologie/1,82008,7432357,Korzystasz_z_Windows_i_Adobe_Readera__Szykuj_sie_na.html
Traceback (most recent call last):
  File "site-packages\calibre\utils\threadpool.py", line 95, in run
  File "site-packages\calibre\web\feeds\news.py", line 703, in fetch_article
  File "site-packages\calibre\web\feeds\news.py", line 699, in _fetch_article
Exception: Could not fetch article. Run with -vv to see the reason
2% Article download failed: u'Korzystasz z Windows i Adobe Readera? Szykuj si\u0119 na \u0142atanie...'
Could not fetch link http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html
Traceback (most recent call last):
  File "site-packages\calibre\web\fetch\simple.py", line 401, in process_links
  File "site-packages\calibre\web\fetch\simple.py", line 208, in fetch_url
FetchError: Not Found
http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html saved to
```
Code:
```python
#!/usr/bin/env python
'''
technologie.gazeta.pl
'''
from calibre.web.feeds.news import BasicNewsRecipe


class TechnologieGazeta(BasicNewsRecipe):
    title = u'TechnologieGazeta'
    description = 'Wiadomości z technologie.gazeta.pl'  # "News from technologie.gazeta.pl"
    language = 'pl'
    encoding = 'iso-8859-2'
    no_stylesheets = True
    remove_javascript = True
    max_articles_per_feed = 50
    simultaneous_downloads = 1
    feeds = [
        ('Wiadomosci Technologie gazeta.pl', 'http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml'),
    ]

    def print_version(self, url):
        # .../technologie/1,82008,7432357,Title.html -> .../technologie/2029020,82008,7432357.html
        start, sep, rest = url.rpartition('/')
        numbers, sep, tytul = rest.rpartition(',')  # tytul = article title slug
        printversion = numbers.replace('1,', '2029020,', 1)
        print('%s -> %s' % (numbers, printversion))  # debug output
        return start + '/' + printversion + '.html'
```
Thanks,
wdrwc
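A quick way to narrow down a failure like this is to pull the URL rewriting out of print_version() into a plain function and run it on a link from the feed, outside calibre. The sketch below is a standalone restatement of the rewrite above; the function name gazeta_print_url is hypothetical. It also anchors the "1," to "2029020," swap to the start of the ID string, because str.replace(..., 1) replaces the first "1," anywhere, which would corrupt IDs such as 2,82001,... For the article in the log it produces exactly the URL calibre tried to fetch, which suggests the "Not Found" comes from the server's response to calibre's request rather than from the string manipulation, although the thread does not confirm the actual cause.

```python
def gazeta_print_url(url):
    """Rewrite a gazeta.pl article link into the print-page URL used in the recipe.

    Example: .../technologie/1,82008,7432357,Some_title.html
          -> .../technologie/2029020,82008,7432357.html
    """
    # Split off the last path segment: "1,82008,7432357,Some_title.html".
    start, _, rest = url.rpartition('/')
    # Drop the trailing ",Some_title.html"; keep the comma-separated IDs.
    numbers, _, _title = rest.rpartition(',')
    # Swap only a leading "1," prefix; str.replace('1,', ..., 1) would also
    # rewrite a "1," found later in the string (e.g. inside "82001,").
    if numbers.startswith('1,'):
        numbers = '2029020,' + numbers[len('1,'):]
    return start + '/' + numbers + '.html'


if __name__ == '__main__':
    link = ('http://technologie.gazeta.pl/technologie/1,82008,7432357,'
            'Korzystasz_z_Windows_i_Adobe_Readera__Szykuj_sie_na.html')
    # prints http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html
    print(gazeta_print_url(link))
```

Comparing this output with the "Fetching ..." lines in the -vv log is a cheap check before digging into the network side.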
#1098
Groupie
Posts: 178
Karma: 12392
Join Date: Nov 2009
Location: Canada
Device: Kobo Vox
I know I can subscribe to it via Amazon, but as it is just the website content anyway, a custom recipe for Escapist Magazine would be awesome (http://www.escapistmagazine.com/).
#1099
onlinenewsreader.net
Posts: 327
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
I'm writing a recipe to get the free parts of the Wall Street Journal. I'm getting "article download failed" for every article URL, even though I can open all of the URLs in a browser. The URLs all look like http://online.wsj.com/article/SB1000...n_AboveLEFTTop. Does anyone know why Calibre would be unable to download these pages?
#1100
Connoisseur
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
Quote:
But the easy strategy is to forget the print version and just use the article from the feed. Their HTML seems to be valid, so you could use the keep_only_tags and remove_tags properties to get rid of unwanted content. There is also the preprocess_html() method to refine the result even further. If you have further questions, feel free to post them.
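The approach described above could look roughly like the following recipe: no print_version(), keep_only_tags and remove_tags trimming the article pages linked from the feed, and preprocess_html() doing a final pass. It reuses the gazeta.pl settings from the earlier post, but the class name and every tag selector are hypothetical placeholders; the real ids and classes have to be read off the site's HTML. The try/except around the import is only there so the sketch can be loaded and inspected outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # outside calibre: lets the sketch be imported for inspection
    BasicNewsRecipe = object


class TechnologieGazetaFromFeed(BasicNewsRecipe):
    title = u'TechnologieGazeta'
    language = 'pl'
    encoding = 'iso-8859-2'
    no_stylesheets = True
    remove_javascript = True
    max_articles_per_feed = 50
    feeds = [
        ('Wiadomosci Technologie gazeta.pl',
         'http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml'),
    ]

    # No print_version(): the article pages linked from the feed are used directly.

    # Hypothetical selectors: keep only the article body...
    keep_only_tags = [dict(name='div', attrs={'id': 'article'})]
    # ...and drop hypothetical ad blocks plus embedded media inside it.
    remove_tags = [
        dict(name='div', attrs={'class': 'ads'}),
        dict(name=['iframe', 'object', 'embed']),
    ]

    def preprocess_html(self, soup):
        # Strip inline style attributes from the remaining markup.
        for tag in soup.findAll(style=True):
            del tag['style']
        return soup
```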
#1101
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2010
Device: nook
Lorenzo,
Thanks so much for the recipe. Very nice of you. I look forward to learning how to write my own. My new nook should arrive in a couple of weeks... they are backordered.
#1102
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
Quote:
#1103
Member
Posts: 23
Karma: 12
Join Date: Jan 2010
Location: Edinburgh, UK
Device: SONY PRS600, Apple iPhone 3G
The Escapist v1
Try the attached one. Obviously I have not included the videos and the forum posts, but as I was playing around with fetching the various print versions of the feeds, it should do the job!
A question for the experts in the forum: is it possible to avoid repeated articles? Sometimes the same article turns up in different feeds (especially from blogs). I'm trying to figure out whether it is possible to prune duplicates after the fetch process. Thanks!
Lorenzo
#1104
creator of calibre
Posts: 45,374
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@lorenzov: Not easily. The reason I haven't implemented it is that it's usually a good idea to leave the duplicates in, as a user might only read a single section.
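If someone still wants to prune duplicates despite that caveat, the core step is small: walk the feeds in order and keep only the first article seen for each URL. The sketch below is standalone; Article, Feed and drop_duplicate_articles are hypothetical stand-ins that assume, as calibre's own feed objects appear to, that each feed has an articles list and each article a url attribute. Matching on URL alone is also an assumption: the same story published under two different links would not be caught.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    # Hypothetical stand-in for a fetched article: only the fields used here.
    title: str
    url: str


@dataclass
class Feed:
    # Hypothetical stand-in for a parsed feed holding its articles in order.
    title: str
    articles: List[Article] = field(default_factory=list)


def drop_duplicate_articles(feeds):
    """Keep the first occurrence of each article URL across all feeds.

    Feeds are processed in order, so an article stays in the earliest feed
    that lists it and is removed from every later one. Mutates and returns
    the given list of feeds.
    """
    seen = set()
    for feed in feeds:
        unique = []
        for article in feed.articles:
            if article.url not in seen:
                seen.add(article.url)
                unique.append(article)
        feed.articles = unique
    return feeds


if __name__ == '__main__':
    news = Feed('News', [Article('A', 'http://example.com/a'),
                         Article('B', 'http://example.com/b')])
    blog = Feed('Blog', [Article('B again', 'http://example.com/b'),
                         Article('C', 'http://example.com/c')])
    for f in drop_duplicate_articles([news, blog]):
        print(f.title, [a.title for a in f.articles])
```

In a recipe, one place to call such a filter would presumably be an overridden parse_feeds() that first calls BasicNewsRecipe.parse_feeds(self) and then filters its result. Kovid's point still applies: a reader who only opens the later section will no longer see that article there.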
#1105
Member
Posts: 17
Karma: 10
Join Date: Dec 2009
Location: Oslo, Norway
Device: Nook
New recipe: SG.hu
New Hungarian technical news recipe:
Code:
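```python
from calibre.web.feeds.news import BasicNewsRecipe


class SGhu(BasicNewsRecipe):
    title = u'SG.hu'
    __author__ = 'davotibarna'
    description = 'Informatika és Tudomány'  # "Informatics and Science"
    language = 'hu'
    oldest_article = 5            # days
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'ISO-8859-2'       # Latin-2 character encoding
    feeds = [(u'SG.hu', u'http://www.sg.hu/plain/rss.xml')]

    def print_version(self, url):
        # Map an article URL ("cikkek/...") to the printer-friendly page.
        return url.replace('cikkek/', 'printer.php?cid=')
```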
#1106
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2010
Device: Kindle 2, Windows Mobile, PC
Folks,
I'm working on a solution for the Dallas Morning News: http://www.dallasnews.com/newskiosk/...latestnews.xml
If I just include all the news feeds I want, there is a lot of extra text above and below the main article.
Regards,
Robert
#1107
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
@wdrwc
Kovid looked at your recipe and says the working recipe will be included in the next calibre release.
#1108
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
Quote:
The recipe will be included in the next calibre release.
#1109
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2010
Device: Kindle 2, Windows Mobile, PC
Krittika,
That is excellent...
1) I got a lot of good information just by pasting in all the news feeds I wanted.
2) I have reduced the amount of "garbage" with the following setting, but it takes a long time to run since I really don't know what I'm doing:
remove_tags_after = [dict(id='article_tools_bottom')]
3) I'm attaching my complete script. Maybe you can use it for your Dallas Morning News testing.
Thanks,
Robert Jackson
#1110
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
@rjack:
You are definitely on the right track. With a few more remove_tags entries and the no_stylesheets option, you should be fine. I am attaching a text file with the additional commands you need. Let me know if it works for you.
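Along the lines of that advice, a Dallas Morning News recipe could combine Robert's remove_tags_after with no_stylesheets and a few remove_tags entries. This is only a sketch: the class name, the remove_tags selectors and the feed URL placeholder are hypothetical (the thread gives the feed address only in shortened form), and oldest_article and max_articles_per_feed are included because limiting how much is downloaded is one way to shorten the long run times Robert mentions. The try/except import fallback exists only so the file can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # outside calibre: lets the sketch be imported for inspection
    BasicNewsRecipe = object


class DallasMorningNewsSketch(BasicNewsRecipe):
    title = u'Dallas Morning News'
    language = 'en'
    oldest_article = 2            # only fetch articles from the last 2 days
    max_articles_per_feed = 25    # cap per feed to keep runs short
    no_stylesheets = True         # ignore the site's CSS
    remove_javascript = True

    # Placeholder: substitute the real dallasnews.com feed URL(s).
    feeds = [(u'Latest News', u'REPLACE_WITH_FEED_URL')]

    # From Robert's post: cut everything after the bottom article toolbar.
    remove_tags_after = [dict(id='article_tools_bottom')]

    # Hypothetical extra removals; read the real ids/classes off the page HTML.
    remove_tags = [
        dict(name='div', attrs={'id': 'article_tools_top'}),
        dict(name='div', attrs={'class': 'related_links'}),
        dict(name=['iframe', 'object', 'embed']),
    ]
```

While experimenting, running the recipe through ebook-convert with the --test option (for example, ebook-convert dallas.recipe dallas.epub --test -vv) fetches only a small sample of articles, which makes each iteration much faster.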
Thread | Thread Starter | Forum | Replies | Last Post |
--- | --- | --- | --- | --- |
Custom column read ? | pchrist7 | Calibre | 2 | 10-04-2010 02:52 AM |
Archive for custom screensavers | sleeplessdave | Amazon Kindle | 1 | 07-07-2010 12:33 PM |
How to back up preferences and custom recipes? | greenapple | Calibre | 3 | 03-29-2010 05:08 AM |
Donations for Custom Recipes | ddavtian | Calibre | 5 | 01-23-2010 04:54 PM |
Help understanding custom recipes | andersent | Calibre | 0 | 12-17-2009 02:37 PM |