Old 01-07-2010, 10:19 PM   #1096
Krittika Goyal
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
@lorenzov
Kovid created a wiki page
http://bugs.calibre-ebook.com/wiki/RecipeTips
that can be used to share useful tips for recipes. Right now it is almost empty. I would like to help you make this page.

Last edited by kovidgoyal; 01-07-2010 at 10:33 PM.
Old 01-08-2010, 06:48 AM   #1097
wdrwc
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2010
Device: htc hero
can't fetch urls from feed in ebook-convert

I am trying to prepare a recipe for gazeta.pl. I am testing it on one of their feeds:
http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml

I prepared a very simple custom recipe that should use the printable version of the articles. However, when I test the recipe with ebook-convert, the articles are not downloaded. ebook-convert reports that it cannot fetch the articles, but the URLs generated by print_version() open without any problem in the browser.

Here is part of the report from running ebook-convert -vv:
Code:
Downloading
Fetching http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html
Could not fetch link http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html
Traceback (most recent call last):
  File "site-packages\calibre\web\fetch\simple.py", line 401, in process_links
  File "site-packages\calibre\web\fetch\simple.py", line 208, in fetch_url
FetchError: Not Found

http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html saved to 
Downloading
Fetching http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html
Failed to download article: Korzystasz z Windows i Adobe Readera? Szykuj się na łatanie... from http://technologie.gazeta.pl/technologie/1,82008,7432357,Korzystasz_z_Windows_i_Adobe_Readera__Szykuj_sie_na.html
Traceback (most recent call last):
  File "site-packages\calibre\utils\threadpool.py", line 95, in run
  File "site-packages\calibre\web\feeds\news.py", line 703, in fetch_article
  File "site-packages\calibre\web\feeds\news.py", line 699, in _fetch_article
Exception: Could not fetch article. Run with -vv to see the reason



2% Article download failed: u'Korzystasz z Windows i Adobe Readera? Szykuj si\u0119 na \u0142atanie...'
Could not fetch link http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html
Traceback (most recent call last):
  File "site-packages\calibre\web\fetch\simple.py", line 401, in process_links
  File "site-packages\calibre\web\fetch\simple.py", line 208, in fetch_url
FetchError: Not Found

http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html saved to
And here is the recipe:
Code:
#!/usr/bin/env  python
'''
technologie.gazeta.pl
'''
from calibre.web.feeds.news import BasicNewsRecipe
class TechnologieGazeta(BasicNewsRecipe):
    title          = u'TechnologieGazeta'
    description    = 'Wiadomości z technologie.gazeta.pl'
    language = 'pl'
    encoding = 'iso-8859-2'
    no_stylesheets = True
    remove_javascript = True
    max_articles_per_feed = 50
    simultaneous_downloads = 1

    feeds          = [
                      ('Wiadomosci Technologie gazeta.pl', 'http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml'),
                    ]

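    def print_version(self, url):
        # Article URLs look like .../1,82008,7432357,Some_title.html
        start, sep, rest = url.rpartition('/')
        # Drop the trailing title part, keeping only the numeric ids
        numbers, sep, tytul = rest.rpartition(',')
        # The print version uses the prefix 2029020 instead of 1
        printversion = numbers.replace('1,', '2029020,', 1)
        print(numbers, '  ', printversion)  # debug output
        return start + '/' + printversion + '.html'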
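
The URL rewriting can also be checked without running ebook-convert. Below is a minimal sketch, assuming the same logic as print_version() copied into a standalone function (the name to_print_url is made up for this example); the input URL is the one from the log above:

```python
def to_print_url(url):
    """Standalone copy of the recipe's print_version() logic (hypothetical helper)."""
    # Split off the last path component, e.g. "1,82008,7432357,Some_title.html"
    start, _, rest = url.rpartition('/')
    # Keep only the numeric ids in front of the title
    numbers, _, _title = rest.rpartition(',')
    # Swap the leading "1," for the print-version prefix
    return start + '/' + numbers.replace('1,', '2029020,', 1) + '.html'


article_url = ('http://technologie.gazeta.pl/technologie/'
               '1,82008,7432357,Korzystasz_z_Windows_i_Adobe_Readera__Szykuj_sie_na.html')
print_url = to_print_url(article_url)
print(print_url)
```

The result is the same URL that ebook-convert reports it could not fetch, so the string handling does what was intended and the failure lies in the fetch itself.
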
I would appreciate any help or suggestions.

Thanks,
wdrwc
Old 01-08-2010, 09:22 AM   #1098
cypherslock
Groupie
Posts: 178
Karma: 12392
Join Date: Nov 2009
Location: Canada
Device: Kobo Vox
I know I can subscribe to it via Amazon, but since it is just the website content anyway, a custom recipe for Escapist Magazine would be awesome (http://www.escapistmagazine.com/).
Old 01-08-2010, 12:46 PM   #1099
nickredding
onlinenewsreader.net
Posts: 327
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
I'm writing a recipe to get the free parts of the Wall Street Journal. I'm getting "article download failed" for every article url, even though I can get to all of the urls in a browser. The urls all look like http://online.wsj.com/article/SB1000...n_AboveLEFTTop. Does anyone know why Calibre would be unable to download these pages?
Old 01-08-2010, 02:48 PM   #1100
evanmaastrigt
Connoisseur
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
Quote:
Originally Posted by wdrwc View Post
I am trying to prepare a recipe for gazeta.pl. I am testing it on one of their feeds:
http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml

I prepared a very simple custom recipe that should use the printable version of the articles...
Their print version is hard to get at, but I think it can be done (calibre knows some nice tricks too).

But the easy strategy is to forget the print version and just use the article from the feed. Their HTML seems to be valid, so you could use the keep_only_tags and remove_tags properties to get rid of unwanted content. There is also the preprocess_html() method to refine the result even further.
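
A minimal sketch of that approach follows. The tag names, ids and class names in keep_only_tags and remove_tags are placeholders, not taken from the actual gazeta.pl markup; inspect the page source to find the real ones. The ImportError fallback is only there so the snippet can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be loaded outside calibre
    class BasicNewsRecipe(object):
        pass


class TechnologieGazetaFeed(BasicNewsRecipe):
    title = u'TechnologieGazeta'
    language = 'pl'
    encoding = 'iso-8859-2'
    no_stylesheets = True

    feeds = [
        ('Wiadomosci Technologie gazeta.pl',
         'http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml'),
    ]

    # Keep only the element holding the article (placeholder id)
    keep_only_tags = [dict(name='div', attrs={'id': 'article'})]

    # Then drop unwanted blocks inside it (placeholder class names)
    remove_tags = [dict(name='div', attrs={'class': ['ads', 'related']})]

    def preprocess_html(self, soup):
        # Final cleanup on the parsed page: strip inline style attributes
        for tag in soup.findAll(style=True):
            del tag['style']
        return soup
```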

If you have further questions feel free to post them.
Old 01-08-2010, 11:44 PM   #1101
bamasteve
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2010
Device: nook
Lorenzo,

Thanks so much for the recipe. Very nice of you. I look forward to learning how to write my own. My new nook should arrive in a couple of weeks; they are backordered.
Old 01-09-2010, 01:47 AM   #1102
Krittika Goyal
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
Quote:
Originally Posted by nickredding View Post
I'm writing a recipe to get the free parts of the Wall Street Journal. I'm getting "article download failed" for every article url, even though I can get to all of the urls in a browser. The urls all look like http://online.wsj.com/article/SB1000...n_AboveLEFTTop. Does anyone know why Calibre would be unable to download these pages?
If you send me your recipe, I can take a look at it and see if I can figure something out.
Old 01-09-2010, 11:04 AM   #1103
lorenzov
Member
Posts: 23
Karma: 12
Join Date: Jan 2010
Location: Edinburgh, UK
Device: SONY PRS600, Apple iPhone 3G
the escapist v1

Try the attached one. Obviously I have not included the videos and the forum posts, but as I was playing around with fetching the print versions of the various feeds, it should do the job!


A question for the experts in the forum:

Is it possible to avoid repeated articles? Different feeds (especially from blogs) sometimes contain duplicate articles. I'm trying to figure out whether it is possible to prune duplicates after the fetch process.

thanks!

lorenzo
Attached Files
File Type: zip theEscapistMag.zip (985 Bytes, 198 views)
Old 01-09-2010, 12:02 PM   #1104
kovidgoyal
creator of calibre
Posts: 45,374
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@lorenzov: Not easily. The reason I haven't implemented it is that it's usually a good idea to leave the duplicates in there, as a user might only read a single section.
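
For anyone who still wants to drop the repeats, one possibility is to filter the parsed feeds before the articles are downloaded, rather than after the fetch. A minimal sketch, assuming each feed object has an articles list whose items carry a url attribute; the Feed and Article classes below are simplified stand-ins for illustration, not calibre's own:

```python
class Article(object):
    """Simplified stand-in for a parsed feed entry."""
    def __init__(self, title, url):
        self.title = title
        self.url = url


class Feed(object):
    """Simplified stand-in for a parsed feed."""
    def __init__(self, title, articles):
        self.title = title
        self.articles = list(articles)


def prune_duplicate_articles(feeds):
    """Keep only the first occurrence of each article URL across all feeds."""
    seen = set()
    for feed in feeds:
        unique = []
        for article in feed.articles:
            if article.url not in seen:
                seen.add(article.url)
                unique.append(article)
        # Replace the list contents in place so other references stay valid
        feed.articles[:] = unique
    return feeds


# In a recipe this might be hooked in by overriding parse_feeds(), e.g.
#     def parse_feeds(self):
#         return prune_duplicate_articles(BasicNewsRecipe.parse_feeds(self))
# (an assumption, not tested against a real recipe)

feeds = prune_duplicate_articles([
    Feed('News', [Article('A', 'http://example.com/a'),
                  Article('B', 'http://example.com/b')]),
    Feed('Blog', [Article('A again', 'http://example.com/a'),
                  Article('C', 'http://example.com/c')]),
])
for feed in feeds:
    print(feed.title, [a.title for a in feed.articles])
```

The trade-off is the one described above: a reader who opens only the second section never sees the article that was pruned from it.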
Old 01-09-2010, 12:40 PM   #1105
davotibarna
Member
Posts: 17
Karma: 10
Join Date: Dec 2009
Location: Oslo, Norway
Device: Nook
new recipe: SG.hu

New Hungarian technical news recipe:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class SGhu(BasicNewsRecipe):
    title          = u'SG.hu'
    __author__     = 'davotibarna'
    description    = 'Informatika és Tudomány'
    language = 'hu'
    oldest_article = 5
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'ISO-8859-2'

    feeds          = [(u'SG.hu', u'http://www.sg.hu/plain/rss.xml')]

    def print_version(self, url):
        return url.replace('cikkek/', 'printer.php?cid=')
Old 01-09-2010, 06:00 PM   #1106
rjack
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2010
Device: Kindle 2, Windows Mobile, PC
Folks,

I'm working on a solution for Dallas Morning News...

http://www.dallasnews.com/newskiosk/...latestnews.xml

There is a lot of "extra text" above and below the main article if I just include all the newsfeeds I want.

Regards,

Robert
Old 01-09-2010, 07:27 PM   #1107
Krittika Goyal
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
@wdrwc
Kovid looked at your recipe and says the working recipe will be included in the next calibre release
Old 01-09-2010, 07:30 PM   #1108
Krittika Goyal
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
Quote:
Originally Posted by rjack View Post
Folks,

I'm working on a solution for Dallas Morning News...

http://www.dallasnews.com/newskiosk/...latestnews.xml

There is a lot of "extra text" above and below the main article if I just include all the newsfeeds I want.

Regards,

Robert


The recipe will be included in the next calibre release.
Old 01-09-2010, 08:47 PM   #1109
rjack
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2010
Device: Kindle 2, Windows Mobile, PC
Krittika,

That is excellent...

1) I got a lot of good information by just pasting in all the newsfeeds I wanted...
2) I have reduced the amount of "garbage" by using the following tag, but it takes a long time to run since I really don't know what I'm doing...

remove_tags_after = [dict(id='article_tools_bottom')]

3) I'm attaching my complete script. Maybe you can use it for your Dallas Morning News testing.

Thanks,

Robert Jackson
Attached Files
File Type: txt dallas_test.txt (3.7 KB, 222 views)
Old 01-09-2010, 11:22 PM   #1110
Krittika Goyal
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
@rjack:
You are definitely on the right track. With a few more remove_tags entries and no_stylesheets set, you should be fine. I am attaching a text file with the additional commands you need. Let me know if it works for you.
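
As a rough sketch of what that combination could look like: remove_tags_after is the one from rjack's post, while the entries in remove_tags are placeholders (the real ones are in the attached file and depend on the site's markup), and the feed list is left empty here. The ImportError fallback only exists so the snippet loads outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be loaded outside calibre
    class BasicNewsRecipe(object):
        pass


class DallasMorningNewsSketch(BasicNewsRecipe):
    title = u'Dallas Morning News (sketch)'
    language = 'en'

    # Ignore the site's CSS
    no_stylesheets = True

    # Cut everything that follows this element (from rjack's post)
    remove_tags_after = [dict(id='article_tools_bottom')]

    # Remove leftover clutter around the article (placeholder ids)
    remove_tags = [
        dict(name='div', attrs={'id': ['header', 'navigation']}),
        dict(name=['script', 'object', 'iframe']),
    ]

    # Add (title, url) tuples for the wanted newsfeeds here
    feeds = []
```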
Attached Files
File Type: txt removetags_dallas.txt (510 Bytes, 216 views)