#376
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

You should use this to extract the article from that site:

Code:
    keep_only_tags = [dict(name='div', attrs={'class':'article'})]
    remove_tags_after = dict(name='div', attrs={'class':'articletext'})
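
For anyone following along, here is a minimal sketch of a complete recipe built around those two lines. The feed URL is the Die Presse Wirtschaft feed that appears in another recipe later in this thread; the class name, title and remaining settings are only illustrative defaults.

Code:
    from calibre.web.feeds.news import BasicNewsRecipe

    class DiePresse(BasicNewsRecipe):
        title = 'Die Presse'          # illustrative title
        oldest_article = 2
        max_articles_per_feed = 100
        no_stylesheets = True

        # The two lines suggested above
        keep_only_tags = [dict(name='div', attrs={'class':'article'})]
        remove_tags_after = dict(name='div', attrs={'class':'articletext'})

        # Feed URL taken from the Die Presse recipe quoted later in this thread
        feeds = [('Die Presse Wirtschaft', 'http://www.diepresse.com/rss/Wirtschaft')]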

#377
Enthusiast
Posts: 27
Karma: 10
Join Date: Mar 2009
Device: PRS-505

That's more or less what I used. The article itself is extracted fine; the problem is the picture within the article. It's a normal JPG picture, but it still fails to be included. I also tried Bookit to grab the whole page, but that fails to include the article's picture as well.

For example, http://diepresse.com/home/panorama/r...ex.do?from=rss has a picture of the pope in it, yet no picture ends up in the final ebook.

Code:
    remove_tags_before = dict(id='content')
    remove_tags_after = dict(id='content')

#378
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

You discovered a bug in calibre: for some reason calibre does not fetch the image inside the article, it is simply ignored. Please open a bug report in the calibre Trac and attach your recipe to it so that Kovid can fix this.

#379
Member
Posts: 18
Karma: 10
Join Date: Feb 2009
Device: none

Typically, background images are used as "fluff" on a page and should be treated as irrelevant furniture, so they are (rightly, IMHO) ignored. If an image conveys meaning, it should be in a normal <img> tag with an appropriate alt attribute for accessibility. IMHO it's the fault of diepresse, not Calibre.

Rufus.
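
If the missing picture really is a CSS background image, one possible workaround is to turn such images into real <img> tags from inside the recipe, so that calibre's image fetcher picks them up. This is only a sketch under that assumption: the style-attribute pattern is a guess, not something confirmed from diepresse's markup, and the class skeleton omits the rest of the recipe.

Code:
    import re
    from calibre.ebooks.BeautifulSoup import Tag
    from calibre.web.feeds.news import BasicNewsRecipe

    class DiePresseImages(BasicNewsRecipe):
        # ... title, feeds and the tag options from the posts above ...

        def preprocess_html(self, soup):
            # Find tags whose inline style carries a background image and
            # inject a real <img> tag so the picture gets downloaded.
            for tag in soup.findAll(True, style=True):
                m = re.search(r'background(?:-image)?\s*:\s*url\(([^)]+)\)', tag['style'])
                if m:
                    img = Tag(soup, 'img')
                    img['src'] = m.group(1).strip('\'" ')
                    tag.insert(0, img)
            return soup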

#380
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2009
Device: none

Recipe for NZZ - Neue Zuericher Zeitung

Hi,

I just made my first recipe, for www.nzz.ch. It does everything I want, but unfortunately it's very slow (56 min to produce a 0.3 MB ebook). I started from the BBC recipe, but I don't see why the NZZ version should be so much slower. Here's the recipe:

Code:
    #!/usr/bin/env python
    '''
    nzz.ch
    '''
    from calibre.web.feeds.news import BasicNewsRecipe

    class NewNzz(BasicNewsRecipe):
        title = u'Neue Zuericher Zeitung'
        __author__ = 'NZZ'
        description = 'Neue Zuericher Zeitung'
        no_stylesheets = True
        language = _('German')

        keep_only_tags = [dict(name='div', attrs={'class':'article'})]
        remove_tags_before = dict(id='article')
        remove_tags_after = dict(id='article')
        remove_tags = [
            dict(attrs={'class':['more', 'nowrap', 'footer', 'teaser', 'articleTools',
                                 'post-tools', 'side_tool', 'nextArticleLink clearfix']}),
            dict(id=['formSendArticle', 'footer', 'toolsRight', 'articleInline',
                     'navigation', 'archive', 'side_search', 'blog_sidebar',
                     'side_tool', 'side_index']),
            dict(name=['script', 'noscript', 'style']),
        ]

        feeds = [
            ('Top Themen', 'http://www.nzz.ch/nachrichten/startseite?rss=true'),
            ('International', 'http://www.nzz.ch/nachrichten/international?rss=true'),
            ('Schweiz', 'http://www.nzz.ch/nachrichten/schweiz?rss=true'),
            ('Wirtschaft', 'http://www.nzz.ch/nachrichten/wirtschaft/aktuell?rss=true'),
            ('Zuerich', 'http://www.nzz.ch/nachrichten/zuerich?rss=true'),
            ('Sport', 'http://www.nzz.ch/nachrichten/sport?rss=true'),
            ('Panorama', 'http://www.nzz.ch/nachrichten/panorama?rss=true'),
        ]

        def print_version(self, url):
            return url + '?printview=true'

Best regards,
keckx

#381
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

The NZZ online server is quite slow. There is nothing you can do about that.

Just some notes about the recipe. This:

Code:
    remove_tags_before = dict(id='article')
    remove_tags_after = dict(id='article')

can be replaced with a single keep_only_tags entry:

Code:
    keep_only_tags = [dict(name='div', attrs={'id':'article'})]
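
As an aside, two generic BasicNewsRecipe settings sometimes shave a little time off a slow fetch. They cannot make a slow server faster, so treat the values below as a hypothetical experiment rather than a fix; the class skeleton omits the rest of the recipe shown above.

Code:
    from calibre.web.feeds.news import BasicNewsRecipe

    class NewNzz(BasicNewsRecipe):
        # ... title, feeds and tag options as in the recipe above ...
        simultaneous_downloads = 10  # default is 5; fetch more articles in parallel
        timeout = 30                 # seconds before a stalled connection is abandoned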

#382
Junior Member
Posts: 7
Karma: 10
Join Date: Mar 2009
Device: PRS 505

Hello everybody,

I have a problem with some German calibre recipes and the EPUB output. The recipes for Spiegel Online and FAZ.NET are not working correctly, and I have no idea why. Spiegel Online gives only about eight pages containing the article overviews, and the FAZ.NET ebook freezes my Sony PRS-505.

Does anybody have an idea why this happens, or does anybody have the same problems? I stopped using the LRF output, where those recipes worked well, because of the bug that causes the reader to reset. The workaround described in the FAQ (download the RSS feed using calibre and transfer it via the Sony software) is not acceptable, because I lose the comfort of just loading some news onto my reader in the sleepy morning.

Perhaps somebody knows a solution for my problem? Thanks!

AngeloT

#383
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

Spiegel will be fixed in the next release. As for FAZ, I don't see anything obviously wrong with the EPUB file, so bug Sony/Adobe to fix their software; the EPUB file certainly works correctly on the desktop.

#384
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2007
Device: sony reader

I have been using Calibre for some time and have made some recipes of my own. For a long period I was on Calibre 0.4.67, and those recipes worked on that version. I have tried a few of the newer Calibre versions, but the custom recipes did not work on any of them.

Now I have Calibre 0.5.2 installed and the recipes still do not work. The following conversion error shows up:

The code of each recipe differs a little, but it generally looks like this:

Code:
    from libprs500.ebooks.lrf.web.profiles import DefaultProfile
    import re

    class DiePresseWirtschaft(DefaultProfile):
        title = 'DiePresseWirtschaft'
        timefmt = ' [%d %b %Y]'
        summary_length = 1000
        oldest_article = 1
        max_articles_per_feed = 100
        max_recursions = 2
        html_description = True
        no_stylesheets = True

        def get_feeds(self):
            return [ ('Die Presse Wirtschaft', 'http://www.diepresse.com/rss/Wirtschaft') ]

        def print_version(self, url):
            return url.replace('index.do?from=rss', 'print.do')

        preprocess_regexps = [
            (re.compile(r'<script>.*?</script>', re.IGNORECASE | re.DOTALL), lambda match: ''),
            (re.compile(r'<H4>.*?</H4>', re.IGNORECASE | re.DOTALL), lambda match: ''),
        ]
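
The old libprs500 DefaultProfile style of recipe apparently stopped loading somewhere between 0.4.67 and the 0.5.x series; one way forward is to rewrite such profiles as BasicNewsRecipe recipes. A sketch of such a port for the profile above (the feed URL, the print_version() rewrite and the regexps are carried over unchanged; the remaining settings are ordinary recipe defaults, and the port is untested):

Code:
    import re
    from calibre.web.feeds.news import BasicNewsRecipe

    class DiePresseWirtschaft(BasicNewsRecipe):
        title = 'DiePresseWirtschaft'
        oldest_article = 1
        max_articles_per_feed = 100
        no_stylesheets = True

        # The feed moves from get_feeds() into the feeds class attribute
        feeds = [('Die Presse Wirtschaft', 'http://www.diepresse.com/rss/Wirtschaft')]

        # Carried over unchanged from the old profile
        preprocess_regexps = [
            (re.compile(r'<script>.*?</script>', re.IGNORECASE | re.DOTALL), lambda match: ''),
            (re.compile(r'<H4>.*?</H4>', re.IGNORECASE | re.DOTALL), lambda match: ''),
        ]

        def print_version(self, url):
            return url.replace('index.do?from=rss', 'print.do')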

#385
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

#386
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2008
Device: Sony PRS-505

Hi,

I'm using the Google Reader recipe, but kovidgoyal (who is usually right!) thinks this only downloads starred messages. Is there a way to get it to load all unread messages, or do I have to manually create a recipe file for each of my 40 feeds?

TIA,
Shaun
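
Not an answer to the starred-items limitation of the Google Reader recipe, but if the only goal is to avoid maintaining 40 separate recipe files, a single custom recipe can carry the whole feed list. A minimal sketch; the class name, feed names and URLs are placeholders to replace with your own:

Code:
    from calibre.web.feeds.news import BasicNewsRecipe

    class MyFeeds(BasicNewsRecipe):
        title = 'My Feeds'
        oldest_article = 2           # days
        max_articles_per_feed = 25
        use_embedded_content = True  # assumes the feeds carry the full article text

        # One (name, URL) pair per feed -- placeholder entries
        feeds = [
            ('Feed one', 'http://example.com/feed1.xml'),
            ('Feed two', 'http://example.com/feed2.xml'),
        ]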

#387
Junior Member
Posts: 7
Karma: 10
Join Date: Mar 2009
Device: PRS 505

I don't know what's wrong with the FAZ EPUB. It looks fine on the desktop, but it has very large pages with small letters and is also very wide. I think this is why it freezes my Sony Reader, which is perhaps not able to format these pages correctly.

#388
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

The FAZ recipe was not cleaning out all of the styles, and that made the content hard to read. Here is a heavily updated recipe that produces a correct EPUB.

@Kovid: please include this in your upcoming release. Updated FAZ recipe:
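
For anyone who cannot wait for the release, here is a minimal sketch of the kind of style cleanup that usually tames oversized pages. It is not the updated recipe itself; the class name, attribute list and CSS are only illustrative.

Code:
    from calibre.web.feeds.news import BasicNewsRecipe

    class FazNetSketch(BasicNewsRecipe):
        # ... title, feeds and tag options omitted ...
        no_stylesheets = True                             # drop the site's own CSS
        remove_attributes = ['style', 'width', 'height']  # strip inline sizing
        extra_css = 'body { font-family: serif } img { max-width: 100% }'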

#389
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

Thanks, updated.

#390
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG; Pocketbook 360

FanFiction.net

I tried to make a recipe for FanFiction.net. Well, as long as it is not a multi-chapter story, it is easy enough.

Code:
    from calibre.web.feeds.news import BasicNewsRecipe

    class FanFiction(BasicNewsRecipe):
        title = u'FanFiction'
        oldest_article = 7
        max_articles_per_feed = 10
        use_embedded_content = False
        remove_javascript = True
        keep_only_tags = [dict(name='div', attrs={'id':'storytext'})]
        feeds = [(u'Just In', u'http://www.fanfiction.net/atom/j/0/0/0/')]
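
For the multi-chapter case, one possible direction is to fetch the remaining chapters from inside the recipe and stitch their story text together. This is only a rough sketch: it assumes chapter pages are reachable through plain links of the form /s/<story-id>/<chapter>/ on the first chapter's page, which may not match FanFiction.net's actual markup, and in practice the link pattern would need to be restricted to the current story's id.

Code:
    import re
    from calibre.web.feeds.news import BasicNewsRecipe

    class FanFictionChapters(BasicNewsRecipe):
        title = u'FanFiction (multi-chapter sketch)'
        oldest_article = 7
        max_articles_per_feed = 10
        use_embedded_content = False
        remove_javascript = True
        no_stylesheets = True
        # No keep_only_tags here, so the chapter links are still present
        # when preprocess_html() runs.
        feeds = [(u'Just In', u'http://www.fanfiction.net/atom/j/0/0/0/')]

        def preprocess_html(self, soup):
            story = soup.find('div', attrs={'id': 'storytext'})
            if story is None:
                return soup
            # Assumed URL scheme: /s/<story-id>/<chapter>/...
            seen = set()
            for a in soup.findAll('a', href=re.compile(r'^/s/\d+/\d+/')):
                href = a['href']
                if href in seen:
                    continue
                seen.add(href)
                chapter = self.index_to_soup('http://www.fanfiction.net' + href)
                text = chapter.find('div', attrs={'id': 'storytext'})
                if text is not None:
                    story.append(text)
            # Keep only the combined story text in the final article.
            body = soup.find('body')
            if body is not None:
                story.extract()
                for child in body.findAll(True, recursive=False):
                    child.extract()
                body.append(story)
            return soup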