#376
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

You should use this to extract the article from that site:

Code:
    keep_only_tags = [dict(name='div', attrs={'class':'article'})]
    remove_tags_after = dict(name='div', attrs={'class':'articletext'})
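
For anyone following along, here is a minimal sketch of a complete recipe built around those two lines. The feed URL is the Die Presse Wirtschaft feed that appears in another recipe later in this thread; the class name, title and remaining settings are only illustrative defaults.

Code:
    from calibre.web.feeds.news import BasicNewsRecipe

    class DiePresse(BasicNewsRecipe):
        title = 'Die Presse'          # illustrative title
        oldest_article = 2
        max_articles_per_feed = 100
        no_stylesheets = True

        # The two lines suggested above
        keep_only_tags = [dict(name='div', attrs={'class':'article'})]
        remove_tags_after = dict(name='div', attrs={'class':'articletext'})

        # Feed URL taken from the Die Presse recipe quoted later in this thread
        feeds = [('Die Presse Wirtschaft', 'http://www.diepresse.com/rss/Wirtschaft')]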

#377
Enthusiast
Posts: 27
Karma: 10
Join Date: Mar 2009
Device: PRS-505

That's more or less what I used. The article itself is extracted fine; the problem is the picture within the article. It's a normal JPG picture, but it still fails to be included. I also tried Bookit to grab the whole page, but that fails to include the article's picture as well.

For example, http://diepresse.com/home/panorama/r...ex.do?from=rss has a picture of the pope in it, yet no picture ends up in the final ebook.

Code:
    remove_tags_before = dict(id='content')
    remove_tags_after = dict(id='content')

#378
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

You discovered a bug in calibre: for some reason calibre does not fetch the image inside the article, it is simply ignored. Please open a bug report in the calibre Trac and attach your recipe to it so that Kovid can fix this.

#379
Member
Posts: 18
Karma: 10
Join Date: Feb 2009
Device: none

Typically, background images are used as "fluff" on a page and should be treated as irrelevant furniture, so they are (rightly, IMHO) ignored. If an image conveys meaning, it should be in a normal <img> tag with an appropriate alt attribute for accessibility. IMHO it's the fault of diepresse, not Calibre.

Rufus.
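
If the missing picture really is a CSS background image, one possible workaround is to turn such images into real <img> tags from inside the recipe, so that calibre's image fetcher picks them up. This is only a sketch under that assumption: the style-attribute pattern is a guess, not something confirmed from diepresse's markup, and the class skeleton omits the rest of the recipe.

Code:
    import re
    from calibre.ebooks.BeautifulSoup import Tag
    from calibre.web.feeds.news import BasicNewsRecipe

    class DiePresseImages(BasicNewsRecipe):
        # ... title, feeds and the tag options from the posts above ...

        def preprocess_html(self, soup):
            # Find tags whose inline style carries a background image and
            # inject a real <img> tag so the picture gets downloaded.
            for tag in soup.findAll(True, style=True):
                m = re.search(r'background(?:-image)?\s*:\s*url\(([^)]+)\)', tag['style'])
                if m:
                    img = Tag(soup, 'img')
                    img['src'] = m.group(1).strip('\'" ')
                    tag.insert(0, img)
            return soup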

#380
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2009
Device: none

Recipe for NZZ - Neue Zuericher Zeitung

Hi,

I just made my first recipe, for www.nzz.ch. It does everything I want, but unfortunately it's very slow (56 min to produce a 0.3 MB ebook). I started from the BBC recipe, but I don't see why the NZZ version should be so much slower. Here's the recipe:

Code:
    #!/usr/bin/env python
    '''
    nzz.ch
    '''
    from calibre.web.feeds.news import BasicNewsRecipe

    class NewNzz(BasicNewsRecipe):
        title = u'Neue Zuericher Zeitung'
        __author__ = 'NZZ'
        description = 'Neue Zuericher Zeitung'
        no_stylesheets = True
        language = _('German')

        keep_only_tags = [dict(name='div', attrs={'class':'article'})]
        remove_tags_before = dict(id='article')
        remove_tags_after = dict(id='article')
        remove_tags = [
            dict(attrs={'class':['more', 'nowrap', 'footer', 'teaser', 'articleTools',
                                 'post-tools', 'side_tool', 'nextArticleLink clearfix']}),
            dict(id=['formSendArticle', 'footer', 'toolsRight', 'articleInline',
                     'navigation', 'archive', 'side_search', 'blog_sidebar',
                     'side_tool', 'side_index']),
            dict(name=['script', 'noscript', 'style']),
        ]

        feeds = [
            ('Top Themen', 'http://www.nzz.ch/nachrichten/startseite?rss=true'),
            ('International', 'http://www.nzz.ch/nachrichten/international?rss=true'),
            ('Schweiz', 'http://www.nzz.ch/nachrichten/schweiz?rss=true'),
            ('Wirtschaft', 'http://www.nzz.ch/nachrichten/wirtschaft/aktuell?rss=true'),
            ('Zuerich', 'http://www.nzz.ch/nachrichten/zuerich?rss=true'),
            ('Sport', 'http://www.nzz.ch/nachrichten/sport?rss=true'),
            ('Panorama', 'http://www.nzz.ch/nachrichten/panorama?rss=true'),
        ]

        def print_version(self, url):
            return url + '?printview=true'

Best regards,
keckx

#381
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

The NZZ online server is quite slow. There is nothing you can do about that.

Just some notes about the recipe. This:

Code:
    remove_tags_before = dict(id='article')
    remove_tags_after = dict(id='article')

can be replaced with a single keep_only_tags entry:

Code:
    keep_only_tags = [dict(name='div', attrs={'id':'article'})]
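
As an aside, two generic BasicNewsRecipe settings sometimes shave a little time off a slow fetch. They cannot make a slow server faster, so treat the values below as a hypothetical experiment rather than a fix; the class skeleton omits the rest of the recipe shown above.

Code:
    from calibre.web.feeds.news import BasicNewsRecipe

    class NewNzz(BasicNewsRecipe):
        # ... title, feeds and tag options as in the recipe above ...
        simultaneous_downloads = 10  # default is 5; fetch more articles in parallel
        timeout = 30                 # seconds before a stalled connection is abandoned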

#382
Junior Member
Posts: 7
Karma: 10
Join Date: Mar 2009
Device: PRS 505

Hello everybody,

I have a problem with some German calibre recipes and the EPUB output. The recipes for Spiegel Online and FAZ.NET are not working correctly, and I have no idea why. Spiegel Online gives only about eight pages containing the article overviews, and the FAZ.NET ebook freezes my Sony PRS-505.

Does anybody have an idea why this happens, or does anybody have the same problems? I stopped using the LRF output, where those recipes worked well, because of the bug that causes the reader to reset. The workaround described in the FAQ (download the RSS feed using calibre and transfer it via the Sony software) is not acceptable, because I lose the comfort of just loading some news onto my reader in the sleepy morning.

Perhaps somebody knows a solution for my problem? Thanks!

AngeloT

#383
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

Spiegel will be fixed in the next release. As for FAZ, I don't see anything obviously wrong with the EPUB file, so bug Sony/Adobe to fix their software; the EPUB file certainly works correctly on the desktop.

#384
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2007
Device: sony reader

I have been using Calibre for some time and have made some recipes of my own. For a long period I was on Calibre 0.4.67, and those recipes worked on that version. I have tried a few of the newer Calibre versions, but the custom recipes did not work on any of them.

Now I have Calibre 0.5.2 installed and the recipes still do not work. The following conversion error shows up:

The code of each recipe differs a little, but it generally looks like this:

Code:
    from libprs500.ebooks.lrf.web.profiles import DefaultProfile
    import re

    class DiePresseWirtschaft(DefaultProfile):
        title = 'DiePresseWirtschaft'
        timefmt = ' [%d %b %Y]'
        summary_length = 1000
        oldest_article = 1
        max_articles_per_feed = 100
        max_recursions = 2
        html_description = True
        no_stylesheets = True

        def get_feeds(self):
            return [ ('Die Presse Wirtschaft', 'http://www.diepresse.com/rss/Wirtschaft') ]

        def print_version(self, url):
            return url.replace('index.do?from=rss', 'print.do')

        preprocess_regexps = [
            (re.compile(r'<script>.*?</script>', re.IGNORECASE | re.DOTALL), lambda match: ''),
            (re.compile(r'<H4>.*?</H4>', re.IGNORECASE | re.DOTALL), lambda match: ''),
        ]
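
The old libprs500 DefaultProfile style of recipe apparently stopped loading somewhere between 0.4.67 and the 0.5.x series; one way forward is to rewrite such profiles as BasicNewsRecipe recipes. A sketch of such a port for the profile above (the feed URL, the print_version() rewrite and the regexps are carried over unchanged; the remaining settings are ordinary recipe defaults, and the port is untested):

Code:
    import re
    from calibre.web.feeds.news import BasicNewsRecipe

    class DiePresseWirtschaft(BasicNewsRecipe):
        title = 'DiePresseWirtschaft'
        oldest_article = 1
        max_articles_per_feed = 100
        no_stylesheets = True

        # The feed moves from get_feeds() into the feeds class attribute
        feeds = [('Die Presse Wirtschaft', 'http://www.diepresse.com/rss/Wirtschaft')]

        # Carried over unchanged from the old profile
        preprocess_regexps = [
            (re.compile(r'<script>.*?</script>', re.IGNORECASE | re.DOTALL), lambda match: ''),
            (re.compile(r'<H4>.*?</H4>', re.IGNORECASE | re.DOTALL), lambda match: ''),
        ]

        def print_version(self, url):
            return url.replace('index.do?from=rss', 'print.do')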

#385
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

#386
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2008
Device: Sony PRS-505

Hi,

I'm using the Google Reader recipe, but kovidgoyal (who is usually right!) thinks this only downloads starred messages. Is there a way to get it to load all unread messages, or do I have to manually create a recipe file for each of my 40 feeds?

TIA,
Shaun
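
Not an answer to the starred-items limitation of the Google Reader recipe, but if the only goal is to avoid maintaining 40 separate recipe files, a single custom recipe can carry the whole feed list. A minimal sketch; the class name, feed names and URLs are placeholders to replace with your own:

Code:
    from calibre.web.feeds.news import BasicNewsRecipe

    class MyFeeds(BasicNewsRecipe):
        title = 'My Feeds'
        oldest_article = 2           # days
        max_articles_per_feed = 25
        use_embedded_content = True  # assumes the feeds carry the full article text

        # One (name, URL) pair per feed -- placeholder entries
        feeds = [
            ('Feed one', 'http://example.com/feed1.xml'),
            ('Feed two', 'http://example.com/feed2.xml'),
        ]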

#387
Junior Member
Posts: 7
Karma: 10
Join Date: Mar 2009
Device: PRS 505

I don't know what's wrong with the FAZ EPUB. It looks fine on the desktop, but it has very large pages with small letters and is also very wide. I think this is why it freezes my Sony Reader, which is perhaps not able to format these pages correctly.

#388
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

The FAZ recipe was not cleaning out all of the styles, and that made the content hard to read. Here is a heavily updated recipe that produces a correct EPUB.

@Kovid: please include this in your upcoming release. Updated FAZ recipe:
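
For anyone who cannot wait for the release, here is a minimal sketch of the kind of style cleanup that usually tames oversized pages. It is not the updated recipe itself; the class name, attribute list and CSS are only illustrative.

Code:
    from calibre.web.feeds.news import BasicNewsRecipe

    class FazNetSketch(BasicNewsRecipe):
        # ... title, feeds and tag options omitted ...
        no_stylesheets = True                             # drop the site's own CSS
        remove_attributes = ['style', 'width', 'height']  # strip inline sizing
        extra_css = 'body { font-family: serif } img { max-width: 100% }'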

#389
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

Thanks, updated.

#390
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG; Pocketbook 360

FanFiction.net

I tried to make a recipe for FanFiction.net. Well, as long as it is not a multi-chapter story, it is easy enough.

Code:
    from calibre.web.feeds.news import BasicNewsRecipe

    class FanFiction(BasicNewsRecipe):
        title = u'FanFiction'
        oldest_article = 7
        max_articles_per_feed = 10
        use_embedded_content = False
        remove_javascript = True
        keep_only_tags = [dict(name='div', attrs={'id':'storytext'})]
        feeds = [(u'Just In', u'http://www.fanfiction.net/atom/j/0/0/0/')]
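
For the multi-chapter case, one possible direction is to fetch the remaining chapters from inside the recipe and stitch their story text together. This is only a rough sketch: it assumes chapter pages are reachable through plain links of the form /s/<story-id>/<chapter>/ on the first chapter's page, which may not match FanFiction.net's actual markup, and in practice the link pattern would need to be restricted to the current story's id.

Code:
    import re
    from calibre.web.feeds.news import BasicNewsRecipe

    class FanFictionChapters(BasicNewsRecipe):
        title = u'FanFiction (multi-chapter sketch)'
        oldest_article = 7
        max_articles_per_feed = 10
        use_embedded_content = False
        remove_javascript = True
        no_stylesheets = True
        # No keep_only_tags here, so the chapter links are still present
        # when preprocess_html() runs.
        feeds = [(u'Just In', u'http://www.fanfiction.net/atom/j/0/0/0/')]

        def preprocess_html(self, soup):
            story = soup.find('div', attrs={'id': 'storytext'})
            if story is None:
                return soup
            # Assumed URL scheme: /s/<story-id>/<chapter>/...
            seen = set()
            for a in soup.findAll('a', href=re.compile(r'^/s/\d+/\d+/')):
                href = a['href']
                if href in seen:
                    continue
                seen.add(href)
                chapter = self.index_to_soup('http://www.fanfiction.net' + href)
                text = chapter.find('div', attrs={'id': 'storytext'})
                if text is not None:
                    story.append(text)
            # Keep only the combined story text in the final article.
            body = soup.find('body')
            if body is not None:
                story.extract()
                for child in body.findAll(True, recursive=False):
                    child.extract()
                body.append(story)
            return soup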