05-05-2010, 09:29 AM | #1891 |
Junior Member
Posts: 8
Karma: 10
Join Date: May 2010
Device: Bebook One (Hanlin v3)
|
Arbetaren (Swedish socialist newspaper, works great!)
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class Arbetaren_SE(BasicNewsRecipe):
    title = u'Arbetaren'
    __author__ = 'Joakim Lindskog'
    description = 'Nyheter från Arbetaren'  # Swedish for "News from Arbetaren"
    publisher = 'Arbetaren'
    category = 'news, politics, socialism, Sweden'
    oldest_article = 7
    delay = 1
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    encoding = 'utf-8'
    language = 'sv'

    conversion_options = {
        'comment'   : description,
        'tags'      : category,
        'publisher' : publisher,
        'language'  : language
    }

    # Keep only the article body; trim everything after the byline.
    keep_only_tags = [dict(name='div', attrs={'id':'article'})]
    remove_tags_before = dict(name='div', attrs={'id':'article'})
    remove_tags_after = dict(name='p', attrs={'id':'byline'})
    remove_tags = [
        dict(name=['object', 'link', 'base']),
        dict(name='p', attrs={'class':'print'}),
        dict(name='a', attrs={'class':'addthis_button_compact'}),
        dict(name='script')
    ]

    # 'Nyheter' is Swedish for "News"
    feeds = [(u'Nyheter', u'http://www.arbetaren.se/rss/arbetaren.rss?rev=123')]
05-05-2010, 10:38 AM | #1892 | |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
|
05-05-2010, 09:27 PM | #1893 | |
Connoisseur
Posts: 53
Karma: 496648
Join Date: May 2010
Device: Sony PRS-600
|
WL
|
05-05-2010, 11:33 PM | #1894 |
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
|
I had this request on Facebook. Could someone write it? I'm a little busy right now.
The East Bay Express: http://www.eastbayexpress.com/ebx/Home
Thanks!
05-07-2010, 09:11 AM | #1895 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2007
Location: Switzerland
Device: Kindle Voyage, Kobo
|
Problem with my recipe for "Kommersant" Russian daily
Hi, I am trying to make a simple recipe for the best Russian-language newspaper, Kommersant.
Here is the recipe:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1272297716(BasicNewsRecipe):
    title = u'Kommersant'
    oldest_article = 7
    max_articles_per_feed = 100
    feeds = [(u'Kommersant', u'http://feeds.kommersant.ru/RSS_Export/RU/daily.xml')]

    def print_version(self, url):
        segments = url.split('=')
        article_id = segments[1]
        newurl = 'http://www.kommersant.ru/doc-rss.aspx?DocsID=' + article_id + '&print=true'
        return newurl

What am I doing wrong? Thanks!
Code:
ERROR: Conversion Error: <b>Failed</b>: Fetch news from Kommersant
Code:
Fetching http://www.kommersant.ru/doc-rss.aspx?DocsID=1365119
Downloaded article: Tencent разложила DST на активы // Mail.ru, "Вконтакте" и "Одноклассникам" прописали мультипликаторы from http://www.kommersant.ru
Code:
lxml.etree.XMLSyntaxError: Failed to parse QName 'font-size:', line 33, column 3710
|
05-07-2010, 10:14 AM | #1896 |
Enthusiast
Posts: 43
Karma: 50
Join Date: May 2009
Device: Kindle3
|
When running a job to create a Kindle file from a recipe, I often look at the job details to see what progress is being made. Is there any way to save the column widths of the Job Details screen? Each time I open it, I have to widen the columns to see the details I'm after. It would be nice to set the column widths once and have them stay fixed, along with the overall size of that panel.
|
05-07-2010, 12:03 PM | #1897 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2007
Location: Switzerland
Device: Kindle Voyage, Kobo
|
Kommersant
OK, now it's generally working:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class KommersantRecipe(BasicNewsRecipe):
    title = u'Kommersant'
    oldest_article = 7
    max_articles_per_feed = 100
    feeds = [(u'Kommersant', u'http://feeds.kommersant.ru/RSS_Export/RU/daily.xml')]

    def print_version(self, url):
        segments = url.split('=')
        article_id = segments[1]
        newurl = 'http://www.kommersant.ru/doc.aspx?DocsID=' + article_id + '&print=true'
        return newurl

Thanks all!
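Compared with the failing version, the only change is the target page in print_version: doc-rss.aspx became doc.aspx. The URL rewrite is plain string handling, so it can be checked outside calibre. Below is a standalone Python 3 sketch, assuming (as the recipe does) that feed links carry the article ID in a DocsID query parameter. The function name kommersant_print_url is hypothetical, and it reads DocsID with urllib.parse instead of split('='):

```python
from urllib.parse import urlparse, parse_qs

def kommersant_print_url(url):
    """Map a Kommersant feed link to its printable page (hypothetical helper).

    Same idea as the recipe's print_version(), but DocsID is read with
    parse_qs instead of url.split('='), so extra query parameters are ignored.
    """
    query = parse_qs(urlparse(url).query)
    article_id = query['DocsID'][0]  # raises KeyError if the link has no DocsID
    return 'http://www.kommersant.ru/doc.aspx?DocsID=' + article_id + '&print=true'

print(kommersant_print_url('http://www.kommersant.ru/doc-rss.aspx?DocsID=1365119'))
# prints http://www.kommersant.ru/doc.aspx?DocsID=1365119&print=true
```

The split('=') version gives the same result only while DocsID is the sole query parameter; with a link like ...?DocsID=42&from=rss, segments[1] would be '42&from'.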
05-07-2010, 12:15 PM | #1898 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@smargo: Use
conversion_options = {'linearize_tables': True}
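calibre's linearize_tables option pulls content out of tables that a page uses only for layout and presents it linearly, which avoids text running off the page. The idea can be shown without calibre by renaming table markup to plain block elements, much as the Discover Magazine recipe later in this thread does by hand in postprocess_html. This is a simplified illustration using xml.etree.ElementTree on a well-formed, un-namespaced XHTML fragment, not calibre's actual implementation; the set of tag names handled is an assumption:

```python
import xml.etree.ElementTree as ET

# Table-structure tags to flatten (assumed list, for illustration only).
TABLE_TAGS = {'table', 'thead', 'tbody', 'tfoot', 'tr', 'td', 'th', 'caption'}

def linearize_tables(xhtml):
    """Rename table markup to <div> so cells render one after another."""
    root = ET.fromstring(xhtml)
    for el in root.iter():
        if el.tag in TABLE_TAGS:
            el.tag = 'div'
            el.attrib.clear()  # width/colspan etc. mean nothing on a div
    return ET.tostring(root, encoding='unicode')

html = '<body><table width="600"><tr><td>Left</td><td>Right</td></tr></table></body>'
print(linearize_tables(html))
# prints <body><div><div><div>Left</div><div>Right</div></div></div></body>
```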
05-07-2010, 12:45 PM | #1899 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2007
Location: Switzerland
Device: Kindle Voyage, Kobo
|
Kommersant
@kovidgoyal
Thanks! It works! Some cosmetic fixes remain to be done, but I am happy.
05-07-2010, 12:53 PM | #1900 |
Junior Member
Posts: 2
Karma: 10
Join Date: Feb 2010
Device: kindle
|
International Herald Tribune - Euro edition
Is there a way to get a recipe for the IHT Euro edition?
Thanks so much!
05-08-2010, 04:01 AM | #1901 |
Member
Posts: 15
Karma: 10
Join Date: Apr 2010
Device: Kindle 2 Global
|
When I send a book back to my Kindle, is there a way to keep the original file name, instead of a shortened version with the author's name embedded?
|
05-08-2010, 12:49 PM | #1902 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Discover Magazine recipe
Multi-page support implemented for articles that span several pages; new feeds added; miscellaneous advertising and junk removed.
Code:
#!/usr/bin/env python

__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'
'''
discovermagazine.com
'''
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag

class DiscoverMagazine(BasicNewsRecipe):
    title = u'Discover Magazine'
    description = u'Science, Technology and the Future'
    __author__ = 'Starson17'
    language = 'en'

    oldest_article = 33
    max_articles_per_feed = 20
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    encoding = 'utf-8'
    extra_css = '.headline {font-size: x-large;} \n .fact {padding-top: 10pt}'

    remove_tags = [
        dict(name='div', attrs={'id':['searchModule', 'mainMenu', 'tool-box']}),
        dict(name='div', attrs={'id':['footer', 'teaser', 'already-subscriber', 'teaser-suite', 'related-articles']}),
        dict(name='div', attrs={'class':['column']}),
        dict(name='img', attrs={'src':'http://discovermagazine.com/onebyone.gif'})]

    remove_tags_after = [dict(name='div', attrs={'class':'listingBar'})]

    def append_page(self, soup, appendtag, position):
        # Follow the "next" link, collect the rest of the article recursively,
        # then splice that page's body into the current page.
        pager = soup.find('span', attrs={'class':'next'})
        if pager:
            nexturl = pager.a['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'class':'articlebody'})
            if texttag is None:  # next page has no article body; stop here
                return
            newpos = len(texttag.contents)
            self.append_page(soup2, texttag, newpos)
            texttag.extract()
            appendtag.insert(position, texttag)

    def preprocess_html(self, soup):
        mtag = '<meta http-equiv="Content-Language" content="en-US"/>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
        soup.head.insert(0, mtag)
        self.append_page(soup, soup.body, 3)
        pager = soup.find('div', attrs={'class':'listingBar'})
        if pager:
            pager.extract()
        return soup

    def postprocess_html(self, soup, first_fetch):
        for tag in soup.findAll(text=re.compile('^This article is a sample')):
            tag.parent.extract()
        for tag in soup.findAll(['table', 'tr', 'td']):
            tag.name = 'div'
        for tag in soup.findAll('div', attrs={'class':'discreet advert'}):
            tag.extract()
        for tag in soup.findAll('hr', attrs={'size':'1'}):
            tag.extract()
        for tag in soup.findAll('br'):
            tag.extract()
        return soup

    feeds = [
        (u'Technology', u'http://discovermagazine.com/topics/technology/rss.xml'),
        (u'Health - Medicine', u'http://discovermagazine.com/topics/health-medicine/rss.xml'),
        (u'Mind Brain', u'http://discovermagazine.com/topics/mind-brain/rss.xml'),
        (u'Space', u'http://discovermagazine.com/topics/space/rss.xml'),
        (u'Human Origins', u'http://discovermagazine.com/topics/human-origins/rss.xml'),
        (u'Living World', u'http://discovermagazine.com/topics/living-world/rss.xml'),
        (u'Environment', u'http://discovermagazine.com/topics/environment/rss.xml'),
        (u'Physics & Math', u'http://discovermagazine.com/topics/physics-math/rss.xml'),
        (u"20 Things you didn't know about...", u'http://discovermagazine.com/columns/20-things-you-didnt-know/rss.xml'),
        (u'Fuzzy Math', u'http://discovermagazine.com/columns/fuzzy-math/rss.xml'),
        (u'The Brain', u'http://discovermagazine.com/columns/the-brain/rss.xml'),
        (u'What is This', u'http://discovermagazine.com/columns/what-is-this/rss.xml'),
        (u'Vital Signs', u'http://discovermagazine.com/columns/vital-signs/rss.xml'),
        (u'Think Tech', u'http://discovermagazine.com/columns/think-tech/rss.xml'),
        (u'Future Tech', u'http://discovermagazine.com/columns/future-tech/rss.xml'),
        (u'Discover Interview', u'http://discovermagazine.com/columns/discover-interview/rss.xml'),
    ]
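The multi-page handling in append_page is recursive: it follows the "next" link, recurses into that page first so the page collects its own continuation, and only then splices the collected body into the current page. A network-free sketch of the same control flow, where a hypothetical in-memory PAGES dict stands in for index_to_soup() and plain Python lists stand in for BeautifulSoup tags:

```python
# Hypothetical stand-in for the site: each page has an article body and an
# optional "next" link, mimicking <div class="articlebody"> and <span class="next">.
PAGES = {
    'p1': {'body': ['para 1', 'para 2'], 'next': 'p2'},
    'p2': {'body': ['para 3'], 'next': 'p3'},
    'p3': {'body': ['para 4', 'para 5'], 'next': None},
}

def append_page(page, appendtag, position):
    """Recursively splice each following page's body into appendtag."""
    nexturl = page['next']
    if nexturl:
        page2 = PAGES[nexturl]           # plays the role of index_to_soup()
        texttag = list(page2['body'])    # copy of that page's article body
        # Recurse first: page 3 is nested into page 2's body before
        # page 2's body is inserted into page 1.
        append_page(page2, texttag, len(texttag))
        appendtag[position:position] = [texttag]

first = PAGES['p1']
article = list(first['body'])
append_page(first, article, len(article))
print(article)
# prints ['para 1', 'para 2', ['para 3', ['para 4', 'para 5']]]
```

The nested lists mirror the nested article-body divs the recipe produces; reading them depth-first gives the paragraphs in page order.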
05-10-2010, 09:32 AM | #1903 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Russian news pack:
Kommersant, Izvestia, Ria Novosti, Argumenti & fakti
05-10-2010, 12:30 PM | #1904 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2007
Location: Switzerland
Device: Kindle Voyage, Kobo
|
@kiklop74
Your Kommersant recipe is great; thanks for this Russian pack! A small wish: at the bottom of certain Kommersant articles there are links to additional pages (from the same issue; they are not available as links in the RSS feed). For example, the page http://www.kommersant.ru/doc-rss.aspx?DocsID=1366511 links to page "2" (http://www.kommersant.ru/doc.aspx?DocsID=1366459) and page "3" (http://www.kommersant.ru/doc.aspx?DocsID=1366462). Is there any way to include these additional pages?
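One possible starting point, sketched under assumptions: the extra pages appear as ordinary <a> links to doc.aspx?DocsID=... on the article page, as in the example URLs above. The real Kommersant markup was not checked, so the link pattern is an assumption. The hypothetical helper below uses only the standard library's html.parser (no calibre API) to collect those page URLs, which a recipe could then fetch and append, for example with an append_page-style method like the one in the Discover Magazine recipe:

```python
import re
from html.parser import HTMLParser

# Assumed link format for the extra pages, based on the example URLs above.
EXTRA_PAGE = re.compile(r'doc\.aspx\?DocsID=(\d+)')

class ExtraPageLinks(HTMLParser):
    """Collect DocsID values from <a href="...doc.aspx?DocsID=N"> links."""

    def __init__(self, current_id):
        super().__init__()
        self.current_id = current_id
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''
        match = EXTRA_PAGE.search(href)
        if match:
            doc_id = match.group(1)
            # Skip links back to the current article and duplicates; keep page order.
            if doc_id != self.current_id and doc_id not in self.ids:
                self.ids.append(doc_id)

def extra_page_urls(html, current_id):
    parser = ExtraPageLinks(current_id)
    parser.feed(html)
    return ['http://www.kommersant.ru/doc.aspx?DocsID=' + i for i in parser.ids]

sample = ('<p>Article text</p>'
          '<a href="http://www.kommersant.ru/doc.aspx?DocsID=1366459">2</a>'
          '<a href="http://www.kommersant.ru/doc.aspx?DocsID=1366462">3</a>')
print(extra_page_urls(sample, '1366511'))
```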
05-10-2010, 12:52 PM | #1905 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
There is always a way, but right now I have no spare time, nor will I in the foreseeable future. You are on your own on this one.
|
|
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Custom column read ? | pchrist7 | Calibre | 2 | 10-04-2010 02:52 AM |
Archive for custom screensavers | sleeplessdave | Amazon Kindle | 1 | 07-07-2010 12:33 PM |
How to back up preferences and custom recipes? | greenapple | Calibre | 3 | 03-29-2010 05:08 AM |
Donations for Custom Recipes | ddavtian | Calibre | 5 | 01-23-2010 04:54 PM |
Help understanding custom recipes | andersent | Calibre | 0 | 12-17-2009 02:37 PM |