#1351 | Guru
Posts: 800 | Karma: 194644 | Join Date: Dec 2007 | Location: Argentina | Device: Kindle Voyage
New recipe for Digital Spy UK:
#1352 | Member
Posts: 12 | Karma: 42 | Join Date: Jan 2010 | Device: Kindle
Code:
keep_only_tags = [dict(attrs={'class':['print-title','print-subtitle','print-author','print-date-issue','print-content']})]
I put this in the recipe and it worked very nicely. However, the author and date are not coming through. Do I need to add something else?

Denny
#1353 | Guru
Posts: 800 | Karma: 194644 | Join Date: Dec 2007 | Location: Argentina | Device: Kindle Voyage
OK, try this one:
Code:
keep_only_tags = [dict(attrs={'class':['print-title','print-subtitle','print-author','author','print-date','print-date-issue','print-content']})]
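If a field still fails to come through after this, the class names in `keep_only_tags` may simply not match what the site's print pages actually use. One quick way to check is to list every class attribute a page contains; a minimal stdlib sketch (run here against a small hypothetical HTML sample, not the real site markup):

```python
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collect every CSS class name seen in start tags."""
    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'class' and value:
                self.classes.update(value.split())

# Hypothetical sample; in practice, feed the downloaded print page instead.
sample = ('<div class="print-title">Headline</div>'
          '<div class="author">By Somebody</div>'
          '<div class="print-date">1 Mar 2010</div>')

collector = ClassCollector()
collector.feed(sample)
print(sorted(collector.classes))  # ['author', 'print-date', 'print-title']
```

Any class printed here that is missing from the recipe's `keep_only_tags` list is a candidate for why a field is being dropped.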
#1354 | Member
Posts: 12 | Karma: 42 | Join Date: Jan 2010 | Device: Kindle
Brilliant. That worked. Thank you.
BTW, what's the best method to capture the cover image when the URL changes each time? In this case the URL includes the volume number, issue number, and the date.

Denny
#1355 | US Navy, Retired
Posts: 9,897 | Karma: 13806776 | Join Date: Feb 2009 | Location: North Carolina | Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Code:
masthead_url = 'http://www.weeklystandard.com/sites/all/themes/weeklystandard/images/logo_red.png'
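`masthead_url` covers the fixed logo, but for a cover whose address embeds the volume, issue, and date, a recipe can override `get_cover_url()` and assemble the URL at run time. A minimal sketch; the URL pattern and the volume/issue numbers below are hypothetical placeholders, not the magazine's real scheme:

```python
from datetime import date

def build_cover_url(volume, issue, when=None):
    """Assemble a date-stamped cover URL (hypothetical pattern)."""
    when = when or date.today()
    return 'http://www.example.com/covers/vol%d_iss%d_%s.jpg' % (
        volume, issue, when.strftime('%Y%m%d'))

# Inside a BasicNewsRecipe subclass this logic would live in
# get_cover_url(self), returning the assembled string.
print(build_cover_url(15, 24, date(2010, 3, 1)))
# http://www.example.com/covers/vol15_iss24_20100301.jpg
```

If the volume and issue numbers can't be computed from the date, another option is to scrape them off the site's front page inside `get_cover_url()` before building the string.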
#1356 | Member
Posts: 12 | Karma: 42 | Join Date: Jan 2010 | Device: Kindle
I had included "print-logo" in the recipe, which shows the logo at the beginning of each article, but yours is a nice way to include it just once at the beginning on the Kindle.

Thanks, Denny
#1357 | US Navy, Retired
Posts: 9,897 | Karma: 13806776 | Join Date: Feb 2009 | Location: North Carolina | Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
When you zip it up to send to this forum, include the icon in the zip. I've attached it for you.
#1358 | Connoisseur
Posts: 59 | Karma: 4212 | Join Date: Feb 2010 | Device: Sony
Topeka Capital Journal recipe
Hello,
I am totally new to the ebook world and am trying to learn. I would like to have a recipe for the Topeka Capital Journal (http://cjonline.com/). I tried the "easy" way, but all I can get is garbage. Thank you for any help you can provide!

Gianfranco
#1359 | Guru
Posts: 800 | Karma: 194644 | Join Date: Dec 2007 | Location: Argentina | Device: Kindle Voyage
New recipe for the Topeka Capital Journal:
#1360 | Member
Posts: 12 | Karma: 42 | Join Date: Jan 2010 | Device: Kindle
Walt,

1. Why include the icon?
2. I'm having trouble copying my recipe from calibre to Notepad. The indents change, and the recipe won't work when it's copied back into calibre.

Denny
#1361 | onlinenewsreader.net
Posts: 327 | Karma: 10143 | Join Date: Dec 2009 | Location: Phoenix, AZ & Victoria, BC | Device: Kindle 3, Kindle Fire, iPad 3, iPhone 4, Playbook, HTC Inspire
The Register (biting the hand that feeds IT)
Recipe for The Register -- a UK Information Technology news site.
Code:
#!/usr/bin/env python

__license__   = 'GPL v3'
__copyright__ = '2010, Nick Redding'
'''
www.theregister.co.uk
'''
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from datetime import timedelta, datetime, date

class TheRegister(BasicNewsRecipe):

    title = u'The Register'
    language = 'en_GB'
    __author__ = 'Nick Redding'
    oldest_article = 2
    timefmt = ''  # '[%b %d]'
    needs_subscription = False

    keep_only_tags = [dict(name='div', attrs={'id':'article'})]
    #remove_tags_before = []
    remove_tags = [
        {'id':['related-stories','ad-mpu1-spot']},
        {'class':['orig-url','article-nav','wptl btm','wptl top']}
    ]
    #remove_tags_after = []
    no_stylesheets = True
    extra_css = '''
        h2 {font-size: x-large; }
        h3 {font-size: large; font-weight: bold; }
        .byline {font-size: x-small; }
        .dateline {font-size: x-small; }
    '''

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        return br

    def get_masthead_url(self):
        masthead = 'http://www.theregister.co.uk/Design/graphics/std/logo_414_80.png'
        br = BasicNewsRecipe.get_browser()
        try:
            br.open(masthead)
        except:
            self.log("\nMasthead unavailable")
            masthead = None
        return masthead

    def preprocess_html(self, soup):
        # this removes the explicit url after links
        for span_tag in soup.findAll('span', 'URL'):
            span_tag.previous.replaceWith(re.sub("\ \($", "", self.tag_to_string(span_tag.previous)))
            span_tag.next.next.replaceWith(re.sub("^\)", "", self.tag_to_string(span_tag.next.next)))
            span_tag.extract()
        return soup

    def parse_index(self):

        def decode_date(datestr):
            udate = datestr.strip().lower().split()
            m = ['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec'].index(udate[1]) + 1
            d = int(udate[0])
            y = date.today().year
            return date(y, m, d)

        articles = {}
        key = None
        ans = []

        def parse_index_page(page_name, page_title):

            def article_title(tag):
                atag = tag.find('a', href=True)
                return ''.join(atag.findAll(text=True, recursive=False)).strip()

            def article_date(tag):
                t = tag.find(True, {'class':'date'})
                if t:
                    return ''.join(t.findAll(text=True, recursive=False)).strip()
                return ''

            def article_summary(tag):
                t = tag.find(True, {'class':'standfirst'})
                if t:
                    return ''.join(t.findAll(text=True, recursive=False)).strip()
                return ''

            def article_url(tag):
                atag = tag.find('a', href=True)
                url = atag['href']
                return url

            mainurl = 'http://www.theregister.co.uk'
            soup = self.index_to_soup(mainurl + page_name)
            # Find each instance of class="story-ref"; the div contains all article data
            for div in soup.findAll('div', attrs={'class':re.compile('^story-ref')}):
                # check if article is too old
                datetag = div.find('span', 'date')
                if datetag:
                    dateline_string = self.tag_to_string(datetag, False)
                    a_date = decode_date(dateline_string)
                    earliest_date = date.today() - timedelta(days=self.oldest_article)
                    if a_date < earliest_date:
                        self.log("Skipping article dated %s" % dateline_string)
                        continue
                url = article_url(div)
                if 'http' in url:
                    continue
                url = mainurl + url + 'print.html'
                self.log("URL %s" % url)
                title = article_title(div)
                self.log("Title %s" % title)
                pubdate = article_date(div)
                self.log("Date %s" % pubdate)
                description = article_summary(div)
                self.log("Description %s" % description)
                author = ''
                if not articles.has_key(page_title):
                    articles[page_title] = []
                articles[page_title].append(
                    dict(title=title, url=url, date=pubdate,
                         description=description, author=author, content=''))

        parse_index_page('', 'Front Page')
        ans.append('Front Page')
        parse_index_page('/hardware', 'Hardware')
        ans.append('Hardware')
        parse_index_page('/software', 'Software')
        ans.append('Software')
        parse_index_page('/music_media', 'Music & Media')
        ans.append('Music & Media')
        parse_index_page('/networks', 'Networks')
        ans.append('Networks')
        parse_index_page('/security', 'Security')
        ans.append('Security')
        parse_index_page('/public_sector', 'Public Sector')
        ans.append('Public Sector')
        parse_index_page('/business', 'Business')
        ans.append('Business')
        parse_index_page('/science', 'Science')
        ans.append('Science')
        parse_index_page('/odds', 'Odds & Sods')
        ans.append('Odds & Sods')

        ans = [(key, articles[key]) for key in ans if articles.has_key(key)]
        return ans
#1362 | Connoisseur
Posts: 59 | Karma: 4212 | Join Date: Feb 2010 | Device: Sony
Wow! Thanks!
#1363 | US Navy, Retired
Posts: 9,897 | Karma: 13806776 | Join Date: Feb 2009 | Location: North Carolina | Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
You can just paste the code in a post and wrap it in code tags (the # button in the post editor).
#1364 | Member
Posts: 21 | Karma: 10 | Join Date: Jul 2008 | Device: EZ Reader Pocket Pro
Thanks for the recipe; I was looking for one for this site. I tried to do it myself, but I don't know anything about programming. Just two questions: how do I change the default image, and is there a way to show the pictures of the snips saved on Read It Later (it retrieves only text)? Thank you.
#1365 | Junior Member
Posts: 3 | Karma: 10 | Join Date: Jan 2010 | Device: none
Thanks for the tip; it works 70% of the time. The problem is with RSS feeds. Occasionally I want to use an RSS feed from a blog or a discussion board, and my fetch may not repeat more than once. The Instapaper solution will not work on an RSS feed, as I cannot ask calibre to do a recursive fetch from an Instapaper recipe.