#2446
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
That was one way to scrape the image from a page that has that image. It's more or less guaranteed to have the image you want. If you don't want to scrape it, what do you want to do? Do you just want to build the URL from the current date? Will the current date produce a valid URL in all cases?
When I want to do something like build the URL, I usually scrape the year/month/day text off the pages I'm scraping to build the ebook. Do you already have the year/month/day text you need to construct the URL?
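A rough sketch of the "scrape the date, then build the URL" approach: once the date text has been pulled off a page, parse it and drop its parts into the image URL pattern. The URL pattern here is the NY Daily News one from the next post; the function name, the `COVER_TEMPLATE` constant and the date format (`'August 18, 2010'`) are assumptions, so set `fmt` to whatever the pages actually show.

```python
import time

# Hypothetical template, based on the NY Daily News cover path in this thread.
COVER_TEMPLATE = ('http://assets.nydailynews.com/img/%(year)s/%(month)s/%(day)s/'
                  'gal_frontpage_%(month)s%(day)s.jpg')


def cover_url_from_date_text(date_text, fmt='%B %d, %Y'):
    """Build the cover URL from a date string scraped off an article page.

    fmt is an assumption about how the site prints dates; '%B' expects
    English month names (the default C locale).
    """
    st = time.strptime(date_text.strip(), fmt)
    parts = {
        'year': '%d' % st.tm_year,
        'month': '%.2d' % st.tm_mon,  # zero-padded, e.g. '08'
        'day': '%.2d' % st.tm_mday,   # zero-padded, e.g. '05'
    }
    return COVER_TEMPLATE % parts
```

The advantage over using the current date is that the URL is tied to the issue actually being scraped, not to the clock on the machine running calibre.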
#2447
Member
Posts: 13
Karma: 34
Join Date: Jul 2010
Device: hanlin, astak the 2010 version plz.
I ended up borrowing the "def get_cover_url(self):" code from the New York Times Top Stories basic recipe.
Code:
import time

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1281810521(BasicNewsRecipe):
    title = u'NY Daily News'
    __author__ = 'you'
    description = 'News from NY Daily News'
    language = 'en'
    publisher = 'NY Daily News'
    category = 'news, politics, sports, ny'
    encoding = 'utf-8'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    extra_css = '''
        .art_header {text-align: left;}
        .byline {font-family: monospace; text-align: left;
                 margin-top: 0px; margin-bottom: 0px;}
        .datestamp_update {font-size: small; margin-top: 0px; margin-bottom: 0px;}
        .art_img_lrg_txt {text-align: left; font-style: italic;}
        .art_img_lrg {text-align: center;}
        .art_img_lrg_credit {text-align: right; font-size: small;
                             margin-top: 0px; margin-bottom: 0px;}
        .art_story {text-align: left;}
    '''

    keep_only_tags = [dict(name='div', attrs={'id': ['art_story']})]
    remove_tags = [dict(name='div', attrs={'class': ['code_module']})]

    feeds = [
        (u'Top Stories', u'http://www.nydailynews.com/index_rss.xml'),
        (u'News', u'http://www.nydailynews.com/news/index_rss.xml'),
        (u'NY Crime', u'http://www.nydailynews.com/news/ny_crime/index_rss.xml'),
        (u'NY Local', u'http://www.nydailynews.com/ny_local/index_rss.xml'),
        (u'Politics', u'http://www.nydailynews.com/news/politics/index_rss.xml'),
        (u'Music', u'http://www.nydailynews.com/entertainment/music/index_rss.xml'),
        (u'Arts', u'http://www.nydailynews.com/entertainment/arts/index_rss.xml'),
        (u'Food and Dining', u'http://www.nydailynews.com/lifestyle/food/index_rss.xml'),
        (u'Lifestyle', u'http://www.nydailynews.com/lifestyle/index_rss.xml'),
        (u'Health/Well Being', u'http://www.nydailynews.com/lifestyle/health/index_rss.xml'),
        (u'Sports', u'http://www.nydailynews.com/sports/index_rss.xml'),
    ]

    def get_cover_url(self):
        # The front page image lives under a date-based path, e.g.
        # http://assets.nydailynews.com/img/2010/08/18/gal_frontpage_0818.jpg
        st = time.localtime()
        year = str(st.tm_year)
        month = "%.2d" % st.tm_mon
        day = "%.2d" % st.tm_mday
        cover = ('http://assets.nydailynews.com/img/' + year + '/' + month + '/' + day +
                 '/gal_frontpage_' + month + day + '.jpg')
        br = self.get_browser()
        try:
            br.open(cover)
        except Exception:
            # No image for today (yet); returning None falls back to the default cover
            self.log("\nCover unavailable")
            cover = None
        return cover
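Because today's front page may not be posted yet when the recipe runs, one option is to walk back a few days until an image is found. This is a minimal sketch, not part of the recipe above: `cover_url_for`, `find_cover` and `max_days_back` are hypothetical names, and the availability check is passed in as a function so the date logic can be tried without a network.

```python
import datetime


def cover_url_for(day):
    """Front page image URL for a given date (NY Daily News pattern from this post)."""
    return ('http://assets.nydailynews.com/img/%s/gal_frontpage_%s.jpg'
            % (day.strftime('%Y/%m/%d'), day.strftime('%m%d')))


def find_cover(is_available, today=None, max_days_back=3):
    """Return the newest cover URL that is_available() accepts, or None.

    is_available is any callable taking a URL and returning True/False.
    """
    today = today or datetime.date.today()
    for offset in range(max_days_back + 1):
        url = cover_url_for(today - datetime.timedelta(days=offset))
        if is_available(url):
            return url
    return None  # nothing found; calibre would then use its default cover
```

Inside `get_cover_url`, `is_available` could wrap the same `br.open(url)` call the recipe already uses, returning True on success and False when an exception is raised.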
#2448
Member
Posts: 17
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
Quote:
Inside the first h1 tag there is: <a title="(text of different headline)" href="/">(text of headline I want)</a>. There is nothing inside the second h1 tag. This applies to any article in the online version of the St. Louis Post-Dispatch.
#2449
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
It looks to me like you've got it backwards. I think you want to keep the second tag, the one without the <a> tag. The second one is the title for your article. Try this:
Code:
remove_tags = [dict(name='div', attrs={'id': 'blox-header'})]
#2450
Member
Posts: 17
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
Quote:
<div class="grid_4" id="blox-logo">
I've tried:
remove_tags = [dict(name='div', attrs={'class': 'grid_4'})]
and
remove_tags = [dict(name='div', attrs={'id': 'blox-logo'})]
but neither worked. Any suggestions?
#2451
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
#2452
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2010
Location: Colombia
Device: Sony PRS-300
I hope someone can help me. I made this recipe and it gives me the output I want. The only problem is that the title comes out in the same font size as the article text, and I would like it to be bigger and bold. How can I do that?
Thanks for the help; here is the recipe:
Quote:
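One common way to do this is to style the headline through the recipe's `extra_css` attribute. The recipe itself did not survive in the quote, so the selectors below (`h1`, `h2`, `.title`) and the class name are guesses; look at the article's HTML for the element that actually holds the title and adjust. The `try/except ImportError` stand-in exists only so the snippet can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Minimal stand-in so this sketch can be loaded outside calibre.
    class BasicNewsRecipe(object):
        pass


class TitleStyleRecipe(BasicNewsRecipe):  # hypothetical class name
    title = u'My recipe'
    # Assumes the headline is an <h1> or <h2>, or carries class="title";
    # change the selectors to match the real page.
    extra_css = '''
        h1, h2, .title {
            font-size: x-large;
            font-weight: bold;
        }
    '''
```

Only the `extra_css` lines need to be copied into the existing recipe class.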
#2453
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
I'm trying to learn how to make my own recipes. I'm following the tutorial, but I'm a little lost. I downloaded a Python editor and then entered the following code:
Code:
class AdvancedUserRecipe1282103072(BasicNewsRecipe):
    title = u'AJC'
    oldest_article = 1
    max_articles_per_feed = 100
    no_stylesheets = True

    feeds = [
        (u'Breaking News', u'http://www.ajc.com/genericList-rss.do?source=61499'),
        (u'News Q & A', u'http://www.ajc.com/genericList-rss.do?source=77197'),
        (u'Metro and Georgia', u'http://www.ajc.com/section-rss.do?source=news'),
        (u'Cobb County', u'http://www.ajc.com/section-rss.do?source=cobb'),
        (u'Opinion', u'http://www.ajc.com/section-rss.do?source=opinion'),
    ]
I thought about adding:
Code:
    def get_article_url(self, article):
        url = article.get('guid', None)
        # Skip podcasts and surveys, and guard against entries with no guid
        if url is None or 'podcasts' in url or 'surveys' in url:
            url = None
        return url
The article links look like this: http://www.ajc.com/news/atlanta/memo...rss_news_61499
I assume I would want to use some form of regular expression to trim everything after the ? and replace it with printArticle=y, but I'm confused because this is all new to me.
Code:
    def print_version(self, url):
        return url.replace(url+'?printArticle=y')
Any help would be appreciated. Thank you so much.
#2454
Zealot
Posts: 115
Karma: 150
Join Date: Jul 2008
Location: Netherlands Veenendaal
Device: Palm T5, Sony PRS-505, Nook Color
Hi All,
I'm hoping Kwetal is still following this thread, since one of Kwetal's recipes has gone haywire. It's the nrcnext recipe, and it's failing with the following error:
Spoiler:
At first I thought it had to do with the fact that one of the RSS feeds has changed, but editing the recipe didn't help. Maybe a lot more has changed than just that, but even debugging the recipe with -vv didn't show more info than the above. I'm using calibre 0.7.14.
Regards, Joop
#2455
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
Does anybody have a recipe for Tor.com?
http://www.tor.com/
#2456
Member
Posts: 17
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
I can paste my recipe, but I'm unfamiliar with the CODE and SPOILER tags. Can you explain?
Three remaining goals:
1. The headline is output twice. I want to remove one of them.
2. Change the masthead from the generic Kindle one. I used the following without success: def get_masthead_title(self) return 'mystring'
3. Add a page break before every h1. I tried this but got an error message: h1 {page_break_before:always}
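A sketch of how the three goals might look in a recipe class, under stated assumptions. Goal 1 reuses the `blox-header` removal suggested earlier in the thread (untested here). Goal 2 assumes your calibre version provides the `get_masthead_title` hook the post refers to; as written in the post, the `def` line lacks its colon, which Python rejects with a SyntaxError. Goal 3: the CSS property is spelled `page-break-before` with hyphens, and the rule has to sit inside the `extra_css` string; typing it as bare Python is one possible cause of the error. The class name is hypothetical, and the `try/except ImportError` stand-in only lets the snippet load outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Minimal stand-in so this sketch can be loaded outside calibre.
    class BasicNewsRecipe(object):
        pass


class PostDispatchRecipe(BasicNewsRecipe):  # hypothetical class name
    title = u'St. Louis Post-Dispatch'

    # Goal 1: drop the duplicate headline block (suggestion from earlier in the thread).
    remove_tags = [dict(name='div', attrs={'id': 'blox-header'})]

    # Goal 3: hyphenated CSS property, inside the extra_css string.
    extra_css = 'h1 {page-break-before: always;}'

    # Goal 2: note the colon after the signature and the indented body.
    def get_masthead_title(self):
        return 'mystring'
```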
#2457
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2010
Device: none
Does anybody have a recipe for PubMed (http://www.ncbi.nlm.nih.gov/pubmed) that I can use in calibre to get the topics cleanly? I have created an RSS feed for lung cancer:
http://eutils.ncbi.nlm.nih.gov/entre...pUadKjxg6iRImT
I would like to get the title, journal and authors on separate lines in the "Section Menu". The abstract pages have duplicated titles; otherwise they are fine. Thanks in advance. SD
Last edited by sde; 08-18-2010 at 01:29 PM.
#2458
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
The CODE tag is the hash mark (pound symbol) on the toolbar when you're replying. The SPOILER tag is the eye with an X in it on the same bar. Just paste your code, highlight it, then hit the code button, followed by the spoiler button. The code tag preserves essential formatting such as indentation. The spoiler tag collapses the block so others don't have to scroll through all of it when it's long.
Last edited by Starson17; 08-18-2010 at 02:57 PM.
#2459
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Quote:
Quote:
Quote:
Here is code that I tested on a few of your links. It should work.
Code:
    def print_version(self, url):
        return url.partition('?')[0] + '?printArticle=y'
Last edited by Starson17; 08-18-2010 at 03:01 PM.
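No regular expression is needed here. `str.partition('?')` splits at the first `?`, and element `[0]` is everything before it, or the whole URL if there is no `?`. (The earlier `url.replace(url+'?printArticle=y')` attempt fails because `str.replace` needs both an old and a new substring.) To try the logic outside calibre, here is the same expression as a plain function; the example URLs in the test are made up.

```python
def print_version(url):
    """Standalone copy of the print_version logic, for trying it outside a recipe.

    In the recipe it stays a method that also takes self.
    """
    # Keep everything before the first '?', then append the print flag.
    return url.partition('?')[0] + '?printArticle=y'


if __name__ == '__main__':
    # Hypothetical article URL, shaped like the AJC links in this thread.
    print(print_version('http://www.ajc.com/news/example-story-123.html?cxtype=rss_news_61499'))
```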
#2460
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Thread | Thread Starter | Forum | Replies | Last Post |
Custom column read ? | pchrist7 | Calibre | 2 | 10-04-2010 02:52 AM |
Archive for custom screensavers | sleeplessdave | Amazon Kindle | 1 | 07-07-2010 12:33 PM |
How to back up preferences and custom recipes? | greenapple | Calibre | 3 | 03-29-2010 05:08 AM |
Donations for Custom Recipes | ddavtian | Calibre | 5 | 01-23-2010 04:54 PM |
Help understanding custom recipes | andersent | Calibre | 0 | 12-17-2009 02:37 PM |