#2431 | |
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2010
Device: Nokia N800, EeePC 4G Surf
|
Quote:
Looks like I'm going to have to find a way to upgrade Calibre in any case. When I try to add the recipe, I get a "You must not use 8-bit bytestrings..." error, which, from what I can tell, was a bug that's been fixed with later versions of Calibre. Again, thanks in any case. R. == |
|
#2432 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Until you do that, try this modified version.
|
#2433 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2010
Device: Nokia N800, EeePC 4G Surf
|
|
#2434 |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2010
Location: Colombia
Device: Sony PRS-300
|
Could someone please help me with a recipe for "El Espectador" (http://www.elespectador.com), a Colombian newspaper, so that I can read it on my Sony Reader PRS-300?
THANK YOU !!! |
#2435 |
Member
Posts: 17
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
|
Recipe Help
I'm trying to make a recipe for a local newspaper. I want only three items: headline, byline, and story. I can include the story by using keep_only_tags with the div "blox-story-text".
The headline is in the <h1> inside div id="blox-story", and the byline is in <p class="byline"> inside the same div. Including the entire "blox-story" div pulls in a lot of unwanted material. How can I target just the headline and byline? |
#2436 |
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2010
Device: nook
|
Can anyone make a recipe for texasmonthly.com? It's kind of convoluted...
It might be subscriber only, so I'll post some examples.
RSS feed: http://feeds.feedburner.com/texasmonthlycurrent
Example article URL: http://feedproxy.google.com/~r/texas...Q/webextra.php
The article URL is always in the form of "http://feedproxy.google.com/~r/texasmonthlycurrent/~3/" + randomkey + "/" + pagename
The article URL redirects to: http://www.texasmonthly.com/2010-08-...as+Monthly%29#
Everything after the ? can be ignored, so it's in the form of: "http://www.texasmonthly.com/" + currentyear + "-" + currentmonth + "-01" + "/" + pagename
Example print view URL: http://www.texasmonthly.com/cms/prin...sue=2010-08-01
That's always in the form of "http://www.texasmonthly.com/cms/printthis.php?file=" + pagename + "&issue=" + currentyear + "-" + currentmonth + "-01"
The names and number of pages may vary, so they have to be pulled from the RSS feed in real time. I assume printthis.php can pull anything from the past, but we're only worried about current articles, so we can use the current year and month. Everything publishes on the first, so the day is always 01.
The logic seems simple enough; I just don't know enough about the coding to implement it.
Last edited by jordanmills; 08-13-2010 at 06:31 PM. |
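The URL pattern described in this post can be sketched as a small helper. This is only an illustration under stated assumptions: the function name is hypothetical, the query string in the usage example is made up, and the helper expects the already-redirected texasmonthly.com address, not the feedproxy link (getting through that redirect is a separate step).

```python
PREFIX = 'http://www.texasmonthly.com/'


def print_url_from_article(article_url):
    """Build the printthis.php URL from a redirected article URL.

    Hypothetical helper based on the pattern described in the post:
    PREFIX + issue + '/' + pagename, where issue looks like 2010-08-01.
    """
    # Everything after '?' (and any '#') can be ignored.
    base = article_url.split('?', 1)[0].split('#', 1)[0]
    if not base.startswith(PREFIX):
        raise ValueError('not a texasmonthly.com article URL: %r' % article_url)
    # Split "2010-08-01/webextra.php" into the issue date and the page name.
    issue, _, pagename = base[len(PREFIX):].partition('/')
    if not issue or not pagename:
        raise ValueError('cannot find issue and page name in %r' % article_url)
    return PREFIX + 'cms/printthis.php?file=' + pagename + '&issue=' + issue


# The query string below is invented for the example.
print(print_url_from_article(
    'http://www.texasmonthly.com/2010-08-01/webextra.php?utm_source=feed'))
```

Taking the issue date from the article URL itself avoids computing the current year and month, so the helper would keep working for an article from an earlier issue. In a recipe, a function like this could be called from the print_version method.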
#2437 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Code:
dict(name='h1'), dict(name='p', attrs={'class':'byline'})
Last edited by Starson17; 08-13-2010 at 08:05 PM. |
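A minimal sketch of how those two entries might sit in a recipe next to the story container. The class name, title, and feed URL are placeholders, and matching "blox-story-text" by class is an assumption, since the thread does not say whether it is a class or an id. The fallback base class is there only so the snippet can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so this sketch can be loaded outside calibre.
    class BasicNewsRecipe(object):
        pass


class LocalPaperSketch(BasicNewsRecipe):  # hypothetical class name
    title = u'Local Paper (sketch)'  # placeholder title
    feeds = [(u'News', u'http://example.com/rss.xml')]  # placeholder feed

    # Each dict describes tags to keep; the rest of the page is dropped.
    keep_only_tags = [
        dict(name='h1'),                                        # headline
        dict(name='p', attrs={'class': 'byline'}),              # byline
        dict(name='div', attrs={'class': 'blox-story-text'}),   # story (assumed class)
    ]
```

If "blox-story-text" turns out to be an id, the last entry would use attrs={'id': 'blox-story-text'} instead.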
|
#2438 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
1) treat it as an obfuscated link to the print version - read here
2) treat it as a normal feed, and keep_only or delete whatever you want to keep/delete.
I'd go for #2.
Edit: BTW, I did look at your links pretty closely. I even wrote a quick recipe. Using your RSS link as a normal feed works fine, despite the random number, but the print link (name of article) is being obfuscated with a redirect. If you really want to use the print link, you'll need to code through the obfuscation with #1 to get the article name. Then you'd need to code up or scrape the year/month to build the link you want to the print version.
However, I seldom use print links anyway. It's just as easy, and often more fun, to use keep_only and remove tags to keep/remove what you want from the non-print page, which is why I suggested #2.
Last edited by Starson17; 08-14-2010 at 01:06 PM. |
|
#2439 |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Would anyone mind making a recipe for http://boortz.com/nealz_nuze/index.html, please? The Kindle version outright sucks. I subscribed to it, and for whatever reason it wraps a lot of the "emails he gets" inside a table, and the table will not pan even though it says to move the controller around and it will. |
#2440 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
New Recipe: GoComics
200+ comics (defaults to 7 days for 25 comics - 20 general, 5 editorial). Size is adjustable. A companion to the comics.com recipe. Spoiler:
|
#2441 |
Member
Posts: 17
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
|
Quote:
Starson17: You haven't given enough info, but you can try adding this to the keep_only_tags:
Code:
dict(name='h1'), dict(name='p', attrs={'class':'byline'})
Thanks. This works, except that the h1 displays twice since it appears twice in the HTML. Is there any way to limit the output to one h1? |
#2442 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Is there a class or id label inside the two h1 tags that differs between them? Or, you could just give me a link to an article and I'll check it out. Alternatively, there are more powerful/complicated ways to keep only the first h1 tag.
Last edited by Starson17; 08-14-2010 at 01:03 PM. |
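One of those more involved ways could look like the sketch below. It is an illustration, not the method Starson17 had in mind: the helper name is hypothetical, and it assumes that the first h1 on the page is the one worth keeping. It relies only on the findAll and extract methods of a BeautifulSoup object, which is what a recipe's preprocess_html hook receives.

```python
def keep_first_h1(soup):
    """Remove every <h1> after the first one and return the same soup."""
    for extra in soup.findAll('h1')[1:]:
        extra.extract()  # detach the repeated headline from the tree
    return soup


# Inside a recipe class this could be called from the preprocess_html hook:
#     def preprocess_html(self, soup):
#         return keep_first_h1(soup)
```

If the two h1 tags do differ by class or id, adding that attribute to the keep_only_tags entry is simpler and avoids the extra hook.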
|
#2443 |
Member
Posts: 13
Karma: 34
Join Date: Jul 2010
Device: hanlin, astak the 2010 version plz.
|
I'm making a custom recipe for the NY Daily News. It's a basic recipe, but this is my first time using Python and I need some help with formatting.
Here is what I have now.
-------------------------------------------------------------------
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1281804307(BasicNewsRecipe):
    title = u'NY Daily News'
    __author__ = 'you'
    description = 'News from NY Daily News'
    language = 'en'
    publisher = 'NY Daily News'
    category = 'news, politics, sports, ny'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    cover_url = None  # not filled in yet: the URL changes with the date (see question below)
    encoding = 'utf-8'

    # Keep only the story container and drop embedded code modules.
    keep_only_tags = [dict(name='div', attrs={'id': ['art_story']})]
    remove_tags = [dict(name='div', attrs={'class': ['code_module']})]

    feeds = [
        (u'Top Stories', u'http://www.nydailynews.com/index_rss.xml'),
        (u'News', u'http://www.nydailynews.com/news/index_rss.xml'),
        (u'NY Crime', u'http://www.nydailynews.com/news/ny_crime/index_rss.xml'),
        (u'NY Local', u'http://www.nydailynews.com/ny_local/index_rss.xml'),
        (u'Politics', u'http://www.nydailynews.com/news/politics/index_rss.xml'),
        (u'Music', u'http://www.nydailynews.com/entertainment/music/index_rss.xml'),
        (u'Arts', u'http://www.nydailynews.com/entertainment/arts/index_rss.xml'),
        (u'Food and Dining', u'http://www.nydailynews.com/lifestyle/food/index_rss.xml'),
        (u'Lifestyle', u'http://www.nydailynews.com/lifestyle/index_rss.xml'),
        (u'Health/Well Being', u'http://www.nydailynews.com/lifestyle/health/index_rss.xml'),
        (u'Sports', u'http://www.nydailynews.com/sports/index_rss.xml'),
    ]
-------------------------------------------
As you can see, "cover_url" is not filled in. I'm not sure how to format the variables, because the URL changes depending on the date, and it's my first time using Python. Here is the basic format for the NY Daily News cover page:
http://assets.nydailynews.com/img/20...tpage_0814.jpg
Can somebody show me an example template for how to do this? Thanks.
I have another question: in the feeds section, what is the "u" just in front of the title and URL for? For example:
(u'NY Crime', u'http://www.nydailynews.com/news/ny_crime/index_rss.xml'),
BTW, I couldn't find a custom recipe for the NY Daily News in this forum.
Here are all the feeds from the NY Daily News: http://www.nydailynews.com/services/...ols/index.html
Last edited by soothsayer; 08-14-2010 at 02:35 PM. Reason: fixing indentation |
#2444 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
When pasting a recipe here, you should use the CODE tags (hash mark) or you lose all indents. Indents are critical for Python code.
Quote:
Code:
def get_cover_url(self):
    cover_url = None
    soup = self.index_to_soup(self.index)
    cover_item = soup.find('span', attrs={'class':'cover'})
    if cover_item:
        cover_url = cover_item.img['src']
    return cover_url
Quote:
|
||
#2445 |
Member
Posts: 13
Karma: 34
Join Date: Jul 2010
Device: hanlin, astak the 2010 version plz.
|
Quote:
Here is one method: ("index" is just the URL to a page that has the cover in an img tag in a span tag of class "cover" where the src of the img tag is the URL to the cover)
Code:
def get_cover_url(self):
    cover_url = None
    soup = self.index_to_soup(self.index)
    cover_item = soup.find('span', attrs={'class':'cover'})
    if cover_item:
        cover_url = cover_item.img['src']
    return cover_url
An example cover page URL is the following: http://assets.nydailynews.com/img/20...tpage_0814.jpg
It contains the year "2010", month "08", and day "14". The image URL was obtained from the Daily News cover page archive here: http://www.nydailynews.com/news/galleries/august_2010_daily_news_front_pages/august_2010_daily_news_front_pages.html
Last edited by soothsayer; 08-14-2010 at 03:26 PM. |
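Since the cover URL carries the year, month, and day, one option is to build it from the date instead of scraping a page. The sketch below only shows the idea: the template is a made-up placeholder on example.com, because the real path is truncated in the thread, and the helper name is hypothetical.

```python
import datetime

# Hypothetical template. The real path is truncated in the thread, so the
# actual pattern has to be filled in. %Y, %m and %d are strftime codes for
# the four-digit year, two-digit month, and two-digit day.
COVER_TEMPLATE = 'http://example.com/covers/%Y/%m/%d/frontpage_%m%d.jpg'


def cover_url_for(day, template=COVER_TEMPLATE):
    """Return the cover URL for a datetime.date by expanding strftime codes."""
    return day.strftime(template)


# Inside a recipe this could back get_cover_url, for example:
#     def get_cover_url(self):
#         return cover_url_for(datetime.date.today())

print(cover_url_for(datetime.date(2010, 8, 14)))
# prints http://example.com/covers/2010/08/14/frontpage_0814.jpg
```

A date-built URL fails if the paper has not posted that day's cover yet, which is one reason the scraping approach quoted above may be the safer choice.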