#706 |
Comparer of the Ephemeris
Posts: 1,496
Karma: 424697
Join Date: Mar 2009
Device: iPad
|
macsilber: It would be more helpful if you could post the recipe you're using.
dmendozadmd: A 'sticky' is a popular topic that stays at the top of the forum's topic list, so it's easier to find. A recipe is a script that calibre uses to download the contents of a particular website, then format it for your eReader.
G
#707 | |
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2009
Location: Philadelphia
Device: PRS-505
|
Quote:
And yes, I realize I'm just probably missing something obvious...
|
#708 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
In a custom recipe, how do I remove multiple div classes?
For example, from this source page (http://www.tnr.com/print/article/pol...ocking-roberts), I want to remove these div classes: print-logo, print-site_name, img-left, and print-source_url. Probably a simple syntax question, but I'm new to Python. I have tried...
Code:
remove_tags = [dict(name='div', attrs={'class':'print-logo'})]
remove_tags = [dict(name='div', attrs={'class':'print-site_name'})]
remove_tags = [dict(name='div', attrs={'class':'img-left'})]
remove_tags = [dict(name='div', attrs={'class':'print-source_url'})]
This gives me a syntax error:
Code:
remove_tags = [dict(name='div', attrs={'class':'print-logo', 'print-site_name', 'img-left', 'print-source_url'})]
Thanks
#709 |
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Code:
remove_tags = [dict(name='div', attrs={'class':['print-logo', 'print-site_name', ..]}] |
#710 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
Thanks... I knew it must have been something simple like that.
Your snippet as written gave me a syntax error, but adding a ) as the second-to-last character fixed it.
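For anyone hitting the same error, here is a minimal sketch of the corrected directive, with the closing parenthesis in place and the four class names from the question filled in. It also shows why the first attempt could not work: assigning to the same name four times keeps only the last value. The name `first_attempt` is made up for illustration; in a recipe the variable would be `remove_tags`.

```python
# First attempt from the question: each assignment rebinds the name,
# so only the last spec survives.
first_attempt = [dict(name='div', attrs={'class': 'print-logo'})]
first_attempt = [dict(name='div', attrs={'class': 'print-site_name'})]
first_attempt = [dict(name='div', attrs={'class': 'img-left'})]
first_attempt = [dict(name='div', attrs={'class': 'print-source_url'})]

# Corrected form: one spec whose 'class' value is a list of class names.
remove_tags = [
    dict(name='div', attrs={'class': ['print-logo', 'print-site_name',
                                      'img-left', 'print-source_url']}),
]

print(len(first_attempt), first_attempt[0]['attrs']['class'])
print(remove_tags[0]['attrs']['class'])
```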
#711 |
Comparer of the Ephemeris
Posts: 1,496
Karma: 424697
Join Date: Mar 2009
Device: iPad
|
gomes: Post your recipe. You will probably need to use remove_tags, as cix3 has learned, to get rid of the stuff you don't want.
Basically, this involves going to a sample page, examining the HTML source, isolating the stuff you don't want, then specifying a remove_tags directive as Kovid has described in his post above this one. If you post your recipe, folks here are better able to help you refine it.
G
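The "examine the HTML source" step can be partly automated. The following is only a rough sketch using the Python standard library, not anything calibre provides: it lists the class names used on `div` tags in a page so you can pick the ones to put in remove_tags. The helper names (`DivClassCollector`, `div_classes`) and the sample markup are invented for illustration.

```python
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Collect the class names used on <div> tags in a page."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        for name, value in attrs:
            if name == 'class' and value:
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def div_classes(html):
    """Return a sorted list of class names found on <div> tags."""
    collector = DivClassCollector()
    collector.feed(html)
    return sorted(collector.classes)


# Made-up sample standing in for a saved article page.
sample = '''
<div class="print-logo"><img src="logo.png"></div>
<div class="print-site_name">Site name</div>
<div class="content">Article text</div>
<div class="print-source_url">Source URL: ...</div>
'''

print(div_classes(sample))
```

Anything in the printed list that is not article content is a candidate for a remove_tags entry.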
#712 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
Custom recipe for The New Republic
Hello,
Here's my first stab at a recipe for The New Republic (www.tnr.com). It aggregates all articles and blogs, minus the images. Enjoy! Code:
from calibre.web.feeds.news import BasicNewsRecipe


class The_New_Republic(BasicNewsRecipe):

    title = 'The New Republic'
    __author__ = 'cix3'
    description = 'Intelligent, stimulating and rigorous examination of American politics, foreign policy and culture'
    timefmt = ' [%b %d, %Y]'
    oldest_article = 7
    max_articles_per_feed = 100

    # Strip the print-page header blocks, horizontal rules, and all images
    remove_tags = [
        dict(name='div', attrs={'class':['print-logo', 'print-site_name', 'img-left', 'print-source_url']}),
        dict(name='hr', attrs={'class':'print-hr'}),
        dict(name='img')
    ]

    feeds = [
        ('Politics', 'http://www.tnr.com/rss/articles/Politics'),
        ('Books and Arts', 'http://www.tnr.com/rss/articles/Books-and-Arts'),
        ('Economy', 'http://www.tnr.com/rss/articles/Economy'),
        ('Environment and Energy', 'http://www.tnr.com/rss/articles/Environment-%2526-Energy'),
        ('Health Care', 'http://www.tnr.com/rss/articles/Health-Care'),
        ('Urban Policy', 'http://www.tnr.com/rss/articles/Urban-Policy'),
        ('World', 'http://www.tnr.com/rss/articles/World'),
        ('Film', 'http://www.tnr.com/rss/articles/Film'),
        ('Books', 'http://www.tnr.com/rss/articles/books'),
        ('The Plank', 'http://www.tnr.com/rss/blogs/The-Plank'),
        ('The Treatment', 'http://www.tnr.com/rss/blogs/The-Treatment'),
        ('The Spine', 'http://www.tnr.com/rss/blogs/The-Spine'),
        ('The Stash', 'http://www.tnr.com/rss/blogs/The-Stash'),
        ('The Vine', 'http://www.tnr.com/rss/blogs/The-Vine'),
        ('The Avenue', 'http://www.tnr.com/rss/blogs/The-Avenue'),
        ('William Galston', 'http://www.tnr.com/rss/blogs/William-Galston'),
        ('Simon Johnson', 'http://www.tnr.com/rss/blogs/Simon-Johnson'),
        ('Ed Kilgore', 'http://www.tnr.com/rss/blogs/Ed-Kilgore'),
        ('Damon Linker', 'http://www.tnr.com/rss/blogs/Damon-Linker'),
        ('John McWhorter', 'http://www.tnr.com/rss/blogs/John-McWhorter')
    ]

    def print_version(self, url):
        # Download the print-friendly page instead of the regular article page
        return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')
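To see what the print_version rewrite in this recipe does to a URL without running calibre, the same string replacement can be tried as a plain function. This is just a standalone copy for experimenting; the article path below is hypothetical and not taken from the site.

```python
def print_version(url):
    """Standalone copy of the recipe's URL rewrite, for trying out by hand."""
    return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')


# Hypothetical article path, used only to show the rewrite.
article = 'http://www.tnr.com/article/politics/example-story'
print(print_version(article))
```

Note that a URL which already points under /print/ would get the segment a second time, so the rewrite assumes the feeds only carry regular article links.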
#713 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
|
Can anyone help me with a recipe for Business Standard?
If the URL for the article is http://www.business-standard.com/ind...?autono=369650 then the print URL is http://www.business-standard.com/ind...ono=369650&tp=
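Going only by the two URLs above, the print URL looks like the article URL with `&tp=` appended. Here is a minimal sketch under that assumption. The middle of both URLs is truncated in the post, so this is unverified, and the example URL in the last line is invented. The function is written standalone; in a recipe it would be a method taking `self` as its first argument.

```python
def print_version(url):
    """Append the '&tp=' suffix seen in the print URL quoted above.

    Assumption: nothing else differs between the article URL and the
    print URL; the truncated URLs in the post cannot confirm that.
    """
    suffix = '&tp='
    if url.endswith(suffix):
        return url  # already looks like a print URL
    return url + suffix


# Invented URL with the same query parameter as the one in the post.
print(print_version('http://example.com/story.php?autono=369650'))
```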
#714 |
Addict
Posts: 274
Karma: 1029955
Join Date: Feb 2009
Device: Palm IIIx, EBM-911, REB-1100(dead), PRS-505
|
It seems that the most recent version of the /. recipe in Calibre may have triggered an auto-ban for my IP address.
I noticed the last time that it seemed to be downloading more of the site than before, i.e. I had the article + comments, and I think the way the site is set up leads to recursively downloading most of the site unless strictly limited. I used to have that problem with sitescooper and plucker, and had to be very careful about limiting how much of /. was spidered to create a document for offline reading. (This would be the version included with 0.6.11.)
Last edited by cutterjohn42; 09-10-2009 at 09:48 AM.
#715 |
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Open a ticket about it, I'll look at it when I have a spare moment.
|
#716 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
Any idea how I can transform an article URL like this (http://www.motherjones.com/politics/...-job-van-jones) into the print URL (http://www.motherjones.com/print/27151) that I want to use for my recipe?
I'm hoping there's an easy way to find the corresponding print URL (by that five-digit number) for each article, rather than removing all the unwanted HTML from the actual article. Any ideas?
Edit: I should also note that the original article page actually splits the article into multiple pages (which I would want to combine into one article for my recipe). The print version lists the entire article.
Last edited by cix3; 09-10-2009 at 08:38 PM. Reason: Add text
#717 |
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Just fetch the HTML and parse it looking for the print link
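As a rough illustration of the "parse it looking for the print link" part, here is a sketch using only the Python standard library. It assumes the article page contains an `<a>` tag whose href ends in `/print/` followed by digits, matching the print URL quoted in the question; the real page markup may differ. The names `PrintLinkFinder` and `find_print_url` and the sample HTML are invented, and fetching the page is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Pattern based on the print URL quoted in the question: /print/ plus digits.
PRINT_HREF = re.compile(r'/print/\d+$')


class PrintLinkFinder(HTMLParser):
    """Remember the first <a href> that looks like a print link."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != 'a' or self.href is not None:
            return
        href = dict(attrs).get('href')
        if href and PRINT_HREF.search(href):
            self.href = href


def find_print_url(page_url, html):
    """Return the absolute print URL found in html, or page_url if none."""
    finder = PrintLinkFinder()
    finder.feed(html)
    if finder.href is None:
        return page_url
    return urljoin(page_url, finder.href)


# Invented page markup; the real site's HTML may differ.
sample = '<p><a href="/about">About</a> <a href="/print/27151">Print</a></p>'
print(find_print_url('http://www.motherjones.com/politics/example', sample))
```

Falling back to the original URL when no print link is found means an article is still downloaded, just without the cleaner print layout.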
|
#718 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
|
#719 |
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Can't think of one offhand, but basically it's something like this:
Code:
import re  # at the top of the recipe file

def get_article_url(self, article):
    url = ...  # from article, as before
    soup = self.index_to_soup(url)
    # do some processing on soup to find the full article link
    a = soup.find(name='a', href=True, text=re.compile(r'Full\s*Article'))
    if a is not None:
        return a['href']
    return url
#720 | |
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
Quote:
Hmmm... that's beyond my level of expertise. I'm going to have to wait for someone else to recommend a pre-built recipe that I can copy from. Thanks! |
|