|
|
#706 |
|
Comparer of the Ephemeris
Posts: 1,496
Karma: 424697
Join Date: Mar 2009
Device: iPad
|
macsilber: It would be more helpful if you could post the recipe you're using.
dmendozadmd: A 'sticky' is a popular topic that stays pinned at the top of the forum's topic list, so it's easier to find. A recipe is a script that calibre uses to download the contents of a particular website and then format it for your eReader. G |
|
|
|
|
#707 | |
|
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2009
Location: Philadelphia
Device: PRS-505
|
Quote:
And yes, I realize I'm just probably missing something obvious...
|
|
|
|
|
|
#708 |
|
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
In a custom recipe, how do I remove multiple div classes?
For example, from this source page (http://www.tnr.com/print/article/pol...ocking-roberts), I want to remove these div classes: print-logo, print-site_name, img-left, and print-source_url. Probably a simple syntax question, but I'm new to Python. I have tried... Code:
remove_tags = [dict(name='div', attrs={'class':'print-logo'})]
remove_tags = [dict(name='div', attrs={'class':'print-site_name'})]
remove_tags = [dict(name='div', attrs={'class':'img-left'})]
remove_tags = [dict(name='div', attrs={'class':'print-source_url'})]
This gives me a syntax error: Code:
remove_tags = [dict(name='div', attrs={'class':'print-logo', 'print-site_name', 'img-left', 'print-source_url'})]
Thanks |
|
|
|
|
#709 |
|
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Code:
remove_tags = [dict(name='div', attrs={'class':['print-logo', 'print-site_name', ..]}]
|
|
|
|
|
#710 |
|
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
Thanks... I knew it must have been something simple like that.
Your snippet as written gave me a syntax error, but adding a ) as the second to last character fixed it. |
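For anyone following along, here is a sketch of the corrected line with the closing parenthesis in place and all four class names from the question filled in. It also shows why the four separate assignments could never remove more than one class: each assignment to remove_tags replaces the previous one. The names first_attempt and removed_classes are only there for illustration and are not part of calibre.

```python
# First attempt: four separate assignments. Each one rebinds the name,
# so only the last assignment survives.
remove_tags = [dict(name='div', attrs={'class': 'print-logo'})]
remove_tags = [dict(name='div', attrs={'class': 'print-site_name'})]
remove_tags = [dict(name='div', attrs={'class': 'img-left'})]
remove_tags = [dict(name='div', attrs={'class': 'print-source_url'})]
first_attempt = remove_tags  # illustrative name; holds one class only

# Corrected form: one dict whose 'class' value is a list of class names.
remove_tags = [
    dict(name='div', attrs={'class': [
        'print-logo', 'print-site_name', 'img-left', 'print-source_url',
    ]}),
]
removed_classes = remove_tags[0]['attrs']['class']  # illustrative name

print(first_attempt[0]['attrs']['class'])
print(removed_classes)
```

The syntax error in the second attempt came from putting bare strings after a key/value pair inside the braces; wrapping the names in a list makes them a single value for the 'class' key.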
|
|
|
|
#711 |
|
Comparer of the Ephemeris
Posts: 1,496
Karma: 424697
Join Date: Mar 2009
Device: iPad
|
gomes: Post your recipe. You will probably need to use remove_tags as cix3 has learned to get rid of the stuff you don't want.
Basically, this involves going to a sample page, examining the HTML source, isolating the stuff you don't want, then specifying a remove_tags directive as Kovid has described in his post above this one. If you post your recipe, folks here are better able to help you refine it. G |
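The "examine the HTML source" step can be partly automated. The sketch below is only one way to do it, using Python's standard html.parser rather than anything from calibre: it lists the class names used on div tags in a page so you can pick out candidates for remove_tags. ClassCollector and SAMPLE_HTML are made-up names, and the sample markup is hypothetical; in practice you would feed it the saved source of a real article page.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Count the class names used on <div> tags in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        for key, value in attrs:
            if key == 'class' and value:
                # A class attribute may hold several space-separated names.
                self.counts.update(value.split())


# Hypothetical sample; replace with the saved HTML of a real article.
SAMPLE_HTML = """
<div class="print-logo"><img src="logo.png"></div>
<div class="print-site_name">Example Site</div>
<div class="content"><p>Article text.</p></div>
<div class="print-source_url">Source URL: http://example.com/a</div>
"""

collector = ClassCollector()
collector.feed(SAMPLE_HTML)
div_classes = sorted(collector.counts)
print(div_classes)
```

Anything in the printed list that wraps logos, site names, or footers rather than article text is a candidate for remove_tags.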
|
|
|
|
#712 |
|
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
Custom recipe for The New Republic
Hello,
Here's my first stab at a recipe for The New Republic (www.tnr.com). It aggregates all articles and blogs, minus the images. Enjoy! Code:
from calibre.web.feeds.news import BasicNewsRecipe


class The_New_Republic(BasicNewsRecipe):
    title = 'The New Republic'
    __author__ = 'cix3'
    description = 'Intelligent, stimulating and rigorous examination of American politics, foreign policy and culture'
    timefmt = ' [%b %d, %Y]'
    oldest_article = 7
    max_articles_per_feed = 100

    # Strip print-page chrome, horizontal rules, and all images
    remove_tags = [
        dict(name='div', attrs={'class': ['print-logo', 'print-site_name', 'img-left', 'print-source_url']}),
        dict(name='hr', attrs={'class': 'print-hr'}),
        dict(name='img'),
    ]

    feeds = [
        ('Politics', 'http://www.tnr.com/rss/articles/Politics'),
        ('Books and Arts', 'http://www.tnr.com/rss/articles/Books-and-Arts'),
        ('Economy', 'http://www.tnr.com/rss/articles/Economy'),
        ('Environment and Energy', 'http://www.tnr.com/rss/articles/Environment-%2526-Energy'),
        ('Health Care', 'http://www.tnr.com/rss/articles/Health-Care'),
        ('Urban Policy', 'http://www.tnr.com/rss/articles/Urban-Policy'),
        ('World', 'http://www.tnr.com/rss/articles/World'),
        ('Film', 'http://www.tnr.com/rss/articles/Film'),
        ('Books', 'http://www.tnr.com/rss/articles/books'),
        ('The Plank', 'http://www.tnr.com/rss/blogs/The-Plank'),
        ('The Treatment', 'http://www.tnr.com/rss/blogs/The-Treatment'),
        ('The Spine', 'http://www.tnr.com/rss/blogs/The-Spine'),
        ('The Stash', 'http://www.tnr.com/rss/blogs/The-Stash'),
        ('The Vine', 'http://www.tnr.com/rss/blogs/The-Vine'),
        ('The Avenue', 'http://www.tnr.com/rss/blogs/The-Avenue'),
        ('William Galston', 'http://www.tnr.com/rss/blogs/William-Galston'),
        ('Simon Johnson', 'http://www.tnr.com/rss/blogs/Simon-Johnson'),
        ('Ed Kilgore', 'http://www.tnr.com/rss/blogs/Ed-Kilgore'),
        ('Damon Linker', 'http://www.tnr.com/rss/blogs/Damon-Linker'),
        ('John McWhorter', 'http://www.tnr.com/rss/blogs/John-McWhorter')
    ]

    def print_version(self, url):
        # Article URLs map to print URLs by inserting 'print/' after the host
        return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')
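To see what print_version does without running calibre, the same replacement can be tried as a plain function. This is only a sketch: the article path in sample is made up for illustration and is not a real TNR URL.

```python
def print_version(url):
    # Same replacement as in the recipe, written as a plain function
    # so it can be run outside calibre.
    return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')


# Hypothetical article path, for illustration only.
sample = 'http://www.tnr.com/article/politics/example-story'
printed = print_version(sample)
print(printed)
```

Note that a URL that already points at the print version would get a second 'print/' inserted, so this relies on the feeds only ever linking to the normal article pages.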
|
|
|
|
|
#713 |
|
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
|
Can anyone help me with a recipe for Business Standard?
If the URL for an article is http://www.business-standard.com/ind...?autono=369650, then the print URL is http://www.business-standard.com/ind...ono=369650&tp= |
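A possible starting point, sketched under a big assumption: that the only difference between the two URLs is the empty tp parameter added at the end. Both URLs above are truncated in the middle, so the real paths may differ in ways that cannot be seen here; check against the full URLs before relying on this. The sample URL below is a placeholder on example.com, not the real site.

```python
def print_version(url):
    # Assumption: the print page is the article URL plus an empty 'tp'
    # query parameter. Leave the URL alone if 'tp' is already present.
    if '&tp=' in url or '?tp=' in url:
        return url
    separator = '&' if '?' in url else '?'
    return url + separator + 'tp='


# Placeholder URL for illustration; the real path is elided in the post.
sample = 'http://www.example.com/story.php?autono=369650'
printed = print_version(sample)
print(printed)
```

In a recipe this would be a method taking self as its first argument, as in the New Republic recipe earlier in the thread.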
|
|
|
|
#714 |
|
Addict
Posts: 274
Karma: 1029955
Join Date: Feb 2009
Device: Palm IIIx, EBM-911, REB-1100(dead), PRS-505
|
It seems that the most recent version of the /. recipe in Calibre may have triggered an auto-ban of my IP address.
I noticed the last time that it seemed to be downloading more of the site than before, i.e. I had the article + comments. I think the way the site is set up leads to recursively downloading most of the site unless the download is strictly limited. I used to have that problem with sitescooper and plucker, and had to be very careful about limiting how much of /. was spidered to create a document for offline reading. (This would be the version included with 0.6.11.) Last edited by cutterjohn42; 09-10-2009 at 09:48 AM. |
|
|
|
|
#715 |
|
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Open a ticket about it and I'll look at it when I have a spare moment.
|
|
|
|
|
#716 |
|
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
Any idea how I can transform an article URL like this (http://www.motherjones.com/politics/...-job-van-jones) into the print URL (http://www.motherjones.com/print/27151) that I want to use for my recipe?
I'm hoping there's an easy way to find the corresponding print URL (by that five-digit number) for each article, rather than removing all the unwanted HTML from the actual article page. Any ideas? Edit: I should also note that the original article page splits the article into multiple pages (which I would want to combine into one article for my recipe). The print version shows the entire article. Last edited by cix3; 09-10-2009 at 08:38 PM. Reason: Add text |
|
|
|
|
#717 |
|
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Just fetch the HTML and parse it, looking for the print link.
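As a rough illustration of the "parse it looking for the print link" step, here is a sketch using only Python's standard library instead of calibre's own helpers. It assumes the article page contains an anchor whose href ends in /print/ followed by digits, which matches the print URL quoted in the question but has not been checked against the real page. PrintLinkFinder, find_print_url, and SAMPLE_HTML are hypothetical names, and the sample markup is made up.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed pattern: href ends in /print/<digits>, e.g. /print/27151
PRINT_HREF = re.compile(r'/print/\d+$')


class PrintLinkFinder(HTMLParser):
    """Remember the first <a> whose href looks like a print link."""

    def __init__(self):
        super().__init__()
        self.print_href = None

    def handle_starttag(self, tag, attrs):
        if tag != 'a' or self.print_href is not None:
            return
        href = dict(attrs).get('href')
        if href and PRINT_HREF.search(href):
            self.print_href = href


def find_print_url(html, article_url):
    """Return the absolute print URL found in html, or None if absent."""
    finder = PrintLinkFinder()
    finder.feed(html)
    if finder.print_href is None:
        return None
    # urljoin resolves relative links against the article's own URL.
    return urljoin(article_url, finder.print_href)


# Made-up markup and article path, for illustration only.
SAMPLE_HTML = '<p><a href="/about">About</a> <a href="/print/27151">Print</a></p>'
print_url = find_print_url(SAMPLE_HTML, 'http://www.motherjones.com/politics/example')
print(print_url)
```

Falling back to the original article URL when no print link is found keeps the recipe working on pages that lack one.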
|
|
|
|
|
#718 |
|
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
|
|
|
|
|
#719 |
|
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Can't think of one offhand, but basically it's something like this:
Code:
import re  # at the top of the recipe file

# Method of the recipe class
def get_article_url(self, article):
    url = ...  # (from article as before)
    soup = self.index_to_soup(url)
    # Do some processing on soup to find the full article link
    a = soup.find(name='a', href=True, text=re.compile(r'Full\s*Article'))
    if a is not None:
        return a['href']
    # Fall back to the original URL if no such link is found
    return url
|
|
|
|
|
#720 | |
|
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
Quote:
Hmmm... that's beyond my level of expertise. I'm going to have to wait for someone else to recommend a pre-built recipe that I can copy from. Thanks! |
|
|
|