Old 09-07-2009, 12:39 PM   #706
GRiker
Comparer of the Ephemeris
Posts: 1,496
Karma: 424697
Join Date: Mar 2009
Device: iPad
macsilber: It would be more helpful if you could post the recipe you're using.

dmendozadmd: A 'sticky' is a popular topic that stays pinned at the top of the forum's topic list, so it's easier to find. A recipe is a script that calibre uses to download the contents of a particular website, then format it for your eReader.

G
Old 09-07-2009, 02:50 PM   #707
Gomes
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2009
Location: Philadelphia
Device: PRS-505
Quote:
Originally Posted by GRiker View Post
Gomes,
There are RSS feeds in each section of philly.com. Follow the directions to create a custom feed, then ask for assistance if you get stuck. It's actually pretty simple.

G
I've been trying to get a clean copy for a couple of weeks with no success. Essentially, I am unable to get the print version of the stories. I've tried to follow the directions cited above, but that doesn't seem to help. What I end up with is the article with all the various menus, pictures, and comments, which makes it difficult to read at best and takes forever for calibre to fetch and convert. Can anyone help?

And yes, I realize I'm probably just missing something obvious...
Old 09-07-2009, 03:23 PM   #708
cix3
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
In a custom recipe, how do I remove multiple div classes?

For example, from this source page (http://www.tnr.com/print/article/pol...ocking-roberts), I want to remove these div classes: print-logo, print-site_name, img-left, and print-source_url.

Probably a simple syntax question, but I'm new to Python. I have tried...

Code:
    remove_tags = [dict(name='div', attrs={'class':'print-logo'})]
    remove_tags = [dict(name='div', attrs={'class':'print-site_name'})]
    remove_tags = [dict(name='div', attrs={'class':'img-left'})]
    remove_tags = [dict(name='div', attrs={'class':'print-source_url'})]
... which only removes the last div class listed (in this case, print-source_url).

This gives me a syntax error:
Code:
    remove_tags = [dict(name='div', attrs={'class':'print-logo', 'print-site_name', 'img-left', 'print-source_url'})]
What is the correct syntax?

Thanks
Old 09-07-2009, 03:27 PM   #709
kovidgoyal
creator of calibre
Posts: 45,377
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Code:
remove_tags = [dict(name='div', attrs={'class':['print-logo', 'print-site_name', ..]}]
Old 09-07-2009, 03:38 PM   #710
cix3
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
Thanks... I knew it must have been something simple like that.

Your snippet as written gave me a syntax error, but adding a ) as the second to last character fixed it.
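
For anyone landing here with the same question, a minimal sketch of why the first attempt kept only the last class: each `remove_tags = ...` line rebinds the same name, so the earlier values are discarded. The snippet is plain Python with no calibre import, and the names `overwritten` and `combined` are made up for illustration only.

```python
# Four separate assignments: each one rebinds the name, so only the last survives.
remove_tags = [dict(name='div', attrs={'class': 'print-logo'})]
remove_tags = [dict(name='div', attrs={'class': 'print-site_name'})]
remove_tags = [dict(name='div', attrs={'class': 'img-left'})]
remove_tags = [dict(name='div', attrs={'class': 'print-source_url'})]
overwritten = remove_tags

# One assignment whose 'class' value is a list of class names.
combined = [dict(name='div', attrs={'class': [
    'print-logo', 'print-site_name', 'img-left', 'print-source_url']})]

print(len(overwritten), overwritten[0]['attrs']['class'])
print(combined[0]['attrs']['class'])
```

The second form is a single list entry whose `class` value names all four classes, which is the shape the working snippet ends up with.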
Old 09-07-2009, 05:12 PM   #711
GRiker
Comparer of the Ephemeris
Posts: 1,496
Karma: 424697
Join Date: Mar 2009
Device: iPad
Gomes: Post your recipe. You will probably need to use remove_tags, as cix3 has learned, to get rid of the stuff you don't want.

Basically, this involves going to a sample page, examining the HTML source, isolating the stuff you don't want, then specifying a remove_tags directive as Kovid has described in his post above this one.
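
The "examine the HTML source" step can be roughed out with a small standalone script. This is only a sketch: it uses Python 3's standard `html.parser` rather than anything from calibre, the `DivClassCounter` and `div_classes` names are invented here, and the sample markup is hypothetical. It lists the class names found on `div` tags so you can decide which ones belong in remove_tags.

```python
from collections import Counter
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Count the class names that appear on <div> tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        for name, value in attrs:
            if name == 'class' and value:
                # A class attribute may hold several space-separated names.
                self.counts.update(value.split())


def div_classes(html):
    parser = DivClassCounter()
    parser.feed(html)
    return parser.counts


# Hypothetical sample page, not copied from any real site.
sample = '''
<div class="print-logo">logo</div>
<div class="content"><p>Article text</p></div>
<div class="img-left"><img src="x.png"></div>
'''
classes = div_classes(sample)
print(sorted(classes))
```

Run it against the saved source of one article page, then pick out the classes that wrap menus, logos, and comments.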

If you post your recipe, folks here are better able to help you refine it.

G
Old 09-07-2009, 05:18 PM   #712
cix3
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
Custom recipe for The New Republic

Hello,

Here's my first stab at a recipe for The New Republic (www.tnr.com). It aggregates all articles and blogs, minus the images. Enjoy!

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class The_New_Republic(BasicNewsRecipe):
    title = 'The New Republic'
    __author__ = 'cix3'
    description = 'Intelligent, stimulating and rigorous examination of American politics, foreign policy and culture'
    timefmt = ' [%b %d, %Y]'

    oldest_article = 7
    max_articles_per_feed = 100

    remove_tags = [
        dict(name='div', attrs={'class':['print-logo', 'print-site_name', 'img-left', 'print-source_url']}),
        dict(name='hr', attrs={'class':'print-hr'}),
        dict(name='img')
    ]

    feeds = [
        ('Politics', 'http://www.tnr.com/rss/articles/Politics'),
        ('Books and Arts', 'http://www.tnr.com/rss/articles/Books-and-Arts'),
        ('Economy', 'http://www.tnr.com/rss/articles/Economy'),
        ('Environment and Energy', 'http://www.tnr.com/rss/articles/Environment-%2526-Energy'),
        ('Health Care', 'http://www.tnr.com/rss/articles/Health-Care'),
        ('Urban Policy', 'http://www.tnr.com/rss/articles/Urban-Policy'),
        ('World', 'http://www.tnr.com/rss/articles/World'),
        ('Film', 'http://www.tnr.com/rss/articles/Film'),
        ('Books', 'http://www.tnr.com/rss/articles/books'),
        ('The Plank', 'http://www.tnr.com/rss/blogs/The-Plank'),
        ('The Treatment', 'http://www.tnr.com/rss/blogs/The-Treatment'),
        ('The Spine', 'http://www.tnr.com/rss/blogs/The-Spine'),
        ('The Stash', 'http://www.tnr.com/rss/blogs/The-Stash'),
        ('The Vine', 'http://www.tnr.com/rss/blogs/The-Vine'),
        ('The Avenue', 'http://www.tnr.com/rss/blogs/The-Avenue'),
        ('William Galston', 'http://www.tnr.com/rss/blogs/William-Galston'),
        ('Simon Johnson', 'http://www.tnr.com/rss/blogs/Simon-Johnson'),
        ('Ed Kilgore', 'http://www.tnr.com/rss/blogs/Ed-Kilgore'),
        ('Damon Linker', 'http://www.tnr.com/rss/blogs/Damon-Linker'),
        ('John McWhorter', 'http://www.tnr.com/rss/blogs/John-McWhorter')
            ]

    def print_version(self, url):
        return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')
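
To see what the print_version rewrite in the recipe does to a URL, the same replacement can be tried as a plain function outside calibre. This is a sketch only; the article path below is hypothetical and is there just to show the rewrite.

```python
def print_version(url):
    # Same string replacement as in the recipe, written as a plain function.
    return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')


# Hypothetical article path, used only to show the rewrite.
example = print_version('http://www.tnr.com/article/politics/some-story')
print(example)

# A URL that does not start with the expected prefix comes back unchanged.
untouched = print_version('http://example.com/article')
print(untouched)
```

Because `str.replace` does nothing when the prefix is absent, links to other sites in a feed would be fetched as normal pages rather than print pages.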
Old 09-09-2009, 09:20 PM   #713
bhandarisaurabh
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
Can anyone help me with a recipe for Business Standard?
If the URL for an article is
http://www.business-standard.com/ind...?autono=369650
then the print URL is
http://www.business-standard.com/ind...ono=369650&tp=
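
One possible way to approach this, sketched under heavy assumptions: both URLs above are truncated, so the real paths are unknown. The sketch pulls the `autono` value out of the article URL with Python 3's `urllib.parse` and appends it, plus `&tp=`, to a print prefix that the caller supplies. The `article_id` and `print_url` names, the `print_base` parameter, and the example.com URLs are all placeholders, not the site's actual layout.

```python
from urllib.parse import parse_qs, urlsplit


def article_id(url):
    """Return the value of the autono query parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get('autono')
    return values[0] if values else None


def print_url(url, print_base):
    """Build a print URL from print_base (an assumed, caller-supplied prefix)."""
    ident = article_id(url)
    if ident is None:
        return url  # fall back to the normal article page
    return '%s?autono=%s&tp=' % (print_base, ident)


# Placeholder URLs on example.com; the real paths are elided in the post.
result = print_url('http://example.com/story?autono=369650',
                   'http://example.com/print')
print(result)
```

In a recipe, logic like this would live in print_version, with `print_base` set to whatever the site's real print path turns out to be.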
Old 09-10-2009, 09:38 AM   #714
cutterjohn42
Addict
Posts: 274
Karma: 1029955
Join Date: Feb 2009
Device: Palm IIIx, EBM-911, REB-1100(dead), PRS-505
It seems that the most recent version of the /. (Slashdot) recipe in calibre may have triggered an auto-ban of my IP address.

The last time I ran it, it seemed to download more of the site than before, i.e. I got the article plus comments. I think the way the site is set up leads to recursively downloading most of the site unless the fetch is strictly limited. I used to have that problem with sitescooper and plucker, and had to be very careful about limiting how much of /. was spidered to create a document for offline reading.

(This would be the version included with 0.6.11.)

Last edited by cutterjohn42; 09-10-2009 at 09:48 AM.
Old 09-10-2009, 10:44 AM   #715
kovidgoyal
creator of calibre
Posts: 45,377
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Open a ticket about it, I'll look at it when I have a spare moment.
Old 09-10-2009, 08:32 PM   #716
cix3
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
Any idea how I can transform an article URL like this (http://www.motherjones.com/politics/...-job-van-jones) into the print URL (http://www.motherjones.com/print/27151) that I want to use for my recipe?

I'm hoping there's an easy way to find the corresponding print URL (by that five-digit number) for each article, rather than removing all the unwanted HTML from the actual article page.

Any ideas?

Edit: I should also note that the original article page actually splits the article into multiple pages (which I would want to combine into one article for my recipe). The print version lists the entire article.

Last edited by cix3; 09-10-2009 at 08:38 PM. Reason: Add text
Old 09-10-2009, 08:43 PM   #717
kovidgoyal
creator of calibre
Posts: 45,377
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Just fetch the HTML and parse it looking for the print link
Old 09-10-2009, 08:49 PM   #718
cix3
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
Quote:
Originally Posted by kovidgoyal View Post
Just fetch the HTML and parse it looking for the print link
Can you give me an example of a built-in recipe that does this?
Old 09-10-2009, 09:21 PM   #719
kovidgoyal
creator of calibre
Posts: 45,377
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Can't think of one offhand, but basically it's something like this:

Code:
import re  # at the top of the recipe; needed for re.compile below

def get_article_url(self, article):
    # Assumed meaning of "from article as before": the feed entry's link
    url = article.get('link', None)
    soup = self.index_to_soup(url)
    # do some processing on soup to find the full article link
    a = soup.find(name='a', href=True, text=re.compile(r'Full\s*Article'))
    if a is not None:
        return a['href']
    return url
Stick a few print statements in there to debug things.
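
The "parse it looking for the print link" idea can be tried outside calibre first. The sketch below is an illustration under assumptions, not the recipe API: it uses Python 3's standard `html.parser` instead of `index_to_soup`, the `/print/<digits>` pattern is guessed from the print URL in the question, and `PrintLinkFinder`, `find_print_link`, and the sample markup are invented for the example.

```python
import re
from html.parser import HTMLParser

# Assumed pattern, based on the print URL shown in the question.
PRINT_HREF = re.compile(r'/print/\d+$')


class PrintLinkFinder(HTMLParser):
    """Remember the first <a href> that looks like a print link."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and self.href is None:
            href = dict(attrs).get('href')
            if href and PRINT_HREF.search(href):
                self.href = href


def find_print_link(html, fallback):
    """Return the print link found in html, or fallback if there is none."""
    finder = PrintLinkFinder()
    finder.feed(html)
    return finder.href or fallback


# Hypothetical page fragment; the real markup may differ.
page = '<p><a href="/about">About</a> <a href="/print/27151">Print</a></p>'
link = find_print_link(page, 'http://www.motherjones.com/politics/example')
print(link)
```

If the link comes back relative, as it does here, it would still need to be joined to the site's base URL before being returned from the recipe.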
Old 09-10-2009, 11:28 PM   #720
cix3
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
Quote:
Originally Posted by kovidgoyal View Post
Can't think of one offhand, but basically it's something like this:

Code:
import re  # at the top of the recipe; needed for re.compile below

def get_article_url(self, article):
    # Assumed meaning of "from article as before": the feed entry's link
    url = article.get('link', None)
    soup = self.index_to_soup(url)
    # do some processing on soup to find the full article link
    a = soup.find(name='a', href=True, text=re.compile(r'Full\s*Article'))
    if a is not None:
        return a['href']
    return url
Stick a few print statements in there to debug things.

Hmmm... that's beyond my level of expertise. I'm going to have to wait for someone else to recommend a pre-built recipe that I can copy from.

Thanks!