Old 05-14-2010, 02:16 PM   #1921
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by mwheinz View Post
Yeah - I've been trying to traverse the soup with this:

Code:
    def preprocess_html(self, soup):
        # Print each top-level child of <body> between visible markers
        for item in soup.body:
            print('MHEINZ: [[[')
            print(item)
            print(']]] MHEINZ\n\n')
        return soup
I usually just do this:
Code:
    def preprocess_html(self, soup):
        # Dump the whole parsed page to the console for inspection
        print('The soup is: %s' % soup)
        return soup
The purpose is to just see the html and pick out what I want to remove.
Quote:
Overall, though, it looks like soup is parsing to a particular depth and then stopping - it looks like there's a vast blob of html that it is treating as a blob of text.
That's why I suggested using preprocess_regexps. You can pick any chunk of the "vast blob" out and discard it. BeautifulSoup does a great job of handling malformed HTML, but it's not perfect. Discarding junk by tag presumes that the part you want to discard can be identified by tags. If it can't, you can use regex matching on the start and end of the text blob you want to remove, regardless of whether that blob is marked with tags.
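To make the start/end matching concrete, here is a minimal sketch that runs with plain `re`, outside calibre. The `<!-- begin junk -->` / `<!-- end junk -->` markers, the sample HTML and the `apply_rules` helper are all made up for illustration; in a recipe, the same (compiled pattern, function) pair would go into `preprocess_regexps`.

```python
import re

# Hypothetical sample page: the junk blob is not wrapped in a tag of its own,
# so tag-based removal has nothing to grab onto.
raw_html = (
    "<html><body><p>Article text.</p>"
    "<!-- begin junk -->untagged ads and tracking blob<!-- end junk -->"
    "<p>More article text.</p></body></html>"
)

# Same (compiled pattern, replacement function) shape that preprocess_regexps uses.
# The start and end markers are assumptions; use strings that really occur in your page.
rules = [
    (re.compile(r'<!-- begin junk -->.*?<!-- end junk -->', re.DOTALL | re.IGNORECASE),
     lambda match: ''),
]

def apply_rules(html, rules):
    """Run each (pattern, function) substitution over the raw HTML, in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html

cleaned = apply_rules(raw_html, rules)
print(cleaned)
```

The non-greedy `.*?` with `re.DOTALL` stops at the first end marker, even across line breaks, so only the blob between the two markers is dropped.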
Old 05-14-2010, 08:39 PM   #1922
sdow1
Connoisseur
Posts: 55
Karma: 10
Join Date: Apr 2010
Location: new york city
Device: nook, ipad
I just wanted to jump in and thank folks for trying with the whole prospect thing. This is well above my computer language skills (which are limited to html/css), and I appreciate the effort.

Didn't realize what a can of worms I was opening though!
Old 05-16-2010, 05:40 AM   #1923
gambarini
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Recipes

New recipe:
www.libero-news.it
-- Italian daily newspaper

Older recipes (updated):
L'Espresso (Italian weekly news)
-- better viewing, now all feeds work, and 2 new feeds
La Repubblica
-- better viewing, now all feeds work, more efficient remove policy
Le Scienze
-- better viewing, new feed
Attached Files
File Type: zip libero.zip (954 Bytes, 183 views)
File Type: zip l_espresso New.zip (1.5 KB, 176 views)
File Type: zip la_repubblica.zip (1.2 KB, 172 views)
File Type: zip lescienzenew.zip (1.2 KB, 184 views)
Old 05-17-2010, 03:57 AM   #1924
yamadharma
Junior Member
Posts: 2
Karma: 10
Join Date: May 2010
Device: lbook v3
Calibre not working with Instapaper fetch now

When Calibre fetches Instapaper, a file is generated and transferred successfully, but it has no content. The file size is 0.0 MB.
I think the Instapaper API has changed.
Old 05-17-2010, 10:06 AM   #1925
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Updated recipe for instapaper.com:
Attached Files
File Type: zip instapaper.zip (1.1 KB, 204 views)
Old 05-17-2010, 11:37 AM   #1926
pablofunes
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2009
Device: kindle2
submitting a patched recipe for new york review of books

Hi Kovid & Calibre community,

I've repaired the "new york review of books" recipe - one of Calibre's core recipes. It was missing all article titles because of a change in the nybooks.com HTML layout.

Where should I submit the patch?

Regards,

Pablo Funes

PS: The patch is very simple. Where it says

keep_only_tags = [dict(id='article-body')]

it should instead say

keep_only_tags = [dict(id=['article-body','page-title'])]
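
For anyone wondering why the list form helps: as far as I understand it, each dict in keep_only_tags is used as a tag search specification, and an attribute value given as a list matches a tag whose attribute equals any entry in the list. The toy matcher below only imitates that rule on hand-made data. The `matches` function and the sample tags are hypothetical, not calibre or BeautifulSoup code.

```python
# Toy stand-ins for parsed tags: (tag name, attribute dict) pairs.
tags = [
    ('h1', {'id': 'page-title'}),
    ('div', {'id': 'article-body'}),
    ('div', {'id': 'sidebar'}),
]

def matches(attrs, spec):
    """Return True if every key in spec matches; a list value means 'any of these'."""
    for key, wanted in spec.items():
        value = attrs.get(key)
        if isinstance(wanted, (list, tuple)):
            if value not in wanted:
                return False
        elif value != wanted:
            return False
    return True

old_spec = dict(id='article-body')
new_spec = dict(id=['article-body', 'page-title'])

kept_old = [name for name, attrs in tags if matches(attrs, old_spec)]
kept_new = [name for name, attrs in tags if matches(attrs, new_spec)]
print(kept_old)  # only the article body survives, so the title is lost
print(kept_new)  # the title element is kept as well
```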


Quote:
Originally Posted by kovidgoyal View Post
Since there have been a lot of custom recipe requests of late, I'm starting a sticky where they can be aggregated. Post requests for custom recipes here. Once you have a custom recipe that works well for you (please test both the LRF and EPUB versions), let me know and I'll include it into calibre so others can benefit from it as well.
Old 05-17-2010, 11:47 AM   #1927
kovidgoyal
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@pablofunes: Thanks, I've applied your change.
Old 05-17-2010, 01:46 PM   #1928
gambarini
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
New recipe

infomotori

Italian car and motorcycle news
Attached Files
File Type: zip infomotori.zip (1.0 KB, 187 views)
Old 05-17-2010, 05:10 PM   #1929
mwheinz
award-winning bozo
Posts: 258
Karma: 172703
Join Date: Sep 2009
Location: Philadelphia
Device: Kobo Libra 2
American Prospect Recipe

sdow1 - try this recipe. It's very simple, strips out all formatting at the moment.

Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1273850169(BasicNewsRecipe):
    title          = u'American Prospect'
    oldest_article = 7
    max_articles_per_feed = 100
    recursions = 0
    no_stylesheets = True
    remove_javascript = True

    keep_only_tags = [dict(name=['p','img'])]
	
    preprocess_regexps = [ 
        (re.compile('\r'),lambda match: ''),
        (re.compile(r'<head.*?<title>', re.DOTALL|re.IGNORECASE), lambda match: '<head><title>'),
        (re.compile(r'</title>.*?</head>', re.DOTALL|re.IGNORECASE), lambda match: '</title></head>'),
        (re.compile(r'<body.*?<div class="pad_10L10R">', re.DOTALL|re.IGNORECASE), lambda match: '<body><div>'),
        (re.compile(r'</div>.*</body>', re.DOTALL|re.IGNORECASE), lambda match: '</div></body>'),
    ]

    feeds       = [(u'Articles', u'feed://www.prospect.org/articles_rss.jsp')]

Last edited by mwheinz; 05-17-2010 at 07:44 PM.
Old 05-18-2010, 07:38 AM   #1930
sdow1
Connoisseur
Posts: 55
Karma: 10
Join Date: Apr 2010
Location: new york city
Device: nook, ipad
mwheinz:

That looks like it works!

Thanks so much for the help
Old 05-18-2010, 12:47 PM   #1931
sdow1
Connoisseur
Posts: 55
Karma: 10
Join Date: Apr 2010
Location: new york city
Device: nook, ipad
Quote:
Originally Posted by mwheinz View Post
American Prospect Recipe

sdow1 - try this recipe. It's very simple, strips out all formatting at the moment.

Looking at this further, the only thing I'd change for now is to raise the oldest_article limit to 30, since TAP is a monthly magazine. I can do this myself on my copy, but just wanted to put it out there for anyone else.
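
A rough illustration of why the limit matters, assuming oldest_article is simply a maximum age in days: with a 7-day limit, an article from earlier in the month is skipped, while a 30-day limit keeps it. The `recent_enough` helper and the dates are made up for the example and are not calibre code.

```python
from datetime import datetime, timedelta

def recent_enough(published, now, oldest_article_days):
    """True if the article is no older than the given number of days."""
    return now - published <= timedelta(days=oldest_article_days)

now = datetime(2010, 5, 18)
published = datetime(2010, 5, 1)  # hypothetical article from the current monthly issue

keep_with_7 = recent_enough(published, now, 7)
keep_with_30 = recent_enough(published, now, 30)
print(keep_with_7, keep_with_30)
```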
Old 05-18-2010, 01:20 PM   #1932
mwheinz
award-winning bozo
Posts: 258
Karma: 172703
Join Date: Sep 2009
Location: Philadelphia
Device: Kobo Libra 2
American Prospect, Politifact, Factcheck

@Sdow1 - thanks for the tip, I don't normally read AP.

@everybody Here's a bundle of 3 "political" recipes - the American Prospect, Factcheck and Politifact.
Attached Files
File Type: gz political_recipes.tar.gz (986 Bytes, 182 views)

Last edited by mwheinz; 05-18-2010 at 01:32 PM.
Old 05-18-2010, 06:40 PM   #1933
mlstein
Enthusiast
Posts: 49
Karma: 2062
Join Date: May 2010
Device: iPad (one)
http://www.tomdispatch.com/

I can't figure out how to get through feedburner to the google feed to the actual articles...
Old 05-18-2010, 08:25 PM   #1934
mwheinz
award-winning bozo
Posts: 258
Karma: 172703
Join Date: Sep 2009
Location: Philadelphia
Device: Kobo Libra 2
mlstein,

Try this:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class TomDispatch(BasicNewsRecipe):
    title          = u'TomDispatch'
    __author__     = u'Michael Heinz'
    oldest_article = 21
    max_articles_per_feed = 100
    recursions = 2
    use_embedded_content = False
    no_stylesheets = True

    publication_type = 'magazine'
    masthead_url = 'http://www.tomdispatch.com/application/images/site/tomdispatch_logo_v1.gif'
    cover_url = 'http://www.tomdispatch.com/application/images/site/tomdispatch_logo_v1.gif'

    remove_tags = [ 
                     dict(name='div', attrs={'id':'postSideBar'}),
                  ]

    keep_only_tags = [dict(name='div', attrs={'id':'mainWide'})]
    
    feeds = [
              (u'Articles', u'feed://feeds.feedburner.com/tomdispatch/esUU'),
            ]

    def get_article_url(self, article):
        return article.get('feedburner_origlink', None)
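
One caveat with the method above: if an entry has no feedburner_origlink, it returns None. A hedged variant that falls back to the entry's ordinary link field is sketched here as a standalone function on plain dicts (inside a recipe it would be a method taking self). The sample entries and URLs are made up.

```python
def get_article_url(article):
    """Prefer FeedBurner's original link; fall back to the entry's own link."""
    url = article.get('feedburner_origlink', None)
    if not url:
        url = article.get('link', None)
    return url

# Hypothetical feed entries, with plain dicts standing in for parsed feed items.
with_origlink = {
    'feedburner_origlink': 'http://www.example.com/post/1',
    'link': 'http://feedproxy.example.com/~r/feed/1',
}
without_origlink = {'link': 'http://www.example.com/post/2'}

print(get_article_url(with_origlink))
print(get_article_url(without_origlink))
```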
Old 05-18-2010, 08:41 PM   #1935
hito1
Junior Member
Posts: 1
Karma: 10
Join Date: May 2010
Device: Kindle
I'm new here, so I'm sorry if I'm not doing this right.

I couldn't find a recipe for Proceedings or Naval History magazines. Both have a free section that requires registration:

http://www.usni.org/magazines/proceedings/index.asp

http://www.usni.org/magazines/navalhistory/index.asp


Thanks a lot.

-----------
Besides that request, I'd like to say thanks for The Economist (free) and Foreign Affairs (subscription) recipes; both worked pretty well on my Kindle.