MobileRead Forums > E-Book Software > Calibre > Recipes

Old 09-29-2010, 04:13 PM   #1
t3d
Enthusiast
Posts: 38
Karma: 10
Join Date: Nov 2009
Location: Poland
Device: kindle 1st gen, kindle dxg, kindle paperwhite2
how to filter feeds

Hello!

I have some recipes that are almost ready to publish, but some of the articles won't work on e-readers. I want to filter them out by URL. It should be easy, as their URLs contain some unique strings. Here is my attempt, which shows the idea but doesn't work at all:

Code:
    def get_article_url(self, article): 
        link = article.get('link')
        audio = link.find('audio')
        if not audio:
            return link
I am not familiar with Python, so I am not sure whether it needs something like "return NULL" when the string is found.
Old 09-29-2010, 04:18 PM   #2
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Try something like this:
Code:
    import re  # needed at the top of the recipe

    def preprocess_html(self, soup):
        # Drop the parent element of every link whose markup mentions 'audio'.
        for link in soup.findAll('a'):
            if re.search('audio', str(link)):
                link.parent.extract()
        return soup

Last edited by TonytheBookworm; 09-29-2010 at 04:20 PM.
Old 09-29-2010, 04:20 PM   #3
kovidgoyal
creator of calibre
Posts: 43,839
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You want

Code:
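def get_article_url(self, article):
    link = article.get('link')
    # Return the link only when it lacks 'audio'; otherwise the method
    # returns None implicitly and the feed item is skipped.
    if 'audio' not in link:
        return link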
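
For anyone wondering why the original attempt failed: `str.find` returns an index, with -1 meaning "not found", so `not link.find('audio')` is true only when the match starts at position 0. The `in` operator gives a plain boolean instead. Below is a minimal standalone sketch of the difference. The function names and URLs are made up for illustration and are not part of calibre or of the recipe in question.

```python
# Hypothetical URLs, for illustration only.
PLAIN_URL = "http://example.com/news/article-1"
AUDIO_URL = "http://example.com/audio/clip-7"


def broken_filter(link):
    # The original attempt: find() gives -1 (truthy) when 'audio' is absent
    # and a positive index (also truthy) when it appears later in the string,
    # so `not audio` is almost always False and nothing is returned.
    audio = link.find('audio')
    if not audio:
        return link


def working_filter(link):
    # The fix from this thread: `in` yields a real boolean.
    if 'audio' not in link:
        return link


print(broken_filter(PLAIN_URL))   # None: the ordinary article is lost
print(working_filter(PLAIN_URL))  # the ordinary article is kept
print(working_filter(AUDIO_URL))  # None: the audio item is dropped
```

There is no need for an explicit "return NULL": a Python function that reaches its end without a `return` statement returns `None`.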
Old 09-29-2010, 04:39 PM   #4
t3d
It works well and is already implemented:
http://github.com/t3d/kalibrator/com...3ad755eb6e597e

Thanks!
Old 09-29-2010, 04:56 PM   #5
TonytheBookworm
Kovid,
Thanks for showing that method. I have been using the preprocess_html approach for a while, for example in popscience.recipe, but your method is cleaner and faster. Thanks again. I will update popscience.recipe for submission in a future build so that it uses this. That is one thing I love about this forum and calibre: I learn something new every single day from folks helping others.
Old 09-29-2010, 05:02 PM   #6
kovidgoyal
@TtB: That method is for recipes that download articles from RSS feeds. It filters items from the RSS feed. It is not meant for filtering links in the actual article.
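
To make the distinction concrete, here is a rough simulation of feed-level filtering in plain Python. `FakeRecipe` and `collect_articles` are hypothetical stand-ins written for this sketch; they only imitate the idea of asking the recipe for each feed item's URL and dropping items that yield none, and they are not calibre's actual classes or internals.

```python
class FakeRecipe:
    """Hypothetical stand-in for a recipe; not calibre's BasicNewsRecipe."""

    def get_article_url(self, article):
        link = article.get('link')
        # Keep the item only if it has a link without 'audio' in it.
        if link and 'audio' not in link:
            return link
        return None


def collect_articles(recipe, feed_items):
    # Simplified imitation of feed processing: items with no URL are dropped
    # before any article page is downloaded.
    kept = []
    for item in feed_items:
        url = recipe.get_article_url(item)
        if url:
            kept.append(url)
    return kept


# Made-up feed items, for illustration only.
items = [
    {'title': 'Story', 'link': 'http://example.com/news/1'},
    {'title': 'Podcast', 'link': 'http://example.com/audio/2'},
    {'title': 'No link'},
]
kept = collect_articles(FakeRecipe(), items)
print(kept)
```

A preprocess_html hook, by contrast, runs on the HTML of an article that has already been downloaded, so it can only remove links inside that page and cannot stop the article from being fetched.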
Old 09-29-2010, 05:15 PM   #7
TonytheBookworm
I was wondering what the difference was. Thank you for explaining that.
Old 10-03-2010, 09:29 AM   #8
t3d
@Kovid
I didn't expect you to add my recipe immediately.
Actually, I recently improved the string used for filtering so that, e.g., an article with 'audiobook' in its title wouldn't be omitted.

Could you please update it with this revision: http://github.com/t3d/kalibrator/raw..._opinie.recipe

It is the same source as rmf24_fakty.recipe and rmf24_ESKN.recipe, so please link it to the same favicon. I sometimes split one source into several recipes to keep the ebook files from growing too big for ebook devices, and to separate unrelated topics.
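
On the narrower filter string mentioned in this post: the exact string used in the recipe is not shown in the thread, so the following is only one possible way to avoid false matches such as 'audiobook'. It assumes, purely for illustration, that audio items carry `audio` as a whole path segment of the URL, and it uses Python 3's `urllib.parse`. The helper name `is_audio_item` and the URLs are made up.

```python
from urllib.parse import urlparse


def is_audio_item(link):
    # Assumption: audio items have 'audio' as a complete path segment,
    # e.g. /audio/123, so 'audiobook-review' does not match.
    segments = urlparse(link).path.split('/')
    return 'audio' in segments


def get_article_url(article):
    # Written as a plain function here; in a recipe it would be a method.
    link = article.get('link')
    if link and not is_audio_item(link):
        return link
    return None


print(is_audio_item('http://example.com/audio/123'))
print(is_audio_item('http://example.com/news/audiobook-review'))
```

A bare substring test like `'audio' in link` would reject both of these URLs, while the segment test rejects only the first.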
Old 10-03-2010, 12:45 PM   #9
kovidgoyal
Done.
All times are GMT -4.

