Old 07-07-2011, 04:43 AM   #1
throshlyak
Recipe for capital.bg

Hello from Bulgaria,

This is my first attempt at creating a custom recipe for a news feed. It is for a Bulgarian newspaper, www.capital.bg.

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

class Kapital(BasicNewsRecipe):
    title                 = u'\u041a\u0410\u041f\u0418\u0422\u0410\u041b'
    __author__            = 'Troshlyak'
    __version__           = 'v0.02'
    __date__              = '07, July 2011'
    no_stylesheets        = True
    extra_css             = 'h1 {font: large sans-serif;}\n.byline {font-family: monospace;}'
    encoding              = 'utf8'
    masthead_url          = 'http://www.capital.bg/i/capital_logo.png'
    remove_javascript     = True
    oldest_article        = 7
    max_articles_per_feed = 100

    feeds = [(u'\u041f\u043e\u043b\u0438\u0442\u0438\u043a\u0430', u'http://www.capital.bg/rss/?rubrid=2248'), 
             (u'\u0411\u0438\u0437\u043d\u0435\u0441', u'http://www.capital.bg/rss/?rubrid=2267'), 
             (u'\u0418\u043d\u0442\u0435\u0440\u0430\u043a\u0442\u0438\u0432', u'http://www.capital.bg/rss/?rubrid=2337')]
             
    remove_tags_before = dict(name='div', attrs={'class':'printwrapper'})
    remove_tags_after = dict(name='div', attrs={'class':'printwrapper'})
    remove_tags = [dict(name='p', attrs={'class':'photoInfo'})]

    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        
        br.open(url)
        # Try to follow the link to the print version (printversion.php).
        # follow_link raises an exception when no matching link exists, so in
        # that case fall back to downloading the original article URL.
        try:
            response = br.follow_link(url_regex=r'.*?(/printversion\.php\?)', nr=0)
            html = response.read()
        except Exception:
            response = br.open(url)
            html = response.read()
         
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name
The only thing that still bugs me is that, in the print version, each article ends with a nice image that I would like to move to the top of the article, just below the title.

Can someone help me with this? Here is a sample article from the site:
http://www.capital.bg/printversion.php?storyid=1119112
Old 07-07-2011, 10:34 AM   #2
Starson17
Quote:
Originally Posted by throshlyak
The only thing that still bugs me is that, in the print version, each article ends with a nice image that I would like to move to the top of the article, just below the title.

Can someone help me with this?
If you can restructure your recipe to use keep_only_tags, list the image tag first and then the other tags you want to keep; the tags are kept in the order listed, so this reorders them. Alternatively, you can use BeautifulSoup in postprocess_html to reshuffle the tags.
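
A rough sketch of the postprocess_html route, in case it helps. It is untested against the site and rests on a few assumptions: that the title on the print page is the first <h1>, that the image you want is the last <img> left in the article after your remove_tags rules run, and that your calibre version bundles BeautifulSoup 4 (find_all, insert_after). The class name KapitalImageFirst is only a placeholder; in practice the method would go into your existing Kapital class.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be loaded outside calibre.
    BasicNewsRecipe = object


class KapitalImageFirst(BasicNewsRecipe):  # placeholder name for illustration
    # Only the reordering hook is shown; the other settings stay as posted.

    def postprocess_html(self, soup, first_fetch):
        # Assumption: the article title is the first <h1> on the page.
        title = soup.find('h1')
        # Assumption: the image to move is the last <img> in the article.
        images = soup.find_all('img')
        if title is None or not images:
            return soup  # nothing to reorder, leave the page untouched
        image = images[-1].extract()  # detach the image from the end
        title.insert_after(image)     # re-attach it directly below the title
        return soup
```

If the page turns out to carry other images after the one you want, the images[-1] choice would need a more specific selector, for example matching on a class attribute taken from the page source.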