Old 08-31-2012, 05:13 AM   #1
zchar67
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2012
Device: Kindle
Financial Times UK - Missing UK Section

Hello.

I've been using the excellent FT UK recipe to fetch the news and read it on my Kindle, but recently the UK section has stopped appearing. The rest of the paper seems to be there (Front Page, World, Comment, Companies etc.); only the UK news section has been missing for the last few weeks. Can anyone help?

Old 08-31-2012, 03:00 PM   #2
kiklop74
Guru
Posts: 780
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle PaperWhite, Motorola Xoom
Send me your account username and password and I will take a look.
Old 09-01-2012, 10:05 PM   #3
rainrdx
Connoisseur
Posts: 54
Karma: 13316
Join Date: Jul 2012
Device: iPad
This is my fix. Hopefully it works.

Code:
__license__   = 'GPL v3'
__copyright__ = '2010-2011, Darko Miletic <darko.miletic at gmail.com>'
'''
www.ft.com/uk-edition
'''

import datetime
from calibre.ptempfile import PersistentTemporaryFile
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class FinancialTimes(BasicNewsRecipe):
    title                 = 'Financial Times (UK)'
    __author__            = 'Darko Miletic'
    description           = "The Financial Times (FT) is one of the world's leading business news and information organisations, recognised internationally for its authority, integrity and accuracy."
    publisher             = 'The Financial Times Ltd.'
    category              = 'news, finances, politics, UK, World'
    oldest_article        = 2
    language              = 'en_GB'
    max_articles_per_feed = 250
    no_stylesheets        = True
    use_embedded_content  = False
    needs_subscription    = True
    encoding              = 'utf8'
    publication_type      = 'newspaper'
    articles_are_obfuscated = True
    temp_files              = []
    masthead_url          = 'http://im.media.ft.com/m/img/masthead_main.jpg'
    LOGIN                 = 'https://registration.ft.com/registration/barrier/login'
    LOGIN2                = 'http://media.ft.com/h/subs3.html'
    INDEX                 = 'http://www.ft.com/uk-edition'
    PREFIX                = 'http://www.ft.com'

    conversion_options = {
                          'comment'          : description
                        , 'tags'             : category
                        , 'publisher'        : publisher
                        , 'language'         : language
                        , 'linearize_tables' : True
                        }

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        br.open(self.INDEX)
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN2)
            br.select_form(name='loginForm')
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    keep_only_tags = [
                        dict(name='div', attrs={'class':['fullstory fullstoryHeader', 'ft-story-header']})
                       ,dict(name='div', attrs={'class':'standfirst'})
                       ,dict(name='div', attrs={'id'   :'storyContent'})
                       ,dict(name='div', attrs={'class':['ft-story-body','index-detail']})
                     ]
    remove_tags = [
                      dict(name='div', attrs={'id':'floating-con'})
                     ,dict(name=['meta','iframe','base','object','embed','link'])
                     ,dict(attrs={'class':['storyTools','story-package','screen-copy','story-package separator','expandable-image']})
                  ]
    remove_attributes = ['width','height','lang']

    extra_css = """
                body{font-family: Georgia,Times,"Times New Roman",serif}
                h2{font-size:large}
                .ft-story-header{font-size: x-small}
                .container{font-size:x-small;}
                h3{font-size:x-small;color:#003399;}
                .copyright{font-size: x-small}
                img{margin-top: 0.8em; display: block}
                .lastUpdated{font-family: Arial,Helvetica,sans-serif; font-size: x-small}
                .byline,.ft-story-body,.ft-story-header{font-family: Arial,Helvetica,sans-serif}
                """

    def get_artlinks(self, elem):
        articles = []
        count = 0
        for item in elem.findAll('a',href=True):
            count = count + 1
            if self.test and count > 2:
               return articles
            rawlink = item['href']
            if rawlink.startswith('http://'):
               url = rawlink
            else:
               url   = self.PREFIX + rawlink
            urlverified = self.browser.open_novisit(url).geturl() # resolve redirect.
            title = self.tag_to_string(item)
            date = strftime(self.timefmt)
            articles.append({
                              'title'      :title
                             ,'date'       :date
                             ,'url'        :urlverified
                             ,'description':''
                            })
        return articles

    def parse_index(self):
        feeds = []
        soup = self.index_to_soup(self.INDEX)
        dates = self.tag_to_string(soup.find('div', attrs={'class':'btm-links'}).find('div'))
        self.timefmt = ' [%s]'%dates
        wide = soup.find('div',attrs={'class':'wide'})
        if not wide:
           return feeds
        strest = wide.findAll('h3', attrs={'class':'section'})
        if not strest:
           return feeds
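        st = wide.findAll('h4',attrs={'class':'section-no-arrow'})
        # Always append the h3 sections, even when no h4 headings were found;
        # otherwise an empty h4 list would drop every feed.
        st.extend(strest)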
        count = 0
        for item in st:
            count = count + 1
            if self.test and count > 2:
               return feeds
            ftitle   = self.tag_to_string(item)
            self.report_progress(0, _('Fetching feed')+' %s...'%(ftitle))
            feedarts = self.get_artlinks(item.parent.ul)
            feeds.append((ftitle,feedarts))
        return feeds

    def preprocess_html(self, soup):
        items = ['promo-box','promo-title',
                 'promo-headline','promo-image',
                 'promo-intro','promo-link','subhead']
        for item in items:
            for it in soup.findAll(item):
                it.name = 'div'
                it.attrs = []
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll('a'):
            limg = item.find('img')
            if item.string is not None:
               # 'text' instead of 'str' so the builtin is not shadowed.
               text = item.string
               item.replaceWith(text)
            else:
               if limg:
                  item.name = 'div'
                  item.attrs = []
               else:
                   text = self.tag_to_string(item)
                   item.replaceWith(text)
        for item in soup.findAll('img'):
            if not item.has_key('alt'):
               item['alt'] = 'image'
        return soup

    def get_cover_url(self):
        cdate = datetime.date.today()
        # isoweekday() == 7 is Sunday: fall back to Saturday's front page.
        if cdate.isoweekday() == 7:
            cdate -= datetime.timedelta(days=1)
        return cdate.strftime('http://specials.ft.com/vtf_pdf/%d%m%y_FRONT1_LON.pdf')

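    def get_obfuscated_article(self, url):
        html = None
        count = 0
        # Try the download up to 10 times before giving up.
        while count < 10:
            try:
                response = self.browser.open(url)
                html = response.read()
                break
            except Exception:
                print("Retrying download...")
            count += 1
        # Fail with a clear message instead of a NameError on 'html'.
        if html is None:
            raise Exception('Could not download %s' % url)
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name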
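
For anyone who wants to see which section headings an index page exposes without running calibre, here is a minimal sketch of the idea behind the h3/h4 lookup in parse_index. It assumes Python 3 and only the standard library. The names SECTION_MARKERS, SectionTitleParser and section_titles, and the SAMPLE markup, are made up for illustration; they are not part of the recipe, calibre, or ft.com, and the real page layout may differ.

```python
from html.parser import HTMLParser

# Heading tag/class pairs that the recipe treats as section titles.
SECTION_MARKERS = {('h3', 'section'), ('h4', 'section-no-arrow')}


class SectionTitleParser(HTMLParser):
    """Collect the text of every heading that matches SECTION_MARKERS."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.titles = []
        self._current = None  # tag name of the heading being read, or None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        matches = any((tag, c) in SECTION_MARKERS for c in classes)
        if self._current is None and matches:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that sits inside a matching heading.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.titles.append(''.join(self._buffer).strip())
            self._current = None


def section_titles(html):
    """Return the section titles found in html, in document order."""
    parser = SectionTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical fragment, not copied from ft.com.
SAMPLE = '''
<div class="wide">
  <div><h4 class="section-no-arrow">UK</h4><ul><li><a href="/a">A</a></li></ul></div>
  <div><h3 class="section"><a href="/world">World</a></h3><ul><li><a href="/b">B</a></li></ul></div>
  <div><h3 class="other">Ignored</h3></div>
</div>
'''

print(section_titles(SAMPLE))
```

The titles come back in document order, whereas the recipe lists the h4 headings first and then the h3 ones. If a section such as UK is missing from the output for a saved copy of the index page, its heading probably uses a tag or class that neither marker covers.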
Old 09-03-2012, 07:45 AM   #4
zchar67
Thanks both for your help. I planned on trying out rainrdx's solution this morning: download today's copy, then try again with rainrdx's edits. But the first download this morning already included a 'National' section. I checked the downloads from last week and they were all missing the UK/National section. I don't know what happened over the weekend, but it seems to have sorted itself out. I might try your solution if it happens again. Many thanks though.

All times are GMT -4.