Help with the Global Times recipe

alan smith · 04-23-2019, 08:49 PM

calibre is so powerful, but I recently found that the Global Times recipe no longer works. Could someone update it? The site address is
http://www.globaltimes.cn/
I'm using version 3.41 of the software. Thanks a lot.

lui1 · 04-24-2019, 12:59 AM

It looks like the links were broken and the structure of the markup changed, so the recipe needed to be redone.

Update for Global Times:

Code:

from calibre.web.feeds.news import BasicNewsRecipe
import re


def check_words(words):
    return lambda x: x and frozenset(words.split()).intersection(x.split())


class GlobalTimes(BasicNewsRecipe):
    title = u'Global Times'
    __author__ = 'Jose Ortiz' # lui1 at mobileread.com
    language = 'en_CN'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    keep_only_tags = [
        { 'class': check_words('article-title article-source row-content') },
    ]

    extra_css = '''
        .article-title {
            font-family:Arial,Helvetica,sans-serif;
            font-weight:bold;font-size:large;
        }

        .article-source, .row-content {
            font-family:Arial,Helvetica,sans-serif;
            font-size:small;
        }
        '''

    def parse_index(self):
        catnames = {}
        catnames["http://www.globaltimes.cn/china/politics/"] = "China Politics"
        catnames["http://www.globaltimes.cn/china/diplomacy/"] = "China Diplomacy"
        catnames["http://www.globaltimes.cn/china/military/"] = "China Military"
        catnames["http://www.globaltimes.cn/business/economy/"] = "China Economy"
        catnames["http://www.globaltimes.cn/world/asia-pacific/"] = "Asia Pacific"
        feeds = []

        for cat in catnames.keys():
            articles = []
            soup = self.index_to_soup(cat)
            # Typical url http://www.globaltimes.cn/content/5555555.shtml
            for a in soup.findAll('a', attrs={'href': re.compile(r'https?://www[.]globaltimes[.]cn/content/[0-9]{4,10}[.]shtml')}):
                url = a['href'].strip()
                title = self.tag_to_string(a).strip()
                if not title:
                    continue
                myarticle = ({'title': title,
                              'url': url,
                              'description': '',
                              'date': ''})
                self.log("found '%s'" % title)
                articles.append(myarticle)
                self.log("Adding URL %s\n" % url)
            if articles:
                feeds.append((catnames[cat], articles))
        return feeds
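
In case it helps to see how the keep_only_tags filter decides what to keep: check_words returns a predicate that is truthy when a tag's class attribute shares at least one word with the given list. Below is a minimal standalone sketch, assuming the class attribute arrives as a single string. The sample class strings are made up for illustration and are not taken from the site.

```python
def check_words(words):
    # Same helper as in the recipe: truthy when the class string x
    # shares at least one whitespace-separated word with `words`.
    return lambda x: x and frozenset(words.split()).intersection(x.split())


matcher = check_words('article-title article-source row-content')

# Hypothetical class attribute values, for illustration only.
samples = ['article-title', 'row-content clearfix', 'sidebar', None]
for value in samples:
    # bool() because the predicate returns a set (or None), not True/False.
    print(repr(value), '->', bool(matcher(value)))
```

A tag with several classes is kept as soon as any one of them is in the list, and a tag with no class attribute at all is skipped because of the `x and ...` guard.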

alan smith · 04-24-2019, 05:45 AM

Your help is greatly appreciated! I just tested it and the recipe works perfectly, but there is too much blank space between paragraphs in the articles, which hurts the reading experience. It would be perfect if this could be fixed. Thanks again.

lui1 · 04-24-2019, 03:20 PM

You're welcome. The problem is that they are using br line breaks instead of p elements to separate their paragraphs. This update should improve the layout.

Update for Global Times:

Code:

from calibre.web.feeds.news import BasicNewsRecipe
import re


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(
        attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)}
    )


class GlobalTimes(BasicNewsRecipe):
    title = u'Global Times'
    __author__ = 'Jose Ortiz'  # lui1 at mobileread.com
    language = 'en_CN'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    keep_only_tags = [classes('article-title article-source row-content')]

    preprocess_regexps = [
        (re.compile(r'(?:<(?:br(?:\s*/)?|/br\s*)>(?:\s|'
                    '\xA0' r'|&nbsp;)*){2,9}',
                    re.U | re.I),
         lambda match: '<p>')
    ]

    extra_css = '''
        :root {
            font-family: Arial, Helvetica, sans-serif;
        }

        .article-title {
            font-weight: bold;
            font-size: large;
        }

        .article-source, .row-content {
            font-size:small;
        }
        '''

    def parse_index(self):
        catnames = {}
        catnames["http://www.globaltimes.cn/china/politics/"] = "China Politics"
        catnames["http://www.globaltimes.cn/china/diplomacy/"] = "China Diplomacy"
        catnames["http://www.globaltimes.cn/china/military/"] = "China Military"
        catnames["http://www.globaltimes.cn/business/economy/"] = "China Economy"
        catnames["http://www.globaltimes.cn/world/asia-pacific/"] = "Asia Pacific"
        feeds = []

        for cat in catnames.keys():
            articles = []
            soup = self.index_to_soup(cat)
            for a in soup.findAll(
                'a',
                attrs={
                    'href':
                    re.compile(
                        r'https?://www[.]globaltimes[.]cn/content/[0-9]{4,10}[.]shtml'
                    )
                }
            ):
                # Typical url http://www.globaltimes.cn/content/5555555.shtml
                url = a['href'].strip()
                title = self.tag_to_string(a).strip()
                if not title:
                    continue
                myarticle = ({
                    'title': title,
                    'url': url,
                    'description': '',
                    'date': ''
                })
                self.log("found '%s'" % title)
                articles.append(myarticle)
                self.log("Adding URL %s\n" % url)
            if articles:
                feeds.append((catnames[cat], articles))
        return feeds

    def postprocess_html(self, soup, first_fetch):
        for p in [p for p in soup('p') if len(p) == 0]:
            p.extract()
        return soup
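
To see what the preprocess_regexps pattern does on its own, here is a small sketch that applies the same regular expression to a string outside of calibre. The sample HTML is invented for illustration, and BR_RUN is just a name chosen here. The point is that a run of two or more br tags (with any whitespace or non-breaking spaces between them) becomes a single paragraph start, while a lone br is left alone.

```python
import re

# Same pattern as in the recipe's preprocess_regexps: 2 to 9 consecutive
# <br>, <br/>, <br /> or </br> tags, each optionally followed by
# whitespace, a literal non-breaking space, or an &nbsp; entity.
BR_RUN = re.compile(r'(?:<(?:br(?:\s*/)?|/br\s*)>(?:\s|'
                    '\xA0' r'|&nbsp;)*){2,9}',
                    re.U | re.I)

# Made-up sample markup, not copied from the site.
sample = ('First paragraph.<br><br />\n&nbsp;<BR/>'
          'Second paragraph.<br>Same paragraph.')

result = BR_RUN.sub('<p>', sample)
print(result)
# -> First paragraph.<p>Second paragraph.<br>Same paragraph.
```

Any empty p elements this leaves behind are what postprocess_html removes afterwards.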

alan smith · 04-24-2019, 09:48 PM

Thanks so much, and thanks for responding so quickly.
