MobileRead Forums > E-Book Software > Calibre > Recipes

07-21-2022, 10:20 AM   #1
unkn0wn
Evangelist
 
Posts: 444
Karma: 82686
Join Date: May 2021
Device: kindle
Indian Express update

Reordered the feeds and commented out (#) some of them, since the output is already large.
Updated remove_tags, plus a few other changes.
https://github.com/kovidgoyal/calibr...express.recipe
Attached file: Indian Express.recipe (4.0 KB)
07-22-2022, 02:28 AM   #2
unkn0wn
Evangelist
Live Mint update

Can we italicize unresolved links to distinguish them from resolved ones?

I tried something like this in postprocess_html, but it didn't work: it also changes the resolved links. Is there another way?
Code:
    def postprocess_html(self, soup, first_fetch):
        # mark absolute links; ids must be unique, so use a class instead
        for unresolved in soup.findAll('a', href=lambda x: x and x.startswith('http')):
            unresolved['class'] = 'unres-d'
        return soup

    extra_css = '.unres-d{font-style:italic;}'
I think this needs to happen after calibre stitches together all the HTML files it fetched.
Attached file: Live Mint.recipe (4.3 KB)
07-22-2022, 04:00 AM   #3
kovidgoyal
creator of calibre

Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, link resolution is done after postprocess_html is run. You could do it by implementing postprocess_book in the recipe, but that isn't so easy.
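Here is a rough sketch of the postprocess_book idea, under two assumptions: that by the time postprocess_book runs, links to articles inside the download have been rewritten to relative paths, and that anything still starting with http(s) is therefore unresolved. The marking logic is shown on a plain XHTML string with the standard library's ElementTree so it can be tried outside calibre; mark_unresolved_links, the unres-d class and the sample paths are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'


def mark_unresolved_links(root, css_class='unres-d'):
    """Add css_class to every <a> whose href is still an absolute http(s) URL.

    Assumes internal links have already been rewritten to relative paths,
    so any http(s) link left over points outside the downloaded book.
    Returns the number of links marked.
    """
    count = 0
    for a in root.iter('{%s}a' % XHTML_NS):
        href = a.get('href', '')
        if href.startswith(('http://', 'https://')):
            classes = a.get('class', '').split()
            if css_class not in classes:  # don't add the class twice
                a.set('class', ' '.join(classes + [css_class]))
            count += 1
    return count


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>'
    '<a href="../article_2/index.html">resolved</a> '
    '<a class="ext" href="https://example.com/story">unresolved</a>'
    '</p></body></html>'
)
root = ET.fromstring(sample)
print(mark_unresolved_links(root))  # prints 1
```

In a recipe, one might call the same function from postprocess_book(self, oeb, opts, log) on the parsed XHTML of each spine item (lxml elements support the same iter/get/set calls) and keep extra_css = '.unres-d{font-style:italic;}'. That wiring is an untested assumption.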
08-11-2022, 07:53 AM   #4
unkn0wn
Evangelist
Indian Express
Quote:
In extra_css, add:
blockquote{text-align:center; color:#404040;}

In remove_tags, add:
dict(name='div', attrs={'class': lambda x: x and 'related-widget' in x}),
and add 'ie-first-publish adboxtop adsizes' to the classes part.
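The remove_tags entries above match a class attribute in two different ways. The lambda is a substring test, so it also catches longer class names such as related-widget-box, while, as far as I know, the names in the classes part are matched as whole class names only. The sketch below compares the two on plain class strings; token_match is a hypothetical stand-in for calibre's classes() helper, written without calibre or BeautifulSoup, and the sample class strings are made up.

```python
def substring_match(needle):
    """Like the remove_tags lambda: true if needle occurs anywhere in the class value."""
    return lambda x: bool(x) and needle in x


def token_match(names):
    """Hypothetical stand-in for a classes('a b c')-style helper:
    true if any whitespace-separated class equals one of the given names."""
    wanted = frozenset(names.split())
    return lambda x: bool(x) and bool(frozenset(x.split()) & wanted)


related = substring_match('related-widget')
ads = token_match('ie-first-publish adboxtop adsizes')

print(related('related-widget-box sidebar'))  # True: substring hit
print(ads('adboxtop mobile'))                 # True: exact class 'adboxtop'
print(ads('adboxtop-wrapper'))                # False: no exact class match
```

The substring form is handy when a site appends suffixes to a class name, at the cost of occasionally removing more than intended.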
EENADU (Indian language) recipe update.
Quote:
Remove https://github.com/kovidgoyal/calibr.../eenadu.recipe

and add the attached files instead. They are updated and separated by state (province).
08-18-2022, 01:44 AM   #5
unkn0wn
Evangelist
Live Mint update

https://github.com/kovidgoyal/calibr...ivemint.recipe

Code:
        def preprocess_raw_html(self, raw, *a):
            # relies on 'import json' and 'import re' at the top of the recipe
            if '<script>var wsjFlag=true;</script>' in raw:
                m = re.search(r'type="application/ld\+json">[^<]+?"@type": "NewsArticle"', raw)
                if m is None:  # no NewsArticle JSON-LD on the page: leave it unchanged
                    return raw
                raw1 = raw[m.start():]
                raw1 = raw1.split('>', 1)[1].strip()
                data = json.JSONDecoder().raw_decode(raw1)[0]
                value = data['hasPart']['value']
                body = data['articleBody'] + '</p> <p>' + re.sub(r'([a-z]\.|[0-9]\.)([A-Z])', r'\1 <p> \2', value)
                body = '<div class="FirstEle"> <p>' + body + '</p> </div>'
                # a function replacement keeps backslashes in the article text literal
                raw = re.sub(r'<div class="FirstEle">([^}]*)</div>', lambda _m: body, raw)
            return raw
Add this to the else: branch (the non-Saturday part); without it some WSJ articles won't load.

In the same branch, add .summary{font-style:italic; color:#404040;} to extra_css,
and set resolve_internal_links = True.
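To see what the preprocess_raw_html above does, here is a standalone sketch of its two steps: pulling the NewsArticle JSON-LD object out of the raw page with json.JSONDecoder.raw_decode, and inserting paragraph breaks where sentences run together. The function names and the sample page are made up; real Live Mint/WSJ markup may differ.

```python
import json
import re

LD_JSON_NEWS = re.compile(r'type="application/ld\+json">[^<]+?"@type": "NewsArticle"')


def extract_news_article(raw):
    """Return the first NewsArticle JSON-LD object in raw, or None if absent."""
    m = LD_JSON_NEWS.search(raw)
    if m is None:
        return None
    # Skip past the end of the <script ...> tag; the JSON object starts there.
    tail = raw[m.start():].split('>', 1)[1].strip()
    # raw_decode parses one JSON value and ignores what follows (e.g. '</script>').
    data, _end = json.JSONDecoder().raw_decode(tail)
    return data


def split_run_on_sentences(text):
    """Insert '<p>' where a sentence ending in a lowercase letter or digit
    runs straight into a capital letter with no space."""
    return re.sub(r'([a-z]\.|[0-9]\.)([A-Z])', r'\1 <p> \2', text)


page = ('<script type="application/ld+json">{"@type": "NewsArticle", '
        '"articleBody": "Intro.", "hasPart": {"value": "Rates rose.Markets fell."}}'
        '</script>')
article = extract_news_article(page)
print(article['articleBody'])                               # Intro.
print(split_run_on_sentences(article['hasPart']['value']))  # Rates rose. <p> Markets fell.
```

raw_decode is what lets the recipe parse the JSON without first locating the closing </script> tag.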

Last edited by unkn0wn; 08-18-2022 at 03:23 AM.
08-18-2022, 01:58 AM   #6
unkn0wn
Evangelist
Indian Express

https://github.com/kovidgoyal/calibr...express.recipe

Remove lines 110-112 and replace them with:
Code:
        # turn the subheading (first h2 after the h1) into a styled paragraph
        h1 = soup.find('h1')
        if h1:
            h2 = h1.findNext('h2')
            if h2:
                h2.name = 'p'
                h2['id'] = 'sub-d'
Then add to remove_tags:
Quote:
dict(name='div', attrs={'class': lambda x: x and 'related-widget' in x}),
and add 'immigrationimg' to the remove_tags classes.

extra_css additions:
Quote:
em{font-style:italic; color:#808080;}
#sub-d{color:#202020; font-style:italic;}
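The "first h2 after the h1" lookup in the code above can be mimicked without BeautifulSoup, which helps check which element gets picked: findNext searches forward in document order, so an h2 that comes before the headline is left alone. This sketch uses the standard library's ElementTree on a made-up page; restyle_subheading is a hypothetical name, not part of the recipe.

```python
import xml.etree.ElementTree as ET


def restyle_subheading(root, new_id='sub-d'):
    """Turn the first <h2> after the first <h1> (in document order) into a
    <p> with the given id. Returns True if a subheading was changed."""
    elements = list(root.iter())  # pre-order traversal == document order
    for i, el in enumerate(elements):
        if el.tag == 'h1':
            for later in elements[i + 1:]:
                if later.tag == 'h2':
                    later.tag = 'p'
                    later.set('id', new_id)
                    return True
            return False  # an h1 exists but no h2 follows it
    return False  # no h1 at all


doc = ET.fromstring(
    '<body><h2>Section</h2><h1>Headline</h1>'
    '<div><h2>Standfirst text</h2><p>Story...</p></div></body>')
restyle_subheading(doc)
print(ET.tostring(doc, encoding='unicode'))
```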

Last edited by unkn0wn; 08-18-2022 at 03:20 AM.
08-18-2022, 02:13 AM   #7
unkn0wn
Evangelist
Nautilus update

https://github.com/kovidgoyal/calibr...autilus.recipe

Code:
    def preprocess_html(self, soup):
        # lazy-loaded images: copy data-src into src, dropping the query string
        for img in soup.findAll('img', attrs={'data-src': True}):
            img['src'] = img['data-src'].split('?')[0]
        for figcaption in soup.findAll('figcaption'):
            figcaption['id'] = 'fig-c'
        # render byline/breadcrumb lists as plain paragraphs instead of bullets
        for ul in soup.findAll('ul', attrs={'class': [
                'breadcrumb', 'article-list_item-byline', 'channel-article-author', 'article-author']}):
            ul.name = 'span'
            for li in ul.findAll('li'):
                li.name = 'p'
        return soup
Code:
    extra_css = '''
        .article-list_item-byline{font-size:small;}
        blockquote{color:#404040; text-align:center;}
        #fig-c{font-size:small;}
        em{color:#202020;}
        .breadcrumb{color:gray; font-size:small;}
        .article-author{font-size:small;}
    '''
and add 'article-collection_box' to the remove_tags classes.
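To check what the preprocess_html above does to a page, the sketch below repeats its two main steps with ElementTree instead of BeautifulSoup: copying lazy-loaded data-src URLs into src without the query string, and rendering the byline/breadcrumb lists as paragraphs inside a span. The function names and sample markup are hypothetical; only the class names come from the recipe code.

```python
import xml.etree.ElementTree as ET

# Class names copied from the recipe's preprocess_html.
FLATTEN = {'breadcrumb', 'article-list_item-byline', 'channel-article-author', 'article-author'}


def fix_lazy_images(root):
    """Copy data-src into src, dropping any query string, like the recipe does."""
    for img in root.iter('img'):
        data_src = img.get('data-src')
        if data_src is not None:
            img.set('src', data_src.split('?')[0])


def flatten_lists(root):
    """Turn <ul> elements with one of the FLATTEN classes into a <span> of <p>s."""
    for ul in list(root.iter('ul')):  # snapshot first, since tags change in the loop
        if FLATTEN & set(ul.get('class', '').split()):
            ul.tag = 'span'
            for li in list(ul.iter('li')):
                li.tag = 'p'


page = ET.fromstring(
    '<div><img data-src="https://example.com/pic.jpg?w=800" />'
    '<ul class="breadcrumb"><li>Science</li><li>Biology</li></ul>'
    '<ul class="tags"><li>kept as a list</li></ul></div>')
fix_lazy_images(page)
flatten_lists(page)
print(ET.tostring(page, encoding='unicode'))
```

Lists whose classes are not in the set (the "tags" list here) are left untouched.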

Last edited by unkn0wn; 08-18-2022 at 05:07 AM.
08-18-2022, 08:21 AM   #8
kovidgoyal
creator of calibre
Done. I suggest you just attach the modified recipe files; it's easier for both of us.
08-18-2022, 09:46 AM   #9
unkn0wn
Evangelist
Okay. I thought that for small changes this would be easier.
Similar Threads
Thread Thread Starter Forum Replies Last Post
Update Indian express unkn0wn Recipes 15 06-11-2022 04:41 AM
Updated feeds for Indian Express unkn0wn Recipes 2 01-27-2022 04:49 AM
Indian Express misses some articles nikstar007 Recipes 1 08-30-2016 08:10 AM
daily express update scissors Recipes 0 11-22-2014 03:18 AM
Indian Express Recipe sexymax15 Recipes 0 06-16-2011 06:06 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.