#1
Guru
Posts: 645
Karma: 85520
Join Date: May 2021
Device: kindle
Indian Express update
Reordered the feeds and commented out (#'d) some of them, since the output is already large.
Updated remove_tags and made a few other changes. https://github.com/kovidgoyal/calibr...express.recipe
#2
Guru
Live Mint update
Can we italicize unresolved links to differentiate them from resolved links?
I tried something like this in postprocess_html, but it didn't work (it changes resolved links too). Is there another way?
Code:
def postprocess_html(self, soup, first_fetch):
    # Tag every link whose href still starts with http.
    for unresolved in soup.findAll('a', href=lambda x: x and x.startswith('http')):
        unresolved['id'] = 'unres-d'
    return soup

extra_css = '#unres-d{font-style:italic;}'
#3
creator of calibre
Posts: 45,617
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, that processing is done after postprocess_html is run. You could do it by implementing postprocess_book in the recipe, but that isn't so easy.
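The thread does not show what such a postprocess_book would look like. As a rough sketch only: by the time postprocess_book(self, oeb, opts, log) runs, calibre has already rewritten the links it could resolve, so the links that are left should (this is an assumption) still start with http. The hypothetical helper mark_external_links below adds a class to those links in an XHTML string using only the standard library. Wiring it into the recipe and looping over the book's files is not shown, and a class is used instead of an id because an id should be unique within a page.

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'


def mark_external_links(xhtml, css_class='unres-d'):
    """Add css_class to every <a> whose href still points at http(s).

    Hypothetical helper, not part of calibre: it only illustrates the
    idea of tagging links that were left unresolved.
    """
    # Keep XHTML as the default namespace when serializing again.
    ET.register_namespace('', XHTML_NS)
    root = ET.fromstring(xhtml)
    for a in root.iter('{%s}a' % XHTML_NS):
        href = a.get('href', '')
        if href.startswith(('http://', 'https://')):
            classes = a.get('class', '').split()
            if css_class not in classes:
                classes.append(css_class)
            a.set('class', ' '.join(classes))
    return ET.tostring(root, encoding='unicode')
```

With this approach the matching rule in extra_css would be .unres-d{font-style:italic;} rather than the #unres-d selector.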
#4
Guru
Indian Express
Quote:
Quote:
#5
Guru
Live Mint update
https://github.com/kovidgoyal/calibr...ivemint.recipe
Code:
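# At the top of the recipe:
import json
import re

def preprocess_raw_html(self, raw, *a):
    # Only pages carrying the wsjFlag marker need their body rebuilt.
    if '<script>var wsjFlag=true;</script>' in raw:
        m = re.search(r'type="application/ld\+json">[^<]+?"@type": "NewsArticle"', raw)
        if m is None:
            return raw
        # Skip past the end of the <script ...> tag to the start of the JSON.
        raw1 = raw[m.start():].split('>', 1)[1].strip()
        data = json.JSONDecoder().raw_decode(raw1)[0]
        value = data['hasPart']['value']
        # Start a new paragraph wherever one sentence runs straight into the next.
        body = data['articleBody'] + '</p> <p>' + re.sub(r'([a-z]\.|[0-9]\.)([A-Z])', r'\1 <p> \2', value)
        body = '<div class="FirstEle"> <p>' + body + '</p> </div>'
        # A function replacement keeps backslashes in body from being read as escapes.
        raw = re.sub(r'<div class="FirstEle">([^}]*)</div>', lambda _: body, raw)
    return raw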
Also add this to extra_css for the same part: .summary{font-style:italic; color:#404040;}
and set resolve_internal_links = True
Last edited by unkn0wn; 08-18-2022 at 04:23 AM.
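To make the two steps in that method easier to follow, here is a small standalone sketch that runs outside calibre. The SAMPLE page and the helper names extract_article and split_fused_sentences are made up for illustration; only the two regular expressions and the raw_decode call come from the code above. raw_decode is what lets the JSON be parsed even though the rest of the page follows it.

```python
import json
import re

# Invented sample page: JSON-LD metadata followed by ordinary HTML.
SAMPLE = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "NewsArticle", '
    '"articleBody": "Intro text.", '
    '"hasPart": {"value": "First sentence.Second sentence. Third one."}}'
    '</script></head><body><div class="FirstEle">teaser</div></body></html>'
)


def extract_article(raw):
    """Return the NewsArticle JSON-LD object embedded in raw, or None."""
    m = re.search(r'type="application/ld\+json">[^<]+?"@type": "NewsArticle"', raw)
    if m is None:
        return None
    # Drop the rest of the opening <script> tag, keep what follows it.
    payload = raw[m.start():].split('>', 1)[1].strip()
    # raw_decode parses one JSON value and ignores the trailing HTML.
    data, _end = json.JSONDecoder().raw_decode(payload)
    return data


def split_fused_sentences(text):
    """Insert a <p> where a sentence end is glued to the next capital."""
    return re.sub(r'([a-z]\.|[0-9]\.)([A-Z])', r'\1 <p> \2', text)


data = extract_article(SAMPLE)
print(split_fused_sentences(data['hasPart']['value']))
# prints: First sentence. <p> Second sentence. Third one.
```

Note that the split only triggers when no space follows the full stop, so normally punctuated sentences are left alone.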
#6
Guru
Indian Express
https://github.com/kovidgoyal/calibr...express.recipe
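Remove lines 110-112 and replace them with:
Code:
# Turn the first <h2> after the headline into a paragraph and give it
# the id sub-d so that extra_css can style it.
h1 = soup.find('h1')
if h1:
    h2 = h1.findNext('h2')
    if h2:
        h2.name = 'p'
        h2['id'] = 'sub-d'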
Quote:
extra_css additions Quote:
Last edited by unkn0wn; 08-18-2022 at 04:20 AM.
#7
Guru
Nautilus update
https://github.com/kovidgoyal/calibr...autilus.recipe
Code:
def preprocess_html(self, soup):
    # Lazy-loaded images keep the real URL in data-src; drop the query string.
    for img in soup.findAll('img', attrs={'data-src': True}):
        img['src'] = img['data-src'].split('?')[0]
    for figcaption in soup.findAll('figcaption'):
        figcaption['id'] = 'fig-c'
    # Turn the breadcrumb and byline lists into spans holding paragraphs.
    for ul in soup.findAll('ul', attrs={'class': [
        'breadcrumb', 'article-list_item-byline', 'channel-article-author', 'article-author'
    ]}):
        ul.name = 'span'
        for li in ul.findAll('li'):
            li.name = 'p'
    return soup
Code:
extra_css = '''
.article-list_item-byline{font-size:small;}
blockquote{color:#404040; text-align:center;}
#fig-c{font-size:small;}
em{color:#202020;}
.breadcrumb{color:gray; font-size:small;}
.article-author{font-size:small;}
'''
Last edited by unkn0wn; 08-18-2022 at 06:07 AM.
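The image fix in preprocess_html keeps only the part of data-src before the first '?'. A possible alternative, shown here as a sketch and not as what the recipe uses, is to take the URL apart with the standard library so that a fragment is dropped as well. The helper name strip_query and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return url without its query string and fragment."""
    parts = urlsplit(url)
    # Rebuild the URL from scheme, host and path only.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))


url = 'https://example.com/img/a.jpg?w=300&h=200'
print(strip_query(url))
# prints: https://example.com/img/a.jpg
```

For URLs that carry only a query string, both approaches give the same result, so the one-line split('?')[0] in the recipe is enough in that case.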
#8
creator of calibre
Done. I suggest you just attach the modified recipe files; that is easier for both of us.
#9
Guru
Okay. I thought this would be easier for small changes.