|
|
#1 |
|
Guru
Posts: 646
Karma: 85520
Join Date: May 2021
Device: kindle
|
India Today Magazine update
It stopped working; they changed things on their website.
|
|
|
|
|
|
#2 |
|
Guru
Posts: 646
Karma: 85520
Join Date: May 2021
Device: kindle
|
Indian Express
remove_tags update: they keep coming up with new tags.
|
|
|
|
|
|
|
|
#3 |
|
Guru
Posts: 646
Karma: 85520
Join Date: May 2021
Device: kindle
|
Economic Times Print Edition
Cover URL method update.
|
|
|
|
|
|
#4 |
|
Guru
Posts: 646
Karma: 85520
Join Date: May 2021
Device: kindle
|
eenadu_ap recipe cover_url update
https://github.com/kovidgoyal/calibr...nadu_ap.recipe
Code:
def get_cover_url(self):
    from datetime import date
    # Today's front page on kiosko.net: .../YYYY/MM/DD/in/eenadu.750.jpg
    cover = 'https://img.kiosko.net/' + str(
        date.today().year
    ) + '/' + date.today().strftime('%m') + '/' + date.today(
    ).strftime('%d') + '/in/eenadu.750.jpg'
    br = BasicNewsRecipe.get_browser(self)
    try:
        # Check that today's image is actually available
        br.open(cover)
    except Exception:
        # Fall back to the newest cover listed on the paper's index page
        index = 'https://es.kiosko.net/in/np/eenadu.html'
        soup = self.index_to_soup(index)
        for image in soup.findAll('img', src=True):
            if image['src'].endswith('750.jpg'):
                return 'https:' + image['src']
        self.log("\nCover unavailable")
        cover = None
    return cover
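For anyone adapting this to another paper, the dated URL can be built in one place. This is only a sketch: the helper name kiosko_cover_url and its paper/country parameters are made up for illustration, and the URL pattern is simply the one used in the method above.

```python
from datetime import date


def kiosko_cover_url(day, paper='eenadu', country='in'):
    """Build the dated kiosko.net cover URL for one day.

    Hypothetical helper, not part of calibre. The pattern
    YYYY/MM/DD/<country>/<paper>.750.jpg is taken from the recipe above.
    """
    return 'https://img.kiosko.net/{}/{}/{}.750.jpg'.format(
        day.strftime('%Y/%m/%d'), country, paper)


# Calling date.today() once avoids mixing two dates around midnight.
print(kiosko_cover_url(date.today()))
```

The recipe would still need the try/except fallback, since the image for the current day may not be published yet.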
|
|
|
|
|
|
#5 |
|
Guru
Posts: 646
Karma: 85520
Join Date: May 2021
Device: kindle
|
India Today Magazine articles have long dashes between words, which show up as —
Adding encoding = 'utf-8' didn't work, so I added:
Code:
def preprocess_raw_html(self, raw_html, url):
    return raw_html.replace('—', '--')
Maybe add this code to the recipe. Also this bold part:
Code:
extra_css = '''
    #sub-d {font-style:italic; color:#202020;}
    .story__byline {font-size:small; text-align:left;}
    .body_caption, .mos__alt, .caption, .caption-drupal-entity {font-size:small; text-align:center;}
    blockquote{color:#404040;}
'''
Last edited by unkn0wn; 10-22-2022 at 04:37 AM. |
|
|
|
|
|
|
|
#6 |
|
creator of calibre
Posts: 45,656
Karma: 28549046
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If encoding utf-8 didn't work, then they likely aren't encoded in utf-8, so you will need the correct encoding. Common ones to try are cp1252 and latin1.
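A rough sketch of that trial and error outside the recipe, assuming you have the raw page bytes in hand. The helper name first_clean_decoding and the candidate order are hypothetical, not part of calibre. Keep in mind that latin1 accepts every byte, so a result of latin1 only means the codec did not raise, not that the text is right.

```python
def first_clean_decoding(raw, candidates=('utf-8', 'cp1252', 'latin1')):
    """Return (encoding, text) for the first candidate that decodes raw.

    Hypothetical helper for experimenting; a successful decode does not
    prove the encoding is correct, so eyeball the text as well.
    """
    for enc in candidates:
        try:
            return enc, raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to a lossy UTF-8 decode.
    return None, raw.decode('utf-8', errors='replace')


# An em dash stored as a single cp1252 byte is not valid UTF-8,
# so the helper moves on to the second candidate.
sample = 'a \u2014 b'.encode('cp1252')
print(first_clean_decoding(sample)[0])
```

Whichever name comes back is what you would then try as the recipe's encoding setting.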
|
|
|
|
|
|
#7 |
|
Guru
Posts: 646
Karma: 85520
Join Date: May 2021
Device: kindle
|
I tried encoding = 'cp1252'.
This makes the long dash show up as � and makes a lot of text unreadable. I think the replace solution is much better than figuring out the encoding; the problem is only with the em dash, and they use a lot of them. I also tried latin1, which doesn't work either.
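A sketch of the replace approach as a standalone function, in case it helps to test it outside calibre. It assumes one common failure mode, UTF-8 bytes that were decoded as cp1252, which may or may not be what the India Today pages actually do. The names EM_DASH, MOJIBAKE and normalize_dashes are made up for this example.

```python
EM_DASH = '\u2014'
# What an em dash looks like when its UTF-8 bytes are read as cp1252.
# Assumption for illustration only; the site may be garbling it differently.
MOJIBAKE = EM_DASH.encode('utf-8').decode('cp1252')


def normalize_dashes(html):
    """Replace a real or garbled em dash with two hyphens."""
    return html.replace(MOJIBAKE, '--').replace(EM_DASH, '--')


print(normalize_dashes('budget' + EM_DASH + 'and more'))
```

Inside a recipe the same call could sit in preprocess_raw_html, which receives the page as a string and returns the modified string.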
Last edited by unkn0wn; 10-22-2022 at 03:04 PM. |
|
|
|
|
|
#8 |
|
creator of calibre
Posts: 45,656
Karma: 28549046
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
ok, fine by me.
|
|
|
|