09-10-2021, 09:02 AM   #1
oneillpt
Modification to news.py to handle Unicode byte strings

I have just posted an updated recipe in the recipes forum for the Russian Аргументы и Факты (#5 in https://www.mobileread.com/forums/sh...d.php?t=123726). It required modifying news.py to handle UTF-8 encoded byte strings (type 'bytes') as well as type 'str'. I'm posting the changes here as a suggestion, since they may help others who encounter file or directory names of type 'bytes'. I am not familiar enough with git to attempt a "merge directive".

1) In canonicalize_internal_url(self, url, is_link=True):
replace
    return frozenset([(parts.netloc, (parts.path or '').rstrip('/'))])
by
    zzp = parts.path
    zzn = parts.netloc
    if isinstance(zzp, bytes):  # decode bytes components to str
        zzp = zzp.decode("utf-8")
        zzn = zzn.decode("utf-8")
    return frozenset([(zzn, (zzp or '').rstrip('/'))])

2) In article_downloaded(self, request, result):
replace
    index = os.path.join(os.path.dirname(result[0]), 'index.html')
by
    zzr = result[0]
    if isinstance(zzr, bytes):  # decode a bytes file name to str
        zzr = zzr.decode("utf-8")
    index = os.path.join(os.path.dirname(zzr), 'index.html')
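
Both changes above follow the same pattern: decode a value only if it is of type 'bytes', and leave 'str' untouched. Here is a minimal standalone sketch of that pattern. It is not calibre code: to_str, canonical_key and index_path are hypothetical names invented for illustration, and it assumes the standard library's urllib.parse.urlparse, which returns bytes components when given a bytes URL.

```python
import os
from urllib.parse import urlparse


def to_str(value, encoding="utf-8"):
    """Hypothetical helper (not part of calibre): decode bytes, pass str through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def canonical_key(url):
    """Mimic the return value from change 1 for a str or bytes URL."""
    parts = urlparse(url)  # components are bytes if url is bytes
    netloc = to_str(parts.netloc)
    path = to_str(parts.path)
    return frozenset([(netloc, (path or "").rstrip("/"))])


def index_path(downloaded_file):
    """Mimic change 2: index.html next to a str or bytes file name."""
    return os.path.join(os.path.dirname(to_str(downloaded_file)), "index.html")


# A str URL and the equivalent bytes URL produce the same key.
assert canonical_key("https://example.com/news/") == canonical_key(
    b"https://example.com/news/"
)

# A UTF-8 encoded bytes file name is decoded before the path is built.
name = "feed_0/статья/article.html".encode("utf-8")
print(index_path(name))
```

Putting the check in one helper would keep the decoding logic in a single place if other methods in news.py turn out to need the same treatment. Whether that fits calibre's own conventions is for the maintainers to judge.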