Modification to news.py to handle Unicode byte strings

oneillpt · 09-10-2021, 10:02 AM

I have just posted an updated recipe in the recipes forum for the Russian Аргументы и Факты (#5 in https://www.mobileread.com/forums/sh...d.php?t=123726). It required two modifications to news.py so that it handles byte strings (type 'bytes') as well as type 'str'. I'm posting them here as a suggested change which may help others who encounter file or directory names of type 'bytes'. I am not familiar enough with git to attempt a "merge directive".

1) In canonicalize_internal_url(self, url, is_link=True), replace

    return frozenset([(parts.netloc, (parts.path or '').rstrip('/'))])

by

    zzp = parts.path
    zzn = parts.netloc
    if isinstance(zzp, bytes):  # a bytes URL gives bytes parts; decode both to str
        zzp = zzp.decode("utf-8")
        zzn = zzn.decode("utf-8")
    return frozenset([(zzn, (zzp or '').rstrip('/'))])
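For anyone who wants to try the idea outside calibre, here is a minimal standalone sketch of change 1. The names as_text and canonical_key are hypothetical helpers made up for this illustration and are not part of news.py; the only library call is urlparse from the standard library, which returns bytes components when it is given a bytes URL.

```python
from urllib.parse import urlparse


def as_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def canonical_key(url):
    """Hypothetical stand-in for the value built in canonicalize_internal_url."""
    parts = urlparse(url)
    netloc = as_text(parts.netloc)  # bytes when url was bytes, str otherwise
    path = as_text(parts.path)
    return frozenset([(netloc, (path or '').rstrip('/'))])


# A bytes URL and the equivalent str URL now give the same key.
print(canonical_key(b"https://aif.ru/society/") == canonical_key("https://aif.ru/society/"))  # True
```

Using isinstance(value, bytes) rather than comparing type() results means only real byte strings are decoded, and str values pass through untouched.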

2) In article_downloaded(self, request, result), replace

    index = os.path.join(os.path.dirname(result[0]), 'index.html')

by

    zzr = result[0]
    if isinstance(zzr, bytes):  # decode a bytes path before joining it with a str
        zzr = zzr.decode("utf-8")
    index = os.path.join(os.path.dirname(zzr), 'index.html')
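Change 2 can be sketched the same way. os.path.join raises TypeError when bytes and str components are mixed, which is presumably what goes wrong when result[0] arrives as bytes. The function name index_path_for is hypothetical and only mirrors the single line changed in article_downloaded; UTF-8 is assumed as the encoding, as in the change above.

```python
import os


def index_path_for(downloaded_path, encoding="utf-8"):
    """Return the index.html path next to downloaded_path (hypothetical helper).

    Accepts either str or bytes; bytes are decoded first so that
    os.path.join never sees a mix of bytes and str components.
    """
    if isinstance(downloaded_path, bytes):
        downloaded_path = downloaded_path.decode(encoding)
    return os.path.join(os.path.dirname(downloaded_path), 'index.html')


# The same result comes back for a bytes path and for a str path.
print(index_path_for(b"feed_0/article_1/page.html") == index_path_for("feed_0/article_1/page.html"))  # True
```

The standard library's os.fsdecode might be an alternative to the hard-coded UTF-8 decode, since it uses the filesystem encoding, but I have not tried it with this recipe.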

