MobileRead Forums - View Single Post

meme · 04-26-2011, 04:22 PM

Attached is a test version of the next release.

Can those of you who were having problems with non-ascii characters test this version? It may be working in 1.5.7, but I'm not sure it was using the 'right' approach, so I want to test some newer code. Ideally you shouldn't notice any difference...

Can you run View report and see if there are any warnings at the top about being unable to convert the path to utf-8? If so, for that book, can you try using Edit Collections to create a new collection, add that book, save, and then Restart your Kindle to see if the book is actually in the collection? This will test whether the pathname has to be converted to utf-8 before being put into a collection. (You can make a backup of your collections.json file or use Restore Collections if there are issues.)

There will be a warning at the top of the View report that says 'System Encoding'. If you report issues, I'll need that value. Anyone who has something other than 'utf-8' or 'mbcs', can you let me know? NiLuJe - that probably means you.
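For anyone curious where a value like 'utf-8' or 'mbcs' comes from, here is a minimal sketch. It assumes the 'System Encoding' line reports something like Python's filesystem encoding; the post doesn't say which call the plugin actually uses, and `system_encodings` is just an illustrative name.

```python
import locale
import sys


def system_encodings():
    """Return the encodings Python reports for file names and for text.

    Which of these (if either) the plugin shows as 'System Encoding'
    is an assumption made for this sketch.
    """
    return {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in system_encodings().items():
        print(name, "=", value)
```

The values depend on the operating system and the Python version, so two users with the same books can legitimately see different results.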

Can you check if any titles or authors with special non-ascii characters show up correctly in Edit Collections and View report?

And of course check if your other books are showing up in your collections after you do a Restart. If they don't seem to be, run Edit Collections, create a new collection, put a book in it, and save. Then Restart your Kindle and see if it's in the collection. (Use Restore Collections if you have issues, and you can always install 1.5.7 again to back out of this version after restoring your file.)

The cache file will be rebuilt when you run this version so your first run may be slower than usual (added the book's encoding to the cache).

There are 2 areas with problems regarding unicode/non-ascii characters. One is making sure that the pathname I use to create the hash code that goes into the collection file is the right type. It's not clear if the Kindle must use utf-8 or just unicode. At the moment I just convert the pathname to unicode without forcing it to utf-8 (it's not clear if it converts to utf-8 or your local encoding by default) and do a hash on that. I need to see if I have to force utf-8, which might mean ignoring or replacing characters in some pathnames that can't be put into utf-8. Whether a simple unicode/hash works will depend on the system it's running on - what encoding the system uses - and what pathnames you have.
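To illustrate why the encoding matters for the hash, here is a rough sketch, not the plugin's code. SHA-1, the `path_hash` name, and the example pathnames are all assumptions for illustration; the post doesn't name the hash function.

```python
import hashlib


def path_hash(path, errors="strict"):
    """Hash a pathname after explicitly encoding it to utf-8 bytes.

    SHA-1 is an assumption made for this sketch.
    """
    data = path.encode("utf-8", errors)
    return hashlib.sha1(data).hexdigest()


# Hypothetical pathname containing a non-ascii character.
name = "documents/Caf\u00e9.mobi"
utf8_hash = path_hash(name)
local_hash = hashlib.sha1(name.encode("cp1252")).hexdigest()

# Same name, different bytes, so the two hashes do not match.
print(utf8_hash != local_hash)

# A name that cannot be encoded to utf-8 (here a lone surrogate, which
# Python 3 can produce for undecodable file names) needs a fallback.
bad = "documents/bad\udc92.mobi"
try:
    path_hash(bad)
except UnicodeEncodeError:
    print(path_hash(bad, errors="replace"))
```

The catch with the "replace" fallback is that the hash is then computed over a name that differs from the one on disk, so it only helps if the device does the same substitution.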

The second is an issue with titles/authors being encoded in Windows Latin-1 (CP1252) inside the books themselves. CP1252 is a Windows-specific encoding (it's always Windows, argghhh) which matches utf-8 for plain ascii characters but fails miserably with certain others (a smart quote for example - character 0x92). This is what broke the plugin before - my sort algorithm on titles/authors would fall over when it tried to compare cp1252 and utf-8 strings and the cp1252 strings had a problem character in them. I've modified the code to identify the encoding used for a Mobi book, and if it was encoded with cp1252 I decode it correctly. There are still some circumstances where the characters don't always show correctly, but generally it seems ok.
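A small sketch of the 0x92 problem, assuming the book's encoding is available as a codepage number. The 1252/65001 values follow the usual Mobi header convention, but treat them, along with `MOBI_CODECS` and `decode_field`, as illustrative assumptions rather than the plugin's actual code.

```python
# Assumed mapping from a Mobi codepage number to a Python codec name.
MOBI_CODECS = {1252: "cp1252", 65001: "utf-8"}


def decode_field(raw, codepage):
    """Decode raw title/author bytes using the book's declared codepage.

    Unknown codepages fall back to utf-8, and undecodable bytes are
    replaced rather than raising, so sorting never sees a bad string.
    """
    codec = MOBI_CODECS.get(codepage, "utf-8")
    return raw.decode(codec, errors="replace")


# Hypothetical title bytes with a cp1252 smart quote (byte 0x92).
raw_title = b"Ender\x92s Game"

# Decoded with the right codec, 0x92 becomes a right single quote.
print(decode_field(raw_title, 1252))

# Treated as utf-8, the same byte is invalid and a strict decode fails.
try:
    raw_title.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)
```

Once every title and author is a proper unicode string, comparing them during a sort no longer depends on which encoding each book happened to use.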

I've also rewritten the general sorting algorithm to be faster and less of a mess, as I had a lot of unnecessary code duplicating what python could do for me. This should speed up view collections a little, and should be more noticeable in Edit Collections when sorting the columns - but probably not a dramatic difference.
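As an example of letting python do the work, a column sort can be a single call to the built-in `sorted` with a key function. This is only a sketch; the `books` list, its field names, and `sort_books` are made up for illustration.

```python
def sort_books(books, column, reverse=False):
    """Sort book records by one column, ignoring case.

    The comparison is by code point after case folding, not by locale
    rules, so accented letters sort after plain a-z.
    """
    return sorted(books, key=lambda book: book[column].casefold(), reverse=reverse)


# Hypothetical records standing in for the plugin's book list.
books = [
    {"title": "Zebra", "author": "Smith"},
    {"title": "\u00e9mile", "author": "rousseau"},
    {"title": "apple", "author": "Jones"},
]

for book in sort_books(books, "title"):
    print(book["title"])
```

Because the key is computed once per record, this tends to be quicker than a hand-written comparison loop, and the same function handles any column, ascending or descending.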

And I've added an About menu option so you can verify the version of the plugin.

EDIT - file removed, 7b available

04-26-2011, 04:22 PM	#756
meme · Sigil developer · Posts: 1,274 · Karma: 1101600 · Join Date: Jan 2011 · Location: UK · Device: Kindle PW, K4 NT, K3, Kobo Touch
Test version 1.5.7a available
Last edited by meme; 04-27-2011 at 03:20 AM.