Old 01-15-2017, 06:20 AM   #214
Leonatus
Wizard
Posts: 1,060
Karma: 11391181
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
Quote:
Originally Posted by jgoguen
I'm not sure if it's the temporary directory since it's doing other write operations on files before it fails. Unfortunately, this looks to be Windows-specific (and possibly even 32-bit Windows-specific); on macOS 10.12.2 with calibre 2.77.0 the book I got converts perfectly fine.
The strange thing is that the conversion works on my other Windows 10 32-bit computer. I also tried it on a third computer, Windows 10 64-bit, and there it worked without issues as well.

Quote:
I did see some older mailing list posts suggesting that lxml will break if you have anything other than Unicode or ASCII characters, and the file that's being referenced in the error you sent me has 0xfeff as the last (invisible) character on line 12. That character used to be a zero-width non-breaking space, but its use in that context is deprecated (use 0x2060 instead), and it is currently standardised as the byte order mark, which should only appear as the very first character in a file if it's present at all.

Since this is specific to Windows, there's not going to be much I can do. I can offer suggestions, and based on what you tell me I may be able to come up with something that works, but I can't promise anything. To start with, use calibre to edit the book (right-click, choose "Edit Book"), open the file "@public@vhost@g@gutenberg@html@files@14105@14105-0-0.txt_split_000.html", and remove the last character on line 12; if you look at the bottom right you should see it say "ZERO WIDTH NO-BREAK SPACE : Line: 12 : 78". Backspace once; it should look like nothing changes, but the bottom right should change. Save the book and try to convert. It'll probably fail again, but it should mention a different file right before the UnicodeWarning line. If that's what happens, I may be able to work around it, but it will require some additional pre-processing of each file. If that doesn't work, I guess it's back to the drawing board to figure out what to try next.
Well, I'll do my very best, but I doubt that this is the culprit, because the problem occurs persistently, with all the other books in the library. I tried converting about 10 books and the result was always the same.
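
For anyone who would rather script the check described in the quote above than hunt for the invisible character by hand, here is a minimal Python sketch. The function names are hypothetical (they are not part of calibre or lxml), and it assumes the file's contents have already been read and decoded into a string. It only locates or removes 0xfeff characters that are not the very first character; whether that actually cures the conversion error is untested.

```python
BOM = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE / byte order mark


def find_stray_bom(text):
    """Return 1-based (line, column) pairs for every U+FEFF
    that is not the very first character of the text."""
    positions = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch == BOM and not (line_no == 1 and col_no == 1):
                positions.append((line_no, col_no))
    return positions


def strip_stray_bom(text):
    """Remove every U+FEFF except one at the very start of the text."""
    if text.startswith(BOM):
        return BOM + text[1:].replace(BOM, "")
    return text.replace(BOM, "")


# Hypothetical sample: a leading BOM (allowed) and a stray one ending line 2.
sample = "\ufeffline one\nline two\ufeff\n"
stray = find_stray_bom(sample)
cleaned = strip_stray_bom(sample)
```

With the sample above, `stray` holds one position (line 2, column 9) and `cleaned` keeps only the leading character. The column is counted in characters, so it may not match the number an editor shows in its status bar.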