06-26-2019, 04:38 AM   #1
Ubiquity
Metadata download plugin help with text encoding disorder

Hello, this one is difficult for me to trace. I'm writing a metadata download plugin and got stuck extracting metadata from the book details page.
I'm testing with this book: Serhii Plokhy - Chernobyl: The History of a Nuclear Catastrophe

First, the author field seems to be extracted correctly into the authors string, and it prints to the log as 'Serhii Plokhy', but when I construct a Metadata object with
Code:
mi = Metadata(title, authors)
the print(mi) at the end of fetching prints
Code:
Author(s)           : S & e & r & h & i & i &   & P & l & o & k & h & y
This looks like a weird encoding, and I can't even guess what I should convert from or to. What makes it weirder is that the title, fetched in exactly the same way, is stored in the metadata properly. I'm still assuming the web page is in UTF-8 and that Python interprets the strings the same way internally.
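For illustration, the same symptom can be reproduced in plain Python without any encoding problem, when a bare string is used where a sequence of names is expected. This is only a minimal sketch of one possible explanation: it assumes the authors argument is treated as a sequence and its items are joined with ' & ' for display. join_authors is a hypothetical helper written for this example, not calibre code.

```python
def join_authors(authors):
    # Hypothetical stand-in for a display routine (an assumption, not calibre's
    # code): treat the argument as a sequence of names and join with ' & '.
    return ' & '.join(list(authors))


authors = 'Serhii Plokhy'

# A bare string is itself a sequence of characters, so every character
# ends up as a separate "author".
as_string = join_authors(authors)

# Wrapping the string in a list keeps the whole name as a single item.
as_list = join_authors([authors])

print(as_string)
print(as_list)
```

If that assumption holds, passing a list such as [authors] instead of the plain string would be the thing to try first. The title would not be affected in the same way because it is a single value rather than a list.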

Second, and possibly related: when parsing other book details such as publisher, tags, etc., the data is stored in a details table like this
Code:
<tr>
  <td>name</td>
  <td>value</td>
</tr>
I iterate through the table and fill in the mi values, choosing the field by the extracted name literal. This works when the name contains only plain ASCII characters, but details whose names contain acute accents or other diacritics aren't matched by the corresponding names in the plugin code. This suggests the name is in the wrong code page, yet the fetched name literals print to the log in their proper form.
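Two strings can print identically and still compare unequal, for example when one side is a UTF-8 byte string and the other is text, or when the same accented letter is stored as one code point on one side and as a base letter plus combining accent on the other. The following is a minimal sketch of normalizing both sides before matching, assuming one of those is the cause. normalize_label, field_for, FIELD_BY_LABEL and the sample labels are all hypothetical names made up for this example.

```python
import unicodedata


def normalize_label(label):
    # Assumption: any byte string coming from the page is UTF-8 encoded.
    if isinstance(label, bytes):
        label = label.decode('utf-8')
    # NFC composes "base letter + combining accent" into a single code point,
    # so both spellings of an accented letter compare equal afterwards.
    return unicodedata.normalize('NFC', label).strip().lower()


# Hypothetical label-to-field table; the keys go through the same normalization.
FIELD_BY_LABEL = {
    normalize_label('\u00c9diteur'): 'publisher',   # composed E-acute
    normalize_label('Mots-cl\u00e9s'): 'tags',      # composed e-acute
}


def field_for(label):
    # Returns the metadata field name for a table label, or None if unknown.
    return FIELD_BY_LABEL.get(normalize_label(label))


print(field_for('E\u0301diteur'))                   # decomposed E + accent
print(field_for('Mots-cl\u00e9s'.encode('utf-8')))  # UTF-8 bytes
```

Printing repr(name) rather than name in the log is a quick way to see which of the two cases applies, since repr shows byte strings and combining characters explicitly.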

Another difficulty is with debugging: I can't figure out where the log.info(...), log.debug(...) and log.error(...) calls print. Calling calibre-debug -opens a textual log after closing Calibre, but the log doesn't contain any debug info printed by any of these calls. The only thing that works for me is using print(...) instead, whose output appears in the %temp%/calibre_XXXXXX/*.log files. I need a clue on how to debug with the log properly.

Last edited by Ubiquity; 06-26-2019 at 04:47 AM.