Old 03-11-2023, 08:13 PM   #206
nezih
Enthusiast
Posts: 43
Karma: 14828
Join Date: Feb 2023
Device: Boox Page, Kobo Aura SE
Quote:
Originally Posted by DenS View Post
Hi @nezih. I ran your script at the Windows command prompt and was able to convert an .html dictionary to .xml. Next I used PyGlossary to convert the .xml to StarDict (.ifo). It worked great, thanks!
But there is one dictionary, actually the one I need most, that I can't convert to .xml. The command I use at the prompt is this:
Code:
mobi2stardict.py --html-file "book.html" --fix-links --dict-name "Grande Dicionário de Português" --author "Porto Editora" --textual --chunked
And the prompt gives me this error:
Code:
Traceback (most recent call last):
  File "D:\Downloads\mobi2stardict\mobi2stardict.py", line 160, in <module>
    convert(args.html_file, args.dict_name, args.author, args.fix_links, args.gls, args.textual, args.chunked)
  File "D:\Downloads\mobi2stardict\mobi2stardict.py", line 115, in convert
    key     = ET.SubElement(article, "key").text = entry.HW
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src\lxml\etree.pyx", line 1042, in lxml.etree._Element.text.__set__
  File "src\lxml\apihelpers.pxi", line 748, in lxml.etree._setNodeText
  File "src\lxml\apihelpers.pxi", line 736, in lxml.etree._createTextNode
  File "src\lxml\apihelpers.pxi", line 1541, in lxml.etree._utf8
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
It might be useful to mention that I used the KindleUnpack calibre plugin to extract the .mobi dictionary to .html.
To install BeautifulSoup and lxml I used the commands "pip install beautifulsoup4" and "pip install lxml". The Python version I'm using is 3.11.2.
Could you help me figure out what I'm doing wrong?
Hi, if I remember correctly, I ran into this problem recently. Most likely the headwords contain control characters, which lxml refuses to write into XML (that is the ValueError in your traceback). If you convert to the gls format only (--gls), it will probably run fine. However, you would still need to replace those control characters with the characters they were meant to represent.
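If you want to confirm that control characters are the cause before editing anything, a small script can list them. This is only a rough sketch, not part of mobi2stardict: the function name and the sample string are made up for illustration, and it only looks for code points below U+0020 other than tab, line feed and carriage return, which is the range XML 1.0 forbids.

```python
from collections import Counter

# Tab, line feed and carriage return are the only C0 controls XML 1.0 allows.
XML_SAFE_CONTROLS = {"\t", "\n", "\r"}


def find_control_chars(text):
    """Count C0 control characters that cannot appear in XML 1.0 text.

    Hypothetical helper for illustration; not part of mobi2stardict.
    """
    return Counter(
        ch for ch in text
        if ord(ch) < 0x20 and ch not in XML_SAFE_CONTROLS
    )


if __name__ == "__main__":
    # Made-up sample: two headwords containing BEL (0x07) and ACK (0x06).
    sample = "caba\x07o\n\x06ico\n"
    for ch, count in sorted(find_control_chars(sample).items()):
        print(f"U+{ord(ch):04X} occurs {count} time(s)")
```

To check a real file, you could read it with something like `open("book.html", encoding="utf-8").read()` and pass the result to the function. The encoding is an assumption, so adjust it if your file uses a different one.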
Open the .gls file in VS Code and look for control characters such as BEL, ACK, etc. (You can search for \p{C} in Find with regular-expression mode turned on.) Replace them with the intended characters. For example, in my problematic file, I replaced BEL with "ll" and ACK with "ch".
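If there are too many occurrences to fix by hand, the same replacement could be scripted. Below is a minimal sketch, assuming the mapping from my file (BEL to "ll", ACK to "ch") also applies to yours, which may well not be the case, so check a few entries against the original dictionary first. The function names and the file names in the usage note are placeholders.

```python
from pathlib import Path

# Mapping from my problematic file; other dictionaries will likely
# need different values. BEL is 0x07, ACK is 0x06.
REPLACEMENTS = {"\x07": "ll", "\x06": "ch"}


def replace_control_chars(text, replacements=REPLACEMENTS):
    """Return text with each mapped control character replaced."""
    # str.maketrans accepts a dict of single characters to strings.
    return text.translate(str.maketrans(replacements))


def fix_file(src, dst, encoding="utf-8"):
    """Read src, apply the replacements, and write the result to dst."""
    text = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(replace_control_chars(text), encoding=encoding)
```

For example, `fix_file("book.gls", "book_fixed.gls")` would write a cleaned copy and leave the original file untouched. The same function could be pointed at the .html file before running the conversion, if you would rather fix the input.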