MobileRead Forums

DenS · 03-11-2023, 03:52 PM

Quote:

Originally Posted by nezih

If you want to try it yourself here is the link to my script's github repository: mobi2stardict

Hi @nezih. I ran your script from the Windows command prompt and was able to convert an .html dictionary to .xml. Next I used pyglossary to convert the .xml to StarDict (.ifo). It worked great, thanks!
But there is one dictionary, the one I need most, that I can't convert to .xml. The command I use at the prompt is this:

Code:

mobi2stardict.py --html-file "book.html" --fix-links --dict-name "Grande Dicionário de Português" --author "Porto Editora" --textual --chunked

And the prompt gives me this error:

Code:

Traceback (most recent call last):
  File "D:\Downloads\mobi2stardict\mobi2stardict.py", line 160, in <module>
    convert(args.html_file, args.dict_name, args.author, args.fix_links, args.gls, args.textual, args.chunked)
  File "D:\Downloads\mobi2stardict\mobi2stardict.py", line 115, in convert
    key     = ET.SubElement(article, "key").text = entry.HW
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src\lxml\etree.pyx", line 1042, in lxml.etree._Element.text.__set__
  File "src\lxml\apihelpers.pxi", line 748, in lxml.etree._setNodeText
  File "src\lxml\apihelpers.pxi", line 736, in lxml.etree._createTextNode
  File "src\lxml\apihelpers.pxi", line 1541, in lxml.etree._utf8
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
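
Going by the message, my guess is that at least one headword (the entry.HW value on line 115) contains a NULL byte or a control character. Below is a minimal sketch of how such characters could be stripped from a string before it is handed to lxml. The helper name strip_xml_illegal and the sample headword are hypothetical and not part of mobi2stardict, and the pattern only covers the ASCII control characters that XML 1.0 forbids, so treat it as an assumption rather than a confirmed fix.

Code:

```python
import re

# ASCII control characters that XML 1.0 does not allow.
# Tab (\x09), newline (\x0a) and carriage return (\x0d) are legal and kept.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_xml_illegal(text: str) -> str:
    """Return text with XML-incompatible control characters removed.

    Hypothetical helper: the name is an assumption, not mobi2stardict code.
    """
    return _XML_ILLEGAL.sub("", text)


if __name__ == "__main__":
    # Made-up headword containing a NULL byte and a unit separator.
    headword = "aba\x00car\x1f"
    print(repr(strip_xml_illegal(headword)))
```

If that guess is right, wrapping the headword in such a helper before assigning it to the element text should avoid the ValueError, though I have not confirmed where the stray characters in my file come from.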

It may be useful to know that I extracted the .mobi dictionary to .html with the KindleUnpack calibre plugin.
I installed BeautifulSoup and lxml with the commands "pip install beautifulsoup4" and "pip install lxml". The Python version I'm using is 3.11.2.
Could you help me figure out what I'm doing wrong?