Old 03-11-2023, 03:52 PM   #205
DenS
Device: Kindle Basic 11th (2024), Paperwhite 12th
Quote:
Originally Posted by nezih View Post
If you want to try it yourself here is the link to my script's github repository: mobi2stardict
Hi @nezih. I ran your script from the Windows command prompt and was able to convert an .html dictionary to .xml. I then used PyGlossary to convert the .xml to StarDict (.ifo). It worked great, thanks!
However, the dictionary I need most fails to convert to .xml. This is the command I run at the prompt:
Code:
mobi2stardict.py --html-file "book.html" --fix-links --dict-name "Grande Dicionário de Português" --author "Porto Editora" --textual --chunked
And the prompt gives me this error:
Code:
Traceback (most recent call last):
  File "D:\Downloads\mobi2stardict\mobi2stardict.py", line 160, in <module>
    convert(args.html_file, args.dict_name, args.author, args.fix_links, args.gls, args.textual, args.chunked)
  File "D:\Downloads\mobi2stardict\mobi2stardict.py", line 115, in convert
    key     = ET.SubElement(article, "key").text = entry.HW
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src\lxml\etree.pyx", line 1042, in lxml.etree._Element.text.__set__
  File "src\lxml\apihelpers.pxi", line 748, in lxml.etree._setNodeText
  File "src\lxml\apihelpers.pxi", line 736, in lxml.etree._createTextNode
  File "src\lxml\apihelpers.pxi", line 1541, in lxml.etree._utf8
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
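Reading the message, my guess is that at least one headword in this dictionary contains a NULL byte or another control character, which lxml refuses to write into the XML. Below is a minimal sketch of how such characters could be stripped before the text is assigned. The helper name strip_xml_illegal is my own and not part of mobi2stardict, and I am only assuming that entry.HW (the value named in the traceback) is the string that carries the bad characters.

```python
import re

# C0 control characters that XML 1.0 does not allow.
# Tab (\x09), line feed (\x0a) and carriage return (\x0d) are legal and are kept.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_xml_illegal(text: str) -> str:
    """Return text with NULL bytes and other disallowed control characters removed."""
    return _XML_ILLEGAL.sub("", text)


def has_xml_illegal(text: str) -> bool:
    """Return True if text contains a character the pattern above would remove."""
    return _XML_ILLEGAL.search(text) is not None
```

If that assumption holds, passing the headword through strip_xml_illegal before it is assigned to the key element's text might avoid the ValueError, and has_xml_illegal could be used to print the affected entries first to see what is actually in them.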
It may be relevant that I extracted the .mobi dictionary to .html with the KindleUnpack calibre plugin.
To install BeautifulSoup and lxml I used the commands "pip install beautifulsoup4" and "pip install lxml". The Python version I'm using is 3.11.2.
Could you help me figure out what I'm doing wrong?