06-24-2013, 03:05 AM   #29
Jarekczek
Quote:
Originally Posted by nrapallo
I get this output when the program crashes:
Quote:
dict.xdxf, line 78723: unclosed xml tag
I've tried to verify what's left unclosed, but cannot locate anything using simple text searches and such. I even tried using xmllint.exe from the libxml2 2.7.6 windows port, but it's not been easy to find the culprit(s).

From my experiments: the xdxf file is a well-formed XML document, but converter.exe fails to parse files with lines longer than 4096 bytes (bytes, not characters).

For example, this xdxf file would trigger the error:
<?xml version="1.0" encoding="UTF-8" ?>
<ar><k>word</k>
put_4090_simple_letters_here <fictional tag ends after the 4096 boundary>
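
To reproduce this without a real dictionary, a small Python sketch can build a file shaped like the example above. The function names, the filler letter and the `<fictional/>` tag are all made up for illustration, and the closing `</ar>` is my addition so that the test file stays well-formed. It assumes the converter's limit is counted from the start of the line, which I have not confirmed.

```python
# Hypothetical helpers, not part of converter.exe or any XDXF tool.

def make_repro_line(filler_len: int = 4090, tag: str = "<fictional/>") -> bytes:
    """Return a line whose tag starts before byte 4096 and ends after it."""
    # 4090 plain ASCII letters, one space, then the tag: the tag begins at
    # byte offset 4091 and runs past offset 4096.
    return ("a" * filler_len + " " + tag).encode("utf-8")


def write_repro_file(path: str) -> None:
    """Write a small file that mirrors the example (not a full dictionary)."""
    with open(path, "wb") as f:
        f.write(b'<?xml version="1.0" encoding="UTF-8" ?>\n')
        f.write(b"<ar><k>word</k>\n")
        f.write(make_repro_line() + b"\n")
        f.write(b"</ar>\n")  # assumption: added to keep the XML well-formed
```

Calling `write_repro_file("repro.xdxf")` and feeding the result to the converter should show whether the same "unclosed xml tag" message appears.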

So not every long line makes the converter fail, only the unlucky ones where, as in the example, a tag apparently straddles the 4096-byte boundary. The lines need to be broken up to make them shorter. I use the following awk script, which works for files containing <br><br>:

awk '{ gsub("<br><br>", "<br>\n<br>"); print; }' dict.xdxf >dict2.xdxf
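
To check whether any overlong lines remain after the split, something like the following Python sketch could be used. It reads the file in binary mode so that lengths are counted in bytes, not characters. The names `long_lines` and `report` are hypothetical, and whether the converter counts the line terminator toward the 4096 bytes is unknown, so the terminator is left out here.

```python
import io

LIMIT = 4096  # byte limit observed in the experiments above


def long_lines(stream, limit=LIMIT):
    """Yield (line_number, byte_length) for lines longer than `limit` bytes."""
    for number, raw in enumerate(stream, start=1):
        # Length in bytes, without the trailing line terminator.
        length = len(raw.rstrip(b"\r\n"))
        if length > limit:
            yield number, length


def report(path, limit=LIMIT):
    """Print every overlong line of the file at `path`."""
    with open(path, "rb") as f:  # binary mode: count bytes, not characters
        for number, length in long_lines(f, limit):
            print(f"{path}, line {number}: {length} bytes")


# Small in-memory demonstration.
sample = io.BytesIO(b"short line\n" + b"x" * 5000 + b"\n")
print(list(long_lines(sample)))  # prints [(2, 5000)]
```

Running `report("dict2.xdxf")` lists any lines that are still too long. If it prints nothing, every line fits within the limit.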

I'll try to find an appropriate place to report this bug. I'll follow rkomar's suggestion and have a look at the-ebook.org.

Last edited by Jarekczek; 06-25-2013 at 01:26 AM. Reason: added more context information - the error message