MobileRead Forums - View Single Post

Markismus · 02-05-2020, 03:09 AM

Yes, I see it. Garbage in = garbage out, I am afraid.

This is the entry in the chunk c_1 in the archive dict-file:

Code:

<b>A,</b> N.&nbsp;m. [<f>&a;</f>] ou [<f>&â;</f>] Voyelle et première lettre de l'alphabet. <i>Une panse d' <i>a </i>, </i>la première partie d'un petit <i>a </i> dans l'écriture. <f>&os;</f>&nbsp;<i>N'avoir pas fait une panse d' <i>a </i>, </i>c'est-à-dire n'avoir rien écrit. <f>&ns;</f>&nbsp;<i>Prouver par A + B, </i>avec précision et rigueur. <f>&ns;</f>&nbsp;<i>De A à Z, </i>du début à la fin. <f>&ns;</f>&nbsp;<i>A4, </i>format d'une feuille de papier de 21&nbsp; X &nbsp;29,7&nbsp;cm. <i>A3, </i>format 29,7&nbsp; X &nbsp;42&nbsp;cm. <f>&ns;</f>&nbsp;La.

And this is the entry in the Stardict file generated by Penelope:

As you can see, the accented characters are not the problem. Rather, the idx-file contains a doctype definition that defines all those entities:

Code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"[
<!ENTITY ns "♦">
<!ENTITY os "•">
<!ENTITY oo "›">
<!ENTITY co "‹">
<!ENTITY a  "a">
<!ENTITY â  "&#x0251;">
<!ENTITY an "&#x0251;&#x303;">
<!ENTITY b  "b">
<!ENTITY d  "&#x0257;">
<!ENTITY e  "&#x0259;">
<!ENTITY é  "e">
<!ENTITY è  "&#x025B;">
<!ENTITY in "&#x025B;&#x303;">
<!ENTITY f  "f">
<!ENTITY g  "&#x0261;">
<!ENTITY h  "h">
<!ENTITY h2 "&#x0027;">
<!ENTITY i  "i">
<!ENTITY j  "J">
<!ENTITY k  "k">
<!ENTITY l  "l">
<!ENTITY m  "m">
<!ENTITY n  "n">
<!ENTITY gn "&#x0272;">
<!ENTITY ing "&#x0273;">
<!ENTITY o  "o">
<!ENTITY o2 "&#x0254;">
<!ENTITY oe "&#x0276;">
<!ENTITY on "&#x0254;&#x303;">
<!ENTITY eu "&#x0278;">
<!ENTITY un "&#x0276;&#x303;">
<!ENTITY p  "p">
<!ENTITY r  "&#x0280;">
<!ENTITY s  "s">
<!ENTITY ch "&#x0283;">
<!ENTITY t  "t">
<!ENTITY u  "&#x0265;">
<!ENTITY ou "u">
<!ENTITY v  "v">
<!ENTITY w  "w">
<!ENTITY x  "x">
<!ENTITY y  "y">
<!ENTITY z  "z">
<!ENTITY Z  "&#x0292;">]>
<html xml:lang="fr"
	xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<title></title>
	</head>
	<body>
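
For illustration, the entities declared in that doctype could be expanded before the definitions are written out. This is only a rough sketch, not what Penelope does: the function names `parse_entities` and `expand_entities` are made up for this example, and it assumes every declaration has the simple `<!ENTITY name "value">` form shown above.

```python
import html
import re

# Matches declarations such as <!ENTITY ns "♦"> in the DOCTYPE internal subset.
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\S+)\s+"([^"]*)">')
# Matches named references such as &ns; or &â; (numeric ones like &#x0251; are skipped).
ENTITY_REF = re.compile(r'&([^\s;&#]+);')


def parse_entities(doctype: str) -> dict:
    """Map each declared entity name to its replacement text (hypothetical helper)."""
    # html.unescape turns numeric references such as &#x0251; into real characters.
    return {name: html.unescape(value) for name, value in ENTITY_DECL.findall(doctype)}


def expand_entities(text: str, entities: dict) -> str:
    """Replace references to declared entities; leave unknown ones (e.g. &nbsp;) alone."""
    return ENTITY_REF.sub(lambda m: entities.get(m.group(1), m.group(0)), text)


if __name__ == "__main__":
    # A few declarations copied from the doctype above.
    doctype = '<!ENTITY ns "♦">\n<!ENTITY a  "a">\n<!ENTITY â  "&#x0251;">'
    # Shortened excerpt of the entry for "A".
    entry = "<b>A,</b> N.&nbsp;m. [<f>&a;</f>] ou [<f>&â;</f>] <f>&ns;</f>&nbsp;La."
    print(expand_entities(entry, parse_entities(doctype)))
```

Because `&nbsp;` is not declared in the doctype, the sketch leaves it untouched, so it still has to be handled by whatever renders the definition.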

However, the &nbsp; sequence is standard HTML for a non-breaking space. If you change the sametypesequence in the ifo file from m (plain text) to h (HTML), the sequences still show up in linguae, but Goldendict does switch to HTML rendering (as does Koreader):

sametypesequence=m

sametypesequence=h
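
If you would rather script the change than edit the ifo file by hand, something like the following might do. It is a minimal sketch: `set_sametypesequence` is a name invented here, and the sample ifo contents are hypothetical placeholders, not taken from the actual dictionary.

```python
import re


def set_sametypesequence(ifo_text: str, new_type: str) -> str:
    """Return the .ifo text with the sametypesequence line set to new_type."""
    line = f"sametypesequence={new_type}"
    # Replace an existing line, keeping the original line ending intact.
    updated, count = re.subn(
        r"(?m)^sametypesequence=[^\r\n]*", lambda _: line, ifo_text
    )
    if count == 0:
        # No such line yet: append one at the end of the file.
        updated = ifo_text.rstrip("\n") + "\n" + line + "\n"
    return updated


if __name__ == "__main__":
    # Hypothetical ifo contents, for illustration only.
    sample = (
        "StarDict's dict ifo file\n"
        "version=2.4.2\n"
        "bookname=Example\n"
        "sametypesequence=m\n"
    )
    print(set_sametypesequence(sample, "h"))
```

To apply it to a real dictionary you would read the .ifo file as UTF-8 text, pass the contents through the function, and write the result back. Keep a copy of the original file first.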