02-04-2020, 03:20 PM   #279
EastEriq
Groupie
EastEriq can program the VCR without an owner's manual.
 
Posts: 199
Karma: 195502
Join Date: Jan 2018
Device: Cybook Orizon, PocketBook Touch HD
Quote:
Originally Posted by Markismus View Post
If something is off about the dictionary, would you please let me know?
Only since you ask. I've looked at it, and there are still many tags and nonstandard entities that make it hard to read. They should all be fixable with some scripting. I started writing some sed substitutions for the xdxf, but at some point I gave up. Among them:
  • HTML entities (for the non-breaking space, the apostrophe, &#171;, &#187;), to be replaced with their Unicode equivalents;
  • coded characters. The ones I identified were Greek letters, e.g. an entity code standing for ε. To be replaced with the Unicode character;
  • phonetic/splitting patterns, like [<f>&a;&b;&è;&s;(&e;)&m;&an;</f>], to be replaced with something proper, e.g. [a-b-è-s-(e)-m-an] (be careful, because there are tricky cases w.r.t. doubles, silent vowels, pronunciation equivalents, etc.);
  • non-entity splitters, like <f>&os;</f>, <f>&ns;</f>, <f>&oo;</f>. I think they separate the different meanings of a lemma, or an explanation from an example. To be replaced by a proper graphic symbol.
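To make two of the points above concrete, here is a rough Python sketch (Python rather than sed, only because the phonetic case is easier to express). It covers just the numeric entities and the bracketed phonetic pattern. The function names and regexes are my own assumptions, not taken from the dictionary's tooling, and the tricky cases (doubles, silent vowels, and so on) are not handled; the unbracketed splitters such as <f>&os;</f> are deliberately left alone.

```python
import html
import re

# Numeric character references such as &#171; or &#x3b5; (hypothetical pattern).
NUMERIC_ENTITY = re.compile(r"&#(?:\d+|[xX][0-9a-fA-F]+);")

# A bracketed phonetic block: [<f> ... </f>]
PHONETIC_BLOCK = re.compile(r"\[<f>(.*?)</f>\]")

# One token inside the block: &xx; optionally wrapped in parentheses.
SYLLABLE = re.compile(r"\(&([^;&]+);\)|&([^;&]+);")


def decode_numeric_entities(text):
    """Replace numeric references with the character they encode.

    References that decode to <, > or & are kept, so the XML stays well-formed.
    """
    def decode(match):
        char = html.unescape(match.group(0))
        return match.group(0) if char in "<>&" else char

    return NUMERIC_ENTITY.sub(decode, text)


def rewrite_phonetic(text):
    """Turn [<f>&a;&b;(&e;)</f>] into [a-b-(e)]; leave anything else untouched."""
    def fix_block(match):
        parts = []
        for token in SYLLABLE.finditer(match.group(1)):
            if token.group(1) is not None:
                parts.append("(" + token.group(1) + ")")
            else:
                parts.append(token.group(2))
        if not parts:
            return match.group(0)  # nothing recognised, keep the original
        return "[" + "-".join(parts) + "]"

    return PHONETIC_BLOCK.sub(fix_block, text)


print(rewrite_phonetic("[<f>&a;&b;&è;&s;(&e;)&m;&an;</f>]"))  # [a-b-è-s-(e)-m-an]
```

Restricting the decoding to numeric references is a deliberate choice here: calling html.unescape on the whole file would also touch &lt; and &amp;, which the xdxf needs to stay valid.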