MobileRead Forums - View Single Post

shamanNS · 10-22-2022, 03:54 PM

Used where / for what?

1) dc:language in .opf
2) xml:lang attribute on XHTML elements
3) stuff like Hunspell and OpenOffice / LibreOffice dictionaries

"dc:language" will most of the time hold either the 2-letter language code ("sr", ISO 639-1) or the 3-letter variant ("srp", ISO 639-2 / 639-3). The same value is used for both Latin-script and Cyrillic-script EPUBs.
There aren't 2 or 3 letter codes that indicate the script used.
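
To make point 1 concrete, here is a minimal Python sketch that pulls dc:language out of an .opf. The SAMPLE_OPF string and the opf_languages helper are made up for illustration and are not taken from any real book or tool; a real .opf would first have to be read out of the EPUB zip.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by the dc:* elements in an OPF package document.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, stripped-down OPF: just enough metadata to carry dc:language.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">example-id</dc:identifier>
    <dc:title>Primer</dc:title>
    <dc:language>sr</dc:language>
  </metadata>
</package>"""


def opf_languages(opf_xml):
    """Return every non-empty dc:language value found in the OPF text."""
    root = ET.fromstring(opf_xml)
    return [
        el.text.strip()
        for el in root.iter("{%s}language" % DC_NS)
        if el.text and el.text.strip()
    ]


print(opf_languages(SAMPLE_OPF))
```

With a plain "sr" (or "srp") that list is all a reading app gets from the metadata, so the script cannot be told from it.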

Stuff like Windows locales and .NET locales that support "extended language codes" (BCP 47 style tags) use the form "2-letter language code + 4-letter script code + 2-letter country code" (so "sr-Latn-RS" and "sr-Cyrl-RS").
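
As a rough illustration of that "language + script + country" form, here is a small Python sketch that splits such a tag into its parts. The split_language_tag function is a hypothetical helper and deliberately simplified: it only recognises the three subtags mentioned above and ignores the rest of what BCP 47 allows (variants, extensions and so on).

```python
def split_language_tag(tag):
    """Split a tag like 'sr-Latn-RS' into language, script and region.

    Simplified on purpose: only the first 4-letter subtag is treated as a
    script and the first 2-letter (or 3-digit) subtag as a region.
    """
    parts = tag.replace("_", "-").split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha() and result["script"] is None:
            # Script codes are conventionally written in title case: Latn, Cyrl.
            result["script"] = part.title()
        elif result["region"] is None and (
            (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit())
        ):
            # Region codes are conventionally upper case: RS.
            result["region"] = part.upper()
    return result


for tag in ("sr", "srp", "sr-Latn-RS", "sr-Cyrl-RS"):
    print(tag, split_language_tag(tag))
```

Only the longer tags give a non-empty "script"; for a bare "sr" or "srp" it stays None.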

No idea how Kobo's hyphenation dictionaries "encode" that type of info. I've noticed that for example KOReader has hyphenation rules only for Serbian Cyrillic and not for Serbian Latin.
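
Since the language code alone doesn't carry the script, one way software could pick between Cyrillic and Latin rules is to look at the text itself. The sketch below is only an assumption about how that might be done, not how Kobo or KOReader actually do it; dominant_script is a made-up helper that counts letters by their Unicode character names.

```python
import unicodedata


def dominant_script(text):
    """Return 'Cyrl' or 'Latn' depending on which letters dominate, else None."""
    counts = {"Cyrl": 0, "Latn": 0}
    for ch in text:
        if not ch.isalpha():
            continue
        # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER DE".
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            counts["Cyrl"] += 1
        elif name.startswith("LATIN"):
            counts["Latn"] += 1
    if counts["Cyrl"] == counts["Latn"]:
        # No letters at all, or an exact tie: no decision.
        return None
    return max(counts, key=counts.get)


print(dominant_script("Добар дан"))
print(dominant_script("Dobar dan"))
```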
