07-03-2020, 04:55 AM   #20
Tex2002ans
Quote:
Originally Posted by mcdummy View Post
I'm using a PRS-T3, which does not apply hyphenation to all languages.
Thanks for the info. I'm very interested in multi-language hyphenation.

Even many browsers don't handle hyphenation properly yet, which is why I was curious whether you had found a reader that could do it at that level.

Quote:
Originally Posted by mcdummy View Post
My PRS-T3 seems to work at least at the HTML-file level, i.e. it can change the language when a new HTML file is processed.
Probably a good assumption.

Quote:
Originally Posted by mcdummy View Post
So far, I haven't figured out which language instructions it processes and which it ignores (e.g., xml:lang="..." vs. lang="..." or en-US vs. en_US).
Using _ is invalid in a language tag. Only - is allowed as the separator between subtags.

See "Tags for Identifying Languages" (BCP 47) and the W3C's page on "Language tags in HTML and XML".
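
If you want to catch underscore tags before they reach the reader, here is a minimal Python sketch. The function name fix_lang_tag and the regex are my own illustration, and the pattern only covers the simple language / script / region shapes (en, en-US, zh-Hant-TW); full BCP 47 allows more (variants, extensions, private use), so treat a None result as "check by hand", not as proof the tag is invalid.

```python
import re

# Simplified pattern (an assumption, not the full BCP 47 grammar):
# a 2-3 letter language subtag, optionally a 4-letter script subtag,
# optionally a 2-letter or 3-digit region subtag.
SIMPLE_TAG = re.compile(
    r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?$"
)

def fix_lang_tag(tag):
    """Replace '_' with '-' and return the tag, or None if it still
    does not match the simplified pattern above."""
    fixed = tag.strip().replace("_", "-")
    return fixed if SIMPLE_TAG.match(fixed) else None

for raw in ["en_US", "de-DE", "en", "english"]:
    print(raw, "->", fix_lang_tag(raw))
```

This only rewrites the separator; it does not check that the subtags are actually registered.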

Also, when an XHTML file is processed as XML, xml:lang takes priority over lang:

Quote:
The xml:lang attribute is not actually useful for handling the file as HTML, but takes over from the lang attribute any time you process or serve the document as XML. The lang attribute is allowed by the syntax of XHTML, and may also be recognized by browsers. When using other XML parsers, however (such as the lang() function in XSLT) you can't rely on the lang attribute being recognized.
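
To make that priority concrete, here is a rough Python sketch of how an XML-side tool could resolve the language of each element: xml:lang first, then lang, then whatever the parent had. The helper names (effective_lang, walk) are made up for illustration, and this says nothing about what the PRS-T3 or any other reader actually does internally.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_lang(elem, inherited=None):
    """xml:lang wins over lang; fall back to the inherited language."""
    return elem.get(XML_LANG) or elem.get("lang") or inherited

def walk(elem, inherited=None):
    """Yield (tag, language) for every element, top to bottom."""
    lang = effective_lang(elem, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from walk(child, lang)

doc = (
    '<html xml:lang="en" lang="en"><body>'
    '<p>Hi</p>'
    '<p xml:lang="de" lang="fr">Hallo</p>'  # conflicting on purpose
    '<p lang="es">Hola</p>'
    '</body></html>'
)
for tag, lang in walk(ET.fromstring(doc)):
    print(tag, lang)
```

The safest markup is still to set both attributes to the same value, so it doesn't matter which one a given parser looks at.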
Quote:
Originally Posted by mcdummy View Post
For instance, the PRS-T3 seems to ignore en_US/en_GB/de_DE/fr_..., while en-US/en-GB/de-DE/fr-... seems to work.
Also, it's best to stick with the shortest tag that works. It is better to specify broadly (en) than to over-specify wrongly (en-US on an en-GB document) or redundantly.

See the W3C's "Choosing a Language Tag":

Quote:
Always bear in mind that the golden rule is to keep your language tag as short as possible. Only add further subtags to your language tag if they are needed to distinguish the language from something else in the context where your content is used.
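
One mechanical reading of that rule, as a Python sketch: collect the tags used in a book and keep the region subtag only where two tags share the same primary language. The function shorten_tags is hypothetical and deliberately naive; whether a region really matters (for example, American vs. British spellchecking) is still a judgment call for the person marking up the text.

```python
from collections import defaultdict

def shorten_tags(tags):
    """Map each tag to its primary language subtag, unless that would
    merge two different tags used in the same document."""
    by_primary = defaultdict(set)
    for tag in tags:
        by_primary[tag.split("-")[0].lower()].add(tag)

    result = {}
    for primary, group in by_primary.items():
        for tag in group:
            # Only one tag for this language: the short form is enough.
            result[tag] = primary if len(group) == 1 else tag
    return result

print(shorten_tags(["en-US", "de-DE", "fr-FR"]))
print(shorten_tags(["en-US", "en-GB", "de-DE"]))
```

In the second call the two English variants keep their regions because dropping them would make them indistinguishable, while de-DE still collapses to de.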
* * *

Also, if you desperately need to handle multiple dictionaries in a single document, and you use Microsoft Word... you could convert your properly lang-marked EPUB to DOCX using Toxaris's EPUB Tools:

https://www.mobileread.com/forums/sh....php?p=2516490

I was pleasantly surprised to see it transferred over all lang information into DOCX, which made dealing with the red squigglies so much easier!

(I recently used it to mark all Spanish/French/German text, and even American/British, making the spellchecking passes so much faster.)