04-16-2017, 01:48 AM #25
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by DiapDealer View Post
What I'm not hearing, is why multiple dc language entries are being added to the opf in the first place. I see no advantage in doing so. In fact doing so can only cause problems--as this thread proves.

So where does the ill-advised practice of multiple opf dc language entries originate from? Anybody?
I would say it is/was just years of poor practices.

Maybe it was born of years of poor usage of the HTML meta element:

https://www.w3.org/International/que...-http-and-lang

Quote:
Specifying language with the meta element (not recommended)

The use of a meta element in the document head with the http-equiv attribute set to Content-Language is not mentioned directly in the HTML 4.01 specification, and yet, for a long time, much of the informal guidance out on the Web about how to declare language for your HTML page suggested its use, and some HTML authoring tools automatically created such elements when you specified language information using dialog boxes. Here is an example that declares the language to be English.

***Do not use this*** <meta http-equiv="Content-Language" content="en">

Unlike the lang and xml:lang attributes, the value of the content attribute can be a comma-separated list of language tags. The example below declares the primary languages of the document to be (in equal measure) German, French and Italian.

***Do not use this*** <meta http-equiv="Content-Language" content="de, fr, it">

If the name of the meta element wasn't a clear enough clue, the fact that the value supports multiple languages indicates that this element is really about document level metadata. If you are to usefully indicate the language of a range of text, you have to be specific – it can only be in one language at a time. The meta element, then, is an in-document location for expressing metadata about the language of the intended audience of the document as a whole.

[...]

Because of the history of confusion and inconsistent implementation surrounding this kind of declaration, in 2011 the HTML Working Group took a decision to make the meta element with http-equiv set to Content-Language non-conforming in HTML. This means that you should no longer use it in HTML5, and therefore, though technically not illegal in other types of HTML, it is best to now not use it anywhere.
Perhaps InDesign already had its non-standard HTML meta output... and then Adobe just decided to convert that HTML meta directly into dc:language entries in EPUBs?
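To make that guess concrete, here is a minimal Python sketch of what such a naive conversion could look like, assuming a converter that simply splits the comma-separated Content-Language value and emits one dc:language per tag. The function name naive_meta_to_dc_language is hypothetical; this is NOT Adobe's actual code, just an illustration of how a "de, fr, it" meta value would turn into three dc:language entries:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def naive_meta_to_dc_language(content_language: str) -> list[ET.Element]:
    """Hypothetical converter: one <dc:language> per tag in a Content-Language list.

    This only mimics the speculated behaviour; it is not taken from InDesign.
    """
    elements = []
    for tag in content_language.split(","):
        tag = tag.strip()
        if not tag:
            continue  # skip empty entries, e.g. from a trailing comma
        lang = ET.Element(f"{{{DC_NS}}}language")
        lang.text = tag
        elements.append(lang)
    return elements


if __name__ == "__main__":
    ET.register_namespace("dc", DC_NS)
    # A meta value meant as "audience languages" becomes three separate entries.
    for el in naive_meta_to_dc_language("de, fr, it"):
        print(ET.tostring(el, encoding="unicode"))
```

The point is how little it takes: a list that was only ever meant as document-level audience metadata gets copied straight into the OPF, one entry per tag.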

Speaking of multiple dc:language entries in EPUBs, this mention of Dublin Core appears slightly further down the same page:

Quote:
[...]

Dublin Core on the meta element. Since the rules in HTML4 for meta elements put few restrictions on how it is used, it is also possible, though not common, to find instances where it is used to express language information using Dublin Core notation. It does not appear, however, that this information is ever used by browsers, and it is unclear to what extent it is used by any other application.
It may just have been a simple carryover from the OEBPS days (1999-2007).

Or maybe when the EPUB standard was being put together, they just decided, "Sure, a lot of the cool hip kids are still doing it in HTML nowadays."

Quote:
Originally Posted by BetterRed View Post
Tagging passages/fragments in foreign languages is a PITA, especially if more than one f-l. So maybe -- if a dictionary lookup on a word using the first language doesn't yield a result, try the second language... or on a dictionary lookup, show the results for each language... or do nothing
In the case of a book, I would say you could MAYBE put two languages in your content.opf if you had a left/right English/German parallel translation (although even this could be buggy, as ChipSuey found out).

But apart from that specific use case, declare only one overarching main language + mark the foreign text directly with lang + xml:lang. This would be the least buggy option across all the readers, and much closer to the intended use of language tagging on the text-processing side of things (search/dictionary/text-to-speech/[...]).
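As a rough illustration of marking a single foreign phrase, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper mark_foreign is hypothetical, and the output is a bare fragment without the XHTML namespace (a real EPUB chapter would have it), just to show the lang + xml:lang pair sitting on the span:

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml:" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def mark_foreign(text: str, lang_tag: str) -> ET.Element:
    """Hypothetical helper: wrap a foreign fragment in a <span> with lang and xml:lang."""
    span = ET.Element("span")
    span.set("lang", lang_tag)               # HTML-style attribute
    span.set(f"{{{XML_NS}}}lang", lang_tag)  # written out as xml:lang
    span.text = text
    return span


# The paragraph stays in the book's main language; only the phrase is tagged.
p = ET.Element("p")
p.text = "He sneezed, and she said "
phrase = mark_foreign("Gesundheit", "de")
phrase.tail = "."
p.append(phrase)
print(ET.tostring(p, encoding="unicode"))
# <p>He sneezed, and she said <span lang="de" xml:lang="de">Gesundheit</span>.</p>
```

The OPF keeps its single dc:language, and the reader only switches dictionary/TTS language for the span that actually needs it.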

This is the relevant text from the W3C page linked above:

Quote:
Specifying file metadata: the language of the intended audience

Metadata that describes the language or languages of the intended audience is about the document as a whole. Such metadata may be used for searching, serving the right language version, workflow management, classification, etc. Where there are language changes in a document, information about the language of the intended audience is not specific enough to support text-processing (for example in a way that would be needed for the application of text-to-speech, styling, automatic font assignment, etc.)

The language of the intended audience does not include every language used in a document. Many documents on the Web contain embedded fragments of content in different languages, whereas the page is clearly aimed at speakers of one particular language. For example, a German city-guide for Beijing may contain useful phrases in Chinese, but it is aimed at a German-speaking audience, not a Chinese one.

On the other hand, it is also possible for a page to contain the same or parallel content in more than one language. For example, a Canadian web page may welcome readers with French content in the left column, and the same content in English in the right-hand column. Here the document is equally targeted at speakers of both languages, so there are two audience languages. This situation is not as common on the Web as in printed material since it is easy to link to separate pages on the Web for different audiences, but it does occur where there are multilingual communities. Another use case is a blog or a news page aimed at a multilingual community, where some articles on a page are in one language and some in another. For example, a forum used by a Punjabi community may contain posts in English, Hindi and Punjabi in a single thread.

There are also pages where the navigational information, including the page title, is in one language but the real content of the page is in another. While this is not necessarily good practice, it doesn't change the fact that the language of the intended audience is usually that of the content, regardless of the language at the top of the document source.
Having four or five dc:language entries in a book, though? That just sounds way outside the intended scope of "the language of the document as a whole".
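If you want to spot books like that, a small check along these lines could help. It is a minimal Python sketch, assuming you already have the unpacked content.opf as a string; the function names and the max_allowed threshold are made up for illustration (bump it to 2 for a left/right bilingual edition):

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_languages(opf_xml: str) -> list[str]:
    """Return the dc:language values in an OPF package document, in document order."""
    root = ET.fromstring(opf_xml)
    return [(el.text or "").strip() for el in root.iter(f"{{{DC_NS}}}language")]


def check_languages(opf_xml: str, max_allowed: int = 1) -> list[str]:
    """Hypothetical lint: return warnings; an empty list means nothing was flagged."""
    langs = dc_languages(opf_xml)
    warnings = []
    if not langs:
        warnings.append("no dc:language entry found")
    elif len(langs) > max_allowed:
        warnings.append(f"{len(langs)} dc:language entries: {', '.join(langs)}")
    return warnings


sample = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example</dc:title>
    <dc:language>en</dc:language>
    <dc:language>de</dc:language>
    <dc:language>fr</dc:language>
    <dc:language>it</dc:language>
  </metadata>
</package>"""

print(check_languages(sample))
# ['4 dc:language entries: en, de, fr, it']
```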

Last edited by Tex2002ans; 04-16-2017 at 01:51 AM.