Quote:
Originally Posted by BetterRed
So, why not use a dictionary for a real language one is never likely to really need? I just installed the Estonian oxt. No disrespect to Estonians - it was a rational decision, I chose it because it uses Latin characters but it's not Indo-European.
|
Well, I wouldn't mark text as a language that it isn't; that is just going to cause more problems than the one you are trying to solve.
I went back and forth with Jellby while getting help transcribing Greek letters, and he convinced me to start marking up Greek correctly. Here is how I handle it now:
Quote:
<p>[...] The division of labor turns the self-sufficient individual into the <span class="greek" xml:lang="grc">ζῷον πολιτικόν</span> dependent on his fellow men, the social animal of which Aristotle spoke. Hostilities between one animal and another, or between one savage and another, in no way alter the economic basis of their existence. [...]</p>
|
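If you want to check which passages in a file actually carry a language tag, something like the following could work. This is only a rough sketch: it assumes the chapter is well-formed XHTML, and the helper name `tagged_spans` and the sample string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The "xml:" prefix is predefined in XML, so ElementTree exposes
# xml:lang under this fully qualified attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tagged_spans(xhtml_fragment):
    """Return (language code, text) pairs for every element with xml:lang.

    Hypothetical helper; expects a single well-formed XHTML element.
    """
    root = ET.fromstring(xhtml_fragment)
    return [
        (el.get(XML_LANG), "".join(el.itertext()))
        for el in root.iter()
        if el.get(XML_LANG)
    ]

# Shortened, made-up sample in the same style as the markup quoted above.
sample = '<p>The <span class="greek" xml:lang="grc">ζῷον πολιτικόν</span> of Aristotle.</p>'
print(tagged_spans(sample))
```

Running it over a whole book would give a quick list of everything tagged so far, which makes it easier to spot Greek that was missed.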
Also, this site gives some of the reasons why you would want to tag languages correctly:
http://www.unimelb.edu.au/accessibil.../language.html
Quote:
Language information specified via the lang attribute may be used by a user agent to control rendering in a variety of ways. Some situations where author-supplied language information may be helpful include:
- Assisting search engines
- Assisting speech synthesizers
- Helping a user agent select glyph variants for high quality typography
- Helping a user agent choose a set of quotation marks
- Helping a user agent make decisions about hyphenation, ligatures, and spacing
- Assisting spell checkers and grammar checkers
|
While there isn't a ton of benefit from marking it up right now, much of the reason for marking up languages in such depth is to future-proof the HTML.
"Assisting speech synthesizers" is extremely helpful for text-to-speech programs.
As to the typography side of things, now that I have stumbled into the world of LaTeX, having the languages marked properly allows the hyphenation dictionaries to work, which is (REALLY) important. And for languages written in a different script, like Chinese or Greek, it allows you to easily swap in a different font.
There is also fantastic functionality built into LaTeX that lets you easily swap between different rulesets (which quotation marks should be used, spacing rules around quotations, where line breaks are allowed, etc.).
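To give an idea of how the markup could carry over to LaTeX, here is a rough sketch that turns `xml:lang` spans into babel's `\foreignlanguage{...}{...}` command. The function name `to_latex` and the `BABEL_NAMES` mapping are assumptions made for the example (ancient Greek in particular may need extra babel options), and it does not escape LaTeX special characters.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed mapping from language codes to babel language names;
# adjust it to whatever languages your preamble actually loads.
BABEL_NAMES = {"grc": "greek", "fr": "french", "de": "german"}

def to_latex(el):
    """Flatten an element to text, wrapping tagged spans for babel.

    Hypothetical helper: all other markup is dropped and LaTeX
    special characters (%, &, _ and so on) are NOT escaped.
    """
    body = (el.text or "") + "".join(
        to_latex(child) + (child.tail or "") for child in el
    )
    name = BABEL_NAMES.get(el.get(XML_LANG, ""))
    return f"\\foreignlanguage{{{name}}}{{{body}}}" if name else body

para = ET.fromstring('<p>the <span xml:lang="grc">ζῷον</span> dependent</p>')
print(to_latex(para))
```

With the tags in place, the conversion is mechanical. Without them, you would have to hunt down every foreign passage by hand again.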
Who knows, maybe ereaders in the future would be able to do more fancy stuff like that too with properly marked-up text.
Now, in a perfect world, you would mark every little saying as French, German, Spanish, etc., but that just takes way too long (the marginal benefit is not worth it to me), so I just settle for doing it for Greek.
Priority #1 is to get the dang books digitized and up online. Going back to add the language markup as needed is a much lower priority (or something to do when I get around to LaTeXing the books).