#1 |
meles meles
Posts: 109
Karma: 163588
Join Date: May 2008
Location: Persepolis
Device: Pocketbook InkPad 3
|
What are the longest languages?
Hi, I'm not sure where to ask this. But it is related to ebooks, because they make it easier to count characters, words and pages.
I'm wondering which languages are usually the longest. You can attempt a distraction by saying "it depends on circumstances, translation", or "German numerals can theoretically be infinite". But I mean in practice. Even better, I have an idea how to measure this! But I need your help.
The idea is to compare the same book translated into different languages. The measuring stick should be character count. Not word count, because this can be misleading. I want the most verbose languages possible, where speaking usually takes the most time. Not page count, because it depends on other factors like margins, font size, and so on.
I was thinking about comparing The Bible and/or War and Peace in different languages. But I can't find a place where a large number of languages is listed. Ebooks are digital, which makes counting characters vastly easier. I think I prefer War and Peace, because The Bible tends to be written in archaic language to make it sound more profound. Or rather, it's often left that way.
I think Russian may be one of the longest; Slavic languages tend to be quite long, and Russian is noticeably longer than Polish. For instance, words with feminine gender often have 2 more syllables. |
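The comparison described above could be sketched roughly like this in Python. It is only an illustration under assumptions: the names `count_letters` and `relative_lengths` are made up for this sketch, the two sample sentences are toy strings rather than real book translations, and counting only letters (skipping spaces, digits and punctuation) is just one possible choice of measuring stick.

```python
import unicodedata


def count_letters(text: str) -> int:
    """Count characters whose Unicode general category is a letter (L*).

    Spaces, digits and punctuation are skipped so that typographic
    conventions do not inflate the count.
    """
    return sum(1 for ch in text if unicodedata.category(ch).startswith("L"))


def relative_lengths(translations: dict[str, str], baseline: str) -> dict[str, float]:
    """Return each translation's letter count divided by the baseline's."""
    base = count_letters(translations[baseline])
    return {lang: count_letters(text) / base for lang, text in translations.items()}


if __name__ == "__main__":
    # Toy samples only (assumption); a real run would load whole books.
    samples = {
        "en": "I was at the museum.",
        "pl": "Byłam w muzeum.",
    }
    for lang, ratio in sorted(relative_lengths(samples, "en").items()):
        print(f"{lang}: {ratio:.2f}")
```

A ratio above 1 would mean the translation uses more letters than the baseline, below 1 fewer. With samples this small the numbers say nothing about the languages themselves.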
#2 |
Member Retired
Posts: 3,183
Karma: 11721895
Join Date: Nov 2010
Device: Nook STR (rooted) & Sony T2
|
I can tell you French is 20-30% longer than English.
|
#3 |
Guru
Posts: 615
Karma: 2362786
Join Date: Jan 2010
Device: PocketBook Verse Pro Colour
|
War and Peace is perhaps not such a good example. There are several 'original' versions of the book in Russian, so you would need to make sure the translator used the same version as the one you are comparing to. Also, having been published almost 150 years ago, War and Peace may not be the perfect example of 'modern' language.
But anyway, interesting idea, to compare the length of languages. I'll be interested to see the outcome. |
#4 |
Nxfgrrjks
Posts: 99
Karma: 925422
Join Date: Nov 2012
Location: New York, NY
Device: aura hd
|
Alphabets will use more characters than ideograms:
In English: I'm going home. In Japanese: 帰る。 It helps that I can elide the subject. |
#5 |
Wizzard
Posts: 11,517
Karma: 33048258
Join Date: Mar 2010
Location: Roundworld
Device: Kindle 2 International, Sony PRS-T1, BlackBerry PlayBook, Acer Iconia
|
Well, if you're looking for a source of multiple languages comparisons, the old "I Can Eat Glass" project where someone solicited a bunch of translations (quality naturally varies) for the phrase "I can eat glass, it doesn't hurt me", seems like it may be of use.
The original seems to be down, but here are three sites where people have mirrored and/or added to the languages presented (including using the original language scripts instead of transliterations).
Personally, I'd say that if written in the Roman alphabet instead of Canadian Aboriginal Syllabics, Inuktitut (which I was mildly interested in learning several years ago) seems like a pretty good candidate for length, given its agglutinative structure and the circumlocutory nature of some of the phrasing required due to its vocabulary limitations.
Tamil and Burmese also look to have impressive character counts, judging by the length of their sentences as written in script compared to the other samples. |
#6 |
meles meles
Posts: 109
Karma: 163588
Join Date: May 2008
Location: Persepolis
Device: Pocketbook InkPad 3
|
I'm asking for a long book like War and Peace (does anyone have a better idea?) to have a sample of significant size. "I can eat glass..." is such a tiny sample.
I forgot about ideograms and Japanese. To level the playing field, we would have to compare with Japanese written using the Roman alphabet. http://en.wikipedia.org/wiki/Romanization_of_Japanese Romanized Japanese is not perfect, but I'm not looking for a perfect solution. |
#7 |
Wizzard
Posts: 11,517
Karma: 33048258
Join Date: Mar 2010
Location: Roundworld
Device: Kindle 2 International, Sony PRS-T1, BlackBerry PlayBook, Acer Iconia
|
The difficulty with trying for a long sample is that it would be very hard to track down the necessary translations for it (assuming new enough to still be in print or old enough to be in the public domain), especially in e-book format, and there is also the problem of archaic language (especially with public domain translations, which are often stilted due to generally being at least a century old).
I'd suggest that rather than War and Peace, perhaps try looking for samples of a popular, modern international bestseller, such as one of those inevitable thriller novels like The Da Vinci Code, which is bound to have a large number of modern-language translations, some of which you may be lucky enough to find first-chapter samples of online at various vendors and/or publisher websites. Failing that, I know the Asterix and Tintin comic book adventure albums have all been translated into upwards of 30 languages each and seem to mostly be in print (though they may be difficult to track down).
ETA: Your best bet for an e-book available modern language text with lots of translations may be Paulo Coelho's The Alchemist, which Wikipedia says has 67 translations, some of which Coelho himself released to the internet at large and encouraged people to download, compared to TDVC's 40 or so, and Lord of the Rings' 38.
Last edited by ATDrake; 04-15-2013 at 12:41 AM. |
#8 | |
Nxfgrrjks
Posts: 99
Karma: 925422
Join Date: Nov 2012
Location: New York, NY
Device: aura hd
|
Quote:
There's Chinese, too. It's probably at least as efficient as Japanese if written in characters rather than pinyin. |
|
#9 |
Evangelist
Posts: 485
Karma: 270594
Join Date: Aug 2010
Device: palm tx, Windows7, Galaxy A5
|
Comparing things only in the Roman alphabet can add several letters to a name. For example:
The name 'Zhenya' has only 4 characters in Russian (Женя). |
#10 | |
meles meles
Posts: 109
Karma: 163588
Join Date: May 2008
Location: Persepolis
Device: Pocketbook InkPad 3
|
Quote:
So okay, let's exclude Japanese and Chinese from the statistics. The point is not to find flaws in this method - it has plenty - but to get a semi-reliable result with a fairly large sample. If we can achieve that, we can talk about improvements. |
|
#11 |
temp. out of service
Posts: 2,813
Karma: 24285242
Join Date: May 2010
Location: Duisburg (DE)
Device: PB 623
|
Even then:
Comparing word lengths only makes sense between languages of similar grammar, because differences in grammatical structure more often than not force you to add words to a sentence to keep the same information in it. This is especially so when one of the languages is inflected (like PL) and the other is not.
An example would be the following piece of dialogue: two people, one male and the other female, are asked if they were out to do something specific (e.g. visit a museum). In PL the answers "byłam" (I was) and "nie, odwiedzałem kumpla" (no, I was visiting a pal) are sufficient to indicate who said what, because of the gender suffixes on the verbs. In English you have to write at least "I was - said $_female" to transmit the same amount of information.
Now keep the same in mind for adjectives, adverbs and differences in the grammatical treatment of tenses... While some languages do a lot of it with prefixes and suffixes, others require heaps of additional words for it.
German lets you save space with nouns: you can stick several of them together. So while you have to say "Office of Foobaric Affairs" in Polish, in German you simply have "Foobaroffice". Sounds like it would make things a lot shorter. Nevertheless, because Polish is inflected, it gets by with fewer words in other cases. In fact, the size difference amounts to texts about one third shorter in Polish.
Then you have to keep in mind that not every word has a corresponding equivalent in every language.
There is no word for "toe" in PL (finger of foot is used).
There is no singular "parent" in DE (parentspart is used).
My points are as follows:
Last edited by Freeshadow; 04-17-2013 at 10:44 AM. |
#12 | |
Nxfgrrjks
Posts: 99
Karma: 925422
Join Date: Nov 2012
Location: New York, NY
Device: aura hd
|
Quote:
It occurs to me that the relative simplicity of English grammar forces writers to use dialogue tags where writers of other languages might get away with grammatical hints. All those "he saids" & "she saids" pad the word count.
The original text of the Tale of Genji has no names because referring to a person by name was considered rude in the Heian court. Honorifics, humble forms and the various inflections of politeness hint at who is speaking about whom, but there are also explicit references to positions, etc. that help identify characters. In other words, there are grammatical efficiencies that are undone by social rules. Translators give the characters names. Nevertheless, it remains a pretty thick book when translated. |
|
#13 |
meles meles
Posts: 109
Karma: 163588
Join Date: May 2008
Location: Persepolis
Device: Pocketbook InkPad 3
|
I understand, but I would still like to compare languages in this simple way. Like any statistic, it would have to be taken with a grain of salt, but could be illuminating. There's no way to make a perfect comparison of languages, and an imperfect one is the next best thing.
Another way to compare could be using a speech synthesizer. If the programs are mature, it would take care of spelling issues. For example, French has quite ancient spelling. The French equivalent of "many", "beaucoup", looks much longer but is actually quick to pronounce. It's roughly "bocoo", with "c" as in "corn".
If you keep finding flaws in the idea, it's not constructive. You can go and try to get a job in QA or Software Testing, but probably not in something creative. Your reservations will hold you back.
Last edited by b0rsuk; 04-18-2013 at 11:53 AM. |
#14 |
Wizard
Posts: 3,037
Karma: 18765433
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
The nice thing about a short text is that you can count characters by eye. If you go for an entire book, then you have to figure out how to count the text characters in it. It takes work to extract only the text from most ebook formats, and text in most languages other than English needs more than one byte per character in common encodings such as UTF-8, so you can't just use the file size to roughly count characters. It is interesting because there are so many other problems to solve before you can even get down to determining relative length.
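To illustrate why file size is a poor stand-in for character count, here is a small Python sketch. It assumes the text is stored as UTF-8 (other encodings give different byte counts), and the helper name `char_and_byte_counts` is invented for this example. Extracting the text from an actual ebook file is not shown.

```python
def char_and_byte_counts(text: str) -> tuple[int, int]:
    """Return (number of Unicode code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    # Short words from a few scripts; plain ASCII is the only case
    # where the two counts are guaranteed to match.
    for sample in ["home", "byłam", "Женя", "帰る"]:
        chars, size = char_and_byte_counts(sample)
        print(f"{sample!r}: {chars} characters, {size} bytes in UTF-8")
```

Counting code points with `len()` is itself a simplification: text that uses combining marks can have more code points than visible characters.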
|
#15 |
Nxfgrrjks
Posts: 99
Karma: 925422
Join Date: Nov 2012
Location: New York, NY
Device: aura hd
|
This is an interesting topic, and for that reason, I would be very surprised if someone hasn't already done exactly what you're proposing. After all, the relative wordiness of languages is a practical problem for translators, and one would think linguists would also find it interesting. I might try a literature search before reinventing the wheel.
But if I had to do this tonight, I'd use the "Communist Manifesto", mainly because it's in the public domain, has been translated with relatively modern language, and has been rendered into many, many languages. Even more importantly, the marxists.org site has a handy page with links to many -- but not all -- of the translations from a single web page. (It's easier to work from a single page than to poke around on sites in languages I don't understand.) Finally, I'd pipe the output of lynx -dump into wc -m. wc counts newline characters too, but wothehell, quick-n-dirty would fit the parameters you've laid out.
When all is said & done, it's your project, so go ahead and do it your way. I'd be curious to know what you find, even if you exclude the languages I'm most interested in. |
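For anyone without lynx at hand, the same quick-n-dirty idea can be approximated with Python's standard library. This is a rough sketch, not a drop-in replacement for the pipeline above: the names `TextExtractor` and `count_visible_chars` are made up here, the sample page is a toy string rather than a real download, and unlike wc -m this version skips whitespace entirely.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_visible_chars(html: str) -> int:
    """Count non-whitespace characters in the text content of an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    return sum(1 for ch in text if not ch.isspace())


if __name__ == "__main__":
    # Toy page (assumption); a real run would read a saved translation.
    page = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><p>A spectre is haunting Europe.</p></body></html>"
    )
    print(count_visible_chars(page))
```

Navigation menus, footers and licence notices on a real page would still be counted, so the result is only as clean as the page it is fed.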
Tags |
comparison, curiosity, languages, research, statistics |
|