09-13-2009, 08:28 PM   #133
msundman
Zealot
msundman has a complete set of Star Wars action figures.
Posts: 103
Karma: 269
Join Date: Aug 2006
Device: FBReader on Android
Quote:
Originally Posted by Teyrnon
Mmhmm, as an artifact of my background I tend to think of digits, letters, punctuation marks, pretty much every glyph, as characters. You obviously don't. Which is fine. For what we're talking about you're certainly more correct.
I also think of all glyphs as characters (I'm biased towards Unicode, so I even count "supplementary characters"), but when counting book lengths I would count only "word characters", because other characters aren't "read" in the same sense; they are there for different kinds of structuring and markup. Heck, one could even think of text formatting as characters (e.g., the "horizontal tab" and "form feed" characters).

I was only trying to say that I think "word character" counts are particularly suitable for measuring text size.
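As a rough illustration of the distinction, here is a minimal Python sketch that reports both totals. It assumes a "word character" is anything Python's `str.isalnum()` accepts (Unicode letters and digits); the function name `char_counts` is hypothetical, and other definitions, such as also counting in-word apostrophes or hyphens, would be just as defensible.

```python
def char_counts(text: str) -> tuple[int, int]:
    """Return (all characters, word characters) for text.

    Assumption: a "word character" is a Unicode letter or digit
    (str.isalnum()). Spaces, punctuation and formatting characters
    such as TAB and FORM FEED only count towards the first total.
    Python strings are indexed by code point, so a supplementary
    character (outside the Basic Multilingual Plane) counts as one.
    """
    total = len(text)
    word_chars = sum(1 for ch in text if ch.isalnum())
    return total, word_chars


if __name__ == "__main__":
    sample = "Yes,\tthe car\f is mine."
    print(char_counts(sample))  # (22, 15)
```

Note that combining marks (e.g. a decomposed diaeresis) are not alphanumeric, so they are not counted as extra word characters.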

Quote:
Originally Posted by Teyrnon
Smaller numbers are easier to deal with. And frankly I'd rather not have to use SI units when talking about literature.
What on earth do you have against SI units? Not only are they extremely handy, but the prefixed units easily become units of their own. E.g., people hardly ever think of "cm" as "m/100" or "km" as "m*1,000"; instead they see "km", "m" and "cm" as units in themselves. Yet switching between these units remains incredibly easy.

That said, "1.5" in "1.5 Mchars" is a smaller number than "150" in "150 kwords".
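To show how little effort the prefixes take, here is a small hypothetical helper, `si_format`, that scales a raw count by k (10^3), M (10^6) or G (10^9). It's only a sketch; a real tool might round to a fixed number of significant digits instead.

```python
def si_format(count: float, unit: str) -> str:
    """Format a raw count with an SI prefix: k = 10**3, M = 10**6, G = 10**9."""
    for factor, prefix in ((10**9, "G"), (10**6, "M"), (10**3, "k")):
        if abs(count) >= factor:
            # ':g' drops trailing zeros, so 150000 -> "150" rather than "150.0"
            return f"{count / factor:g} {prefix}{unit}"
    return f"{count:g} {unit}"


print(si_format(1_500_000, "chars"))  # 1.5 Mchars
print(si_format(150_000, "words"))    # 150 kwords
```

Switching from one prefix to another is just a shift by a power of ten, which is the point being made about SI units.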

Quote:
Originally Posted by Teyrnon
Quote:
Originally Posted by msundman
Why would you want to convert from character count to word count?!? (E.g., if it's just about what you are currently used to then I see no real reason, because you could just as well get used to character counts instead.)
I know I read 200WPM.
What you're currently used to should be irrelevant, as I said in what you quoted.

Quote:
Originally Posted by Teyrnon
I know word recognition is largely glyphic. As in words themselves are glyphs. When one is reading one is reading in whole word chunks for the most part. Individual letters only become significant if one doesn't immediately recognize the word. This would seem to make the word the most significant unit in this transaction.
If that were the only relevant variable then I would agree with you. However, since words of different lengths are read at different speeds (even though they are read a word at a time), and since longer words take up more space, there might be some other way to describe the speed and size of a text more accurately. AFAIK character count is such a way.

Quote:
Originally Posted by Teyrnon
That's the thing, I don't want a rough guess.
Why? Your "200 words/minute" is very rough in itself. You probably read a lot more words per minute if there are lots of short words, and a lot fewer if there are lots of long words.

Quote:
Originally Posted by Teyrnon
I want a word count because I read words, not letters. The letters make up the words, yes, but when I read the word "yes" I see yes, not Y-E-S. It's an instant recognition as a single glyph. The letters are there but they have no meaning outside of the pattern that constitutes the word.
This is a largely philosophical point of view, and even as such it's highly debatable. E.g., one could count sentences instead of words, and claim that words have no meaning outside of the pattern that constitutes the sentence. (Words do have a meaning outside sentences, just as characters have a meaning outside words, although outside their context they carry less information.)

Quote:
Originally Posted by Teyrnon
Let's try an example from chemistry. Word count seems to me like insisting that, rather than identifying the individual elements by their atomic numbers (the number of protons in each atom), we instead give the number of quarks in each atom. Sure, atoms are ultimately made of quarks, but quarks have little relevance to chemistry. It may also confuse matters, since perhaps the same number of quarks might represent vastly different atoms and arrangements.
The comparison is invalid. Word count is more like counting atoms, whereas character count is akin to counting protons (and possibly neutrons). The former might be more important in some regards, but when we're trying to figure out the total mass the latter wins hands down.

Also, you seem to think I'm proposing to use character counts for no reason at all, when in reality I've even outlined the reasons.

Quote:
Originally Posted by Teyrnon
Quote:
Originally Posted by msundman
As I've already said, character count reflects quite accurately the length of the text. Much, much better than word count (or page count). It also reflects well the time it takes to read the text, and works in many different languages.
Okay, have you tested this? How many characters per minute do you read?
I have tested it, but only on a very small scale. Also, I'm not equally fluent in all the languages I know, so inter-language tests are a bit unreliable. The subject also plays a large role here. E.g., I read novels much faster than scientific papers. All these variables affect character counts and word counts equally, so neither is better or worse because of these variations in the text.

Since I didn't have the results of any old tests, I just ran a few new ones. Here are the results:
Code:
#   time   words chars   wpm   cpm   chars/word
1 00:02:09   607  2581   282  1200   4.25
2 00:00:36   150   776   250  1293   5.17
3 00:07:02  2166  9548   308  1358   4.41
4 00:00:30   134   651   268  1302   4.86
5 00:02:51   503  3893   176  1366   7.74
6 00:01:16   405  1876   320  1481   4.63
All were different texts by different authors. Texts #1-4 were in English, #5 in Finnish and #6 in Swedish.

Below is each text's difference from the average reading speed, as a percentage of that average. The smaller these differences are, the better the measure.

English only:
Word counts: 2%, 10%, 11%, 3%
Char counts: 7%, 0%, 5%, 1%

All languages:
Word counts: 6%, 7%, 15%, 0%, 34%, 20%
Char counts: 10%, 3%, 2%, 2%, 2%, 11%

So, the averages of these differences are:

Word counts:
  • English only: 7%
  • All languages: 14%
Character counts:
  • English only: 3%
  • All languages: 5%

So, character counts are, at least in this case, 2-3 times as accurate as word counts.
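For anyone who wants to check the arithmetic, here is a Python sketch that recomputes the percentages from the raw times and counts in the table. It assumes each percentage is the absolute difference between a text's reading rate and the mean rate, divided by that mean; computed from the unrounded rates, this reproduces the figures above. The names (`TESTS`, `minutes`, `rel_deviations`, `summarize`) are hypothetical.

```python
from statistics import mean

# (text no., reading time "m:ss", words, chars, language), from the table above
TESTS = [
    (1, "2:09", 607, 2581, "en"),
    (2, "0:36", 150, 776, "en"),
    (3, "7:02", 2166, 9548, "en"),
    (4, "0:30", 134, 651, "en"),
    (5, "2:51", 503, 3893, "fi"),
    (6, "1:16", 405, 1876, "sv"),
]


def minutes(mmss: str) -> float:
    """Convert "m:ss" to fractional minutes, e.g. "2:09" -> 2.15."""
    m, s = mmss.split(":")
    return int(m) + int(s) / 60


def rel_deviations(rates: list[float]) -> list[float]:
    """Each rate's absolute difference from the mean, as a percentage of the mean."""
    avg = mean(rates)
    return [abs(r - avg) / avg * 100 for r in rates]


def summarize(rows) -> tuple[float, float]:
    """Average percentage deviation of (words/min, chars/min) over the given rows."""
    wpm = [words / minutes(t) for _, t, words, _, _ in rows]
    cpm = [chars / minutes(t) for _, t, _, chars, _ in rows]
    return mean(rel_deviations(wpm)), mean(rel_deviations(cpm))


english = [row for row in TESTS if row[4] == "en"]
for label, rows in (("English only", english), ("All languages", TESTS)):
    w_dev, c_dev = summarize(rows)
    print(f"{label}: word counts {w_dev:.0f}%, char counts {c_dev:.0f}%")
# English only: word counts 7%, char counts 3%
# All languages: word counts 14%, char counts 5%
```

Using the rounded wpm values from the table instead gives 5% and 6% for the first two all-language word deviations, which is why the sketch works from the raw times.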

Quote:
Originally Posted by Teyrnon
Your certain the figure doesn't change depending on how those letters are arranged into words? Do you really break everything down into individual letters as you read?
My what?
Of course the meaning behind the letters and the words changes the speed with which you read.
I've never even hinted that I would break down words into individual letters as I read, and in fact I've said that words are usually read whole. (It's actually more complicated than that. Pupil tracking shows quite complex patterns. This is all beside the point, though.)

Quote:
Originally Posted by Teyrnon
I'm having trouble imagining a situation where I wouldn't find character count obtuse and cumbersome when deciding how fast a document will be read.
Why?!? Once more, if it's only because you're currently less used to character counts than to word counts, then I don't see how that would be a real reason, since you would adjust very quickly.

Quote:
Originally Posted by Teyrnon
However on average I'd find word count more useful so I argue for that.
You have yet to provide any valid reason for that. The only reason I've deciphered from your replies is the philosophical "words are read whole". I, OTOH, have described how character counts more accurately reflect both the size of the text and the speed at which it is read. IMO my arguments are better because of their pragmatic nature, and thus better suited to this pragmatic problem.

Quote:
Originally Posted by Teyrnon
Quote:
Originally Posted by msundman
E.g., it's definitely not sensible to count the Finnish word "epäjärjestelmällistyttämättömyydelläänsäkäänköhän" as 1 unit, as it's composed of over 10 suffix units altering the meaning of its base.
Is this word an example of what one would see in everyday literature? Are Finnish books filled with words of that length? How common are they?
No, that word is extreme. However, many things that English expresses with separate words are expressed with suffixes in Finnish. E.g., "car" is "auto", but "my car" is "autoni" and "your car" is "autosi".
As for average word lengths, as you can see from my tests above, the English and Swedish texts averaged 4-5 characters per word, whereas the Finnish text was closer to 8.

Quote:
Originally Posted by Teyrnon
Quote:
Originally Posted by msundman
I doubt that's correct. Although it's true that words are usually read whole, it's also true that longer words usually take longer to read than shorter words. AFAIK people almost completely "jump over" (i.e., the eye movement doesn't slow down significantly at) very short words, such as "a", when they read.

If speed readers read every word equally fast, then Finnish speed readers would finish books in considerably less time, but AFAIK this is not the case. AFAIK the number of characters more accurately reflects both the length of the text and the speed with which it's read.
That's the thing, I know what I said holds true for English. I can't speak for Finnish but you seem to be suggesting that 50 character words are common. Yeah, I can see where such words might be difficult to process as a single glyph.
I flat out don't believe for a second your claim that words of different lengths are read at the same speed. Not in English and not in any other language. I've seen pupil tracking of different people reading, and although they certainly don't read individual characters, they do tend to spend more time on longer words than on shorter ones, and almost completely skip very short words such as "a".

I haven't suggested that 50 character words are common in Finnish. They are in fact not. However, words tend to be significantly longer, on average, in Finnish than in English.

I haven't been arguing against processing words as single entities, so would you stop arguing against that straw man, please?

Quote:
Originally Posted by Teyrnon
Quote:
Originally Posted by msundman
Certainly characters are meaningful units of character-based languages, so I don't know what you're getting at.
What I'm getting at is that by and large it's words, not letters, that are sensible in reading a text.
Both are very much sensible, as are sentences and more abstract word structures. My point is that the character count reflects both the size of a text and the speed with which it's read more accurately than the word count does. Your arguments seem to be of a more philosophical nature. While I'm a big fan of letting The Right Way(tm) triumph over hyperpragmatism, I don't see a very big difference between characters and words in the sense you're trying to convey, whereas I see a big difference in accuracy in favor of character counts.

Last edited by msundman; 09-13-2009 at 08:48 PM. Reason: added some missing numbers