Old 03-02-2018, 12:25 AM   #15
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Nabeel View Post
A question came up in my Creative Writing group: there must be a computer programme or app that, when given a text, will analyse how frequently different words are used.
As others have stated, a user-friendly way for single words is to use Sigil or Calibre's Spellcheck lists. If you use Word, Toxaris's EPUB Tools also has this built in.

If you want to find "repeated phrases", those are called n-grams. Gregg Bell linked to one such tool, but there are plenty of others.

Side Note: I personally use a commandline tool for n-grams. This gives me full control over the variables. Then I import it into a spreadsheet so I can sort by frequency.

Code:
This is an example of an n-gram example with an n-gram example.
2-grams would be all 2 words in a row:

Code:
1 This is
2 an n-gram
1 is an
1 an example
1 example of
1 of an
2 n-gram example
1 example with
1 with an
3-grams:

Code:
1 This is an
1 is an example
[...]
2 an n-gram example
[...]
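For the curious, here's a quick Python sketch of the same counting. This is not my actual commandline tool, just a minimal illustration of the idea (split into words, slide a window of n words, tally with a counter):

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive words in the text."""
    words = text.split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return Counter(grams)

# Strip the trailing period so "example." and "example" count as one word.
sentence = "This is an example of an n-gram example with an n-gram example."
counts = ngram_counts(sentence.rstrip("."), 2)

# Most common 2-grams first, like sorting the frequency column in a spreadsheet.
for gram, count in counts.most_common(3):
    print(count, gram)
```

From there you can dump the counts to CSV and sort/filter in your spreadsheet of choice.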
Typically, smaller n-grams are so full of cruft that they aren't really helpful ("he said" + "she said"). But I find the helpful patterns start to pop out at 4-grams and higher.

When you run this on a book-length text, you tend to see the author's own writing patterns.

I recently ran this on a ~70k word novel, and there were 26 "XYZ took a deep breath and" and 34 "XYZ shook her head". That's 292 words of characters taking a deep breath and shaking their heads.

Or a different author had the tendency to write "she said with an evil smirk on her face", "she said with a smile". So that author would probably want to go through and focus on chopping down "she said with".

A different book had 15 "What the f*** do you think you are doing?" That's 9 * 15 = 135 words.

These are typically a sign that you have to go through your book again and spice it up with variations.

Nobody wants to read hundreds of the same exact words again and again and again. Or slight variations of the words again and again... and again.

Quote:
Originally Posted by Nabeel View Post
Obviously, the really useful thing would be a programme that points out that you have unwittingly used a word like 'vast' five times in the same paragraph, but we're not looking for miracles.
The only tool I've come across that does this is TeXStudio:

https://www.texstudio.org/

It is a LaTeX editor, but you could use it for plain text if you wanted to.

It has a function called "Word Repetition":

https://tex.stackexchange.com/questi...from-texstudio

What it does is give you a little green squiggly under the same word repeated within X number of words (you can set the min/max variables).

It tends to give a lot of false positives though. One way it could be made better is some sort of whitelist, so you could ignore very common words ("the" + "if" + "and" + "but" + [...]).
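To show what I mean, here's a rough Python sketch of that kind of check with a whitelist bolted on. This is purely hypothetical, not how TeXStudio actually implements it:

```python
import re

# Very common words to ignore; a real whitelist would be much longer.
STOPWORDS = {"the", "a", "an", "and", "but", "if", "of", "to", "in"}

def find_repetitions(text, window=10):
    """Flag any non-stopword that repeats within `window` words of itself."""
    words = re.findall(r"[\w'-]+", text.lower())
    flagged = []
    for i, word in enumerate(words):
        if word in STOPWORDS:
            continue
        # Look back up to `window` words for the same word.
        if word in words[max(0, i - window):i]:
            flagged.append((i, word))
    return flagged

sample = "The vast hall opened onto a vast and vast empty plain."
for position, word in find_repetitions(sample):
    print(f"word {position}: '{word}' repeated within 10 words")
```

With the whitelist, "the" and "and" never trigger, so only the genuinely repetitive words ("vast") get flagged.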

Last edited by Tex2002ans; 03-02-2018 at 12:52 AM.