Old 05-18-2024, 08:27 AM   #15
Doitsu
Grand Sorcerer
Posts: 5,733
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by mikapanja
That works in case you know what you are looking for. But what if you don't, which was my original thought? E.g. if you want to know if any 5-word group is repeated in the text.
For that kind of search, you'll need a concordance tool, for example Laurence Anthony's AntConc (freeware).
  • Unzip the epub file.
  • If the folder contains .xhtml files, change their file extensions to .html.
  • Open AntConc, select Open file(s) as 'Quick Corpus', select .html as the file type and then select the extracted .html files.
  • Click the N-Gram tab, select the desired number of words and click Start.
I've attached a sample screenshot of the output.
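If you'd rather script it than use a GUI, the steps above can be roughly approximated in Python. This is only a minimal sketch using the standard library, not a description of how AntConc works internally. The function names (extract_epub_as_html, find_repeated_ngrams) are made up for illustration, and the word splitting is a simple regex that assumes UTF-8 content files, so the counts may differ somewhat from what AntConc reports.

```python
import re
import zipfile
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path

# Tags after which a space is inserted so adjacent paragraphs don't fuse.
BLOCK_TAGS = {"p", "div", "li", "br", "td", "tr", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}


def extract_epub_as_html(epub_path, out_dir):
    """Unzip the epub and rename .xhtml files to .html (mirrors the manual steps)."""
    out = Path(out_dir)
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(out)
    for xhtml in list(out.rglob("*.xhtml")):
        if xhtml.is_file():
            xhtml.rename(xhtml.with_suffix(".html"))
    return sorted(out.rglob("*.html"))


class _TextOnly(HTMLParser):
    """Collects visible text, skipping <style> and <script> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def words_in(html_text):
    """Return the lowercased words of an HTML document (simple regex split)."""
    parser = _TextOnly()
    parser.feed(html_text)
    parser.close()
    return re.findall(r"\w+(?:['’]\w+)*", "".join(parser.chunks).lower())


def find_repeated_ngrams(html_files, n=5):
    """Count n-word groups per file and return those seen more than once."""
    counts = Counter()
    for path in html_files:
        words = words_in(Path(path).read_text(encoding="utf-8", errors="replace"))
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [(" ".join(gram), c) for gram, c in counts.most_common() if c > 1]


if __name__ == "__main__":
    # "book.epub" and "book_unzipped" are placeholder names.
    files = extract_epub_as_html("book.epub", "book_unzipped")
    for phrase, count in find_repeated_ngrams(files, n=5)[:20]:
        print(count, phrase)
```

The rename step is only there to match the manual procedure; the script itself could read the .xhtml files directly. N-grams are counted within each file, so a phrase that straddles two chapter files won't be picked up.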
Attached Thumbnails
ngrams.png (31.8 KB)