08-05-2015, 09:56 AM
DaltonST
[GUI Plugin] English Noun Frequency


Summary: Determines 'English Noun Frequencies' for words in a particular book's text, and will optionally:

  • Add frequencies for the chosen number of frequent nouns to the book's Comments;
  • Create new Tags using the chosen number of frequent nouns for Tags;
  • Update a Custom Column with the chosen number of frequent nouns for a Custom Column;
  • Update nothing but log the frequent nouns using the number chosen for Comments;
  • Translate the English Comments to another language, showing both;
  • Accumulate the Top 100 English Nouns with frequency counts across all of your books and all of your libraries.

Questions & Answers:


[1] What is the strategic purpose of “English Noun Frequency”?

Answer: To allow you to grok at a glance what a book is about in English and, optionally, a second language.


[2] Why is it needed? Books have authors, titles, tags and comments.

Answer: As a solution for plain-text ebooks with no title, author, tags or comments. Such books contain nothing that could be used to download basic metadata from the web. They are just plain text.

Additionally, regardless of the English metadata that may or may not be available, readers for whom English is a second language have the option of viewing the results not only in English, but also in a second language. That ability would not otherwise exist. For native English speakers who are students of another language, this option provides an opportunity to expand their vocabulary.


[3] How does it fulfill that goal?

Answer: It creates Comments and/or Tags and/or updates a Custom Column with the “Top N Most Frequent Nouns”. N is a number from 0-100 that you choose separately for Comments, Tags and a Custom Column for the books you select to be updated.
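A rough sketch of the "Top N Most Frequent Nouns" idea in Python follows. It is only an illustration under stated assumptions, not the plugin's actual code: the noun list `KNOWN_NOUNS` and the function name `top_n_nouns` are hypothetical, and the real ENF noun list is not shown in this post.

```python
from collections import Counter

# Hypothetical noun list for illustration only; ENF ships its own list.
KNOWN_NOUNS = {"brain", "memory", "neuron"}

def top_n_nouns(words, n, known_nouns=KNOWN_NOUNS):
    """Return the n most frequent known nouns as (noun, count) pairs."""
    counts = Counter(w for w in words if w in known_nouns)
    return counts.most_common(n)

words = ("the brain stores memory and the brain recalls memory "
         "while the brain rests each neuron").split()

# N is chosen by the user; 0 yields an empty list.
print(top_n_nouns(words, 2))  # [('brain', 3), ('memory', 2)]
```

The same function could be called with a different N for Comments, Tags and the Custom Column, since each of those takes its own value.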


[4] What second languages are available for use?

Answer: Spanish is offered as a 'standard' second language choice. That is because it is the only language for which the developer desired to build and test a digital file with 5,000+ UTF8-encoded noun translation pairs (in this case, English noun to Spanish noun).

Any other language can easily be used by specifying a user custom 'English to Other Language' translation pairs file. A 'template' file is attached, as well as a copy of the 'standard' English to Spanish translation pairs.

For Spanish, since it is 'standard', any optionally specified user custom file may be used to supplement and/or override the 'standard' translation list. The user custom list takes precedence over the standard list.
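The precedence rule can be pictured with a small Python sketch. This assumes the translation pairs have already been loaded into dictionaries; the names `merge_translations` and `annotate`, the sample pairs other than 'look', and the "english (other)" display format are all assumptions made for illustration, not the plugin's real file format or output.

```python
# Hypothetical sample data; the real lists live in the attached files.
STANDARD = {"look": "mirada", "book": "libro"}
CUSTOM = {"book": "tomo", "neuroscience": "neurociencia"}

def merge_translations(standard, custom):
    """Combine both lists; a custom entry overrides the standard one."""
    merged = dict(standard)   # copy so the standard list is left untouched
    merged.update(custom)     # custom pairs supplement and override
    return merged

def annotate(nouns, translations):
    """Show each noun with its translation when one is available."""
    return [f"{n} ({translations[n]})" if n in translations else n
            for n in nouns]

merged = merge_translations(STANDARD, CUSTOM)
print(annotate(["look", "book", "grok"], merged))
# ['look (mirada)', 'book (tomo)', 'grok']
```

A noun with no translation pair is simply shown in English here; how ENF itself treats that case is not stated in this post.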


[5] Why would someone with “real” books that have full metadata need or want to use this?

Answer: They do not “need” it, but they may “want” it, because it can create very interesting statistical information about their libraries. It not only achieves its strategic objective of allowing you to 'grok at a glance' what a single, particular book is about, but can also automatically accumulate the frequency results of all of your books in all of your Calibre libraries into a single spreadsheet .csv file. You then have a database of which nouns are the most common, not only relative to each other, but also in absolute terms. The exact word count for each Top N noun in every book is summed into a single number across all of the books for which you execute ENF.

For example, you might find that the word “neuroscience” occurs 468 times in your library. You might then wish to search the comments (to which ENF can prepend or append the Top N Nouns) of all of your books (even cross-library using the MultiColumnSearch plug-in) to find the word “neuroscience”.
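The accumulation across books could look roughly like the Python sketch below. It is an assumption-laden illustration: the per-book counts are made-up sample numbers chosen to add up to the 468 in the example above, and the column names and the functions `accumulate` and `to_csv` are hypothetical, not ENF's actual .csv layout.

```python
import csv
import io
from collections import Counter

def accumulate(per_book_counts):
    """Sum the noun counts of every processed book into one total."""
    total = Counter()
    for counts in per_book_counts:
        total.update(counts)  # adds counts, does not overwrite them
    return total

def to_csv(total, limit=100):
    """Render the top `limit` nouns as CSV text (assumed column names)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["noun", "count"])
    writer.writerows(total.most_common(limit))
    return buf.getvalue()

# Made-up sample data for two books.
books = [
    {"neuroscience": 300, "brain": 120},
    {"neuroscience": 168, "memory": 90},
]
total = accumulate(books)
print(total["neuroscience"])  # 468
print(to_csv(total))
```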


[6] What formats of ebooks are supported?

Answer: TXT, EPUB, PDF. Other formats may easily be converted by Calibre to any of the supported formats. TXT is recommended. ENF uses only plain text lower case English letters from a to z. It converts all extracted text to lower case before analyzing.
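The lower-casing and "a to z only" rule might be sketched in Python as follows. This is a guess at the general approach, not the plugin's code: the function name `extract_words` is hypothetical, and treating every non-letter (digits, punctuation, apostrophes, accented characters) as a word break is an assumption.

```python
import re

def extract_words(text):
    """Lower-case the text, then keep only runs of the letters a to z."""
    return re.findall(r"[a-z]+", text.lower())

print(extract_words("The Brain's 2 neurons!"))
# ['the', 'brain', 's', 'neurons']
```

Under this assumption a possessive such as "Brain's" splits into two pieces, so a real implementation would likely need extra handling for such fragments.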


[7] Will I find any forms of verbs in the Top N Nouns list?

Answer: No, although some may look like verbs because the words are taken out of their original grammatical context. They are a type of noun called a 'Deverbal Noun'. Examples: walk, speech, dent, scratch, building, fencing, piping, tubing, and painting.

If you activate translation of the Top N Nouns in the Comments to Spanish, the translated word will be the deverbal noun equivalent, not the verb. Example: 'look' will be translated as 'mirada', not 'mirar'.


[8] Where is the User Guide?

Answer: The “User Guide” is decentralized. The “tool tips” provide clear, detailed information. The labels of buttons and checkboxes also are clear. The Job Log has a great deal of information to help you understand your results and how they came about. Finally, the “Frequently Asked Questions” provide additional information about ENF as a whole. Taken together, these elements comprise the “User Guide”.




Requires Minimum Calibre Version: 6.0.0


Version History:

Version 1.0.16- 2023-02-20 Qt.core.

Version 1.0.15- 2022-04-13 Qt6 Compatibility; Minimum Calibre Version now 6.0.0
Version 1.0.13- 2020-12-20 Miscellany
Version 1.0.12- 2020-07-21 Qt tweaks.
Version 1.0.11- 2019-12-28 Technical changes after Python 3.8 testing with Calibre 4.99.2; job execution speed greatly improved.
Version 1.0.10- 2019-04-30 Python 3 and Calibre 3.41.3+ pdf-to-html compatibility. Minimum Calibre version now 3.41.3.
Version 1.0.9 - 2018-10-11 Obfuscation of obscenities now optional.
Version 1.0.8 - 2018-03-20 OSX-only bugfix.
Version 1.0.7 - 2017-05-05 Allow themes with user-defined icons.
Version 1.0.6 - 2016-05-25 Technical tweaks.
Version 1.0.5 - 2015-10-31 Miscellaneous tweaks.
Version 1.0.4 - 2015-10-01 Miscellaneous tweaks.
Version 1.0.3 - 2015-08-31 Technical tweaks.
Version 1.0.2 - 2015-08-25 Miscellaneous tweaks.
Version 1.0.1 - 2015-08-08 The 'Options' Tab now has scroll-bars, and the GUI dialog now allows user resizing and relocation that will be 'remembered'.
Version 1.0.0 - 2015-08-05 Initial release.


Last edited by DaltonST; 02-20-2023 at 06:34 PM. Reason: Release 1.0.16