I downloaded your program the day it was posted, and I just used it for the first time today on one EPUB.
Quote:
Originally Posted by drake7707
Not being a native English speaker myself I was hoping that hyphens were included in the dictionary.txt file and thus not flagged as 'Unnecessary hyphens'. This is one I have difficulty with when correcting books because I don't know the spelling of most of those hyphened words (and also seem to vary on a book by book basis).
SUGGESTIONS:
ONE
I wanted to point out something that might help with hyphenation:
English has many prefixes:
https://en.wikipedia.org/wiki/English_prefixes
Currently, your program marks all of these as "Unneeded hyphen". Perhaps hyphenated words that start with one of these prefixes could be marked with a "Prefix" class instead.
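A rough sketch of what I mean, in Python. The prefix list here is only a small hand-picked sample and the function name is made up for illustration; I have no idea how your program is structured internally:

```python
# Hypothetical, hand-picked sample; a real list would come from a fuller
# source such as the Wikipedia page linked above.
PREFIXES = ("anti", "co", "ex", "mis", "non", "pre", "re", "self", "semi")

def classify_hyphenated(word):
    """Return "Prefix" if the part before the first hyphen is a known prefix,
    otherwise fall back to "Unneeded hyphen"."""
    head, sep, tail = word.partition("-")
    # Require a hyphen and something after it, so "self-" alone does not count.
    if sep and tail and head.lower() in PREFIXES:
        return "Prefix"
    return "Unneeded hyphen"
```

With this sample list, "self-evident" and "mis-information" would land in "Prefix", while "business-man" would stay in "Unneeded hyphen".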
TWO
Since you use frequencies, you should DEFINITELY mark (in a different color if possible) cases where both the hyphenated and non-hyphenated versions of a word exist in the same book:
"step-father" + "stepfather"
"mis-information" + "misinformation"
"business-man" + "businessman"
"life-like" + "lifelike"
[...]
As you stated, each book might hyphenate or not hyphenate these words, but it is almost always an error when the two forms are mixed within one book.
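Something along these lines could find the mixed cases. This is only a sketch under my own assumptions: the regex for what counts as a word and the function name are mine, and it only handles plain ASCII letters:

```python
import re
from collections import Counter

# Assumed definition of a "word": letters, optionally joined by single hyphens.
WORD_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")

def find_mixed_hyphenation(text):
    """Map each hyphenated word whose joined form also occurs in the text
    to a (hyphenated_count, joined_count) pair."""
    counts = Counter(w.lower() for w in WORD_RE.findall(text))
    mixed = {}
    for word, n in counts.items():
        if "-" in word:
            joined = word.replace("-", "")
            if joined in counts:
                mixed[word] = (n, counts[joined])
    return mixed
```

The two counts could then feed the frequency display you already have, so the less common form stands out as the likely error.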
THREE
Throughout my EPUBs, there are a massive number of page references (not to mention an Index). Your program flags all of these hyphenated numbers, which clutters the list:
"97-98" -> "p. 97-98"
"127-28" -> "pp. 121, 127-28, 185"
Also, the numbers might be separated by an en dash instead of a hyphen.
Perhaps these can be marked under the "Number" category as well.
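A simple pattern check might be enough to catch these. Again just a sketch; the names are my own invention, and it only looks at a single token such as "127-28", not the surrounding "pp.":

```python
import re

# Digits, then a hyphen or an en dash (U+2013), then digits.
NUMBER_RANGE_RE = re.compile(r"\d+[-\u2013]\d+")

def is_number_range(token):
    """Return True if the whole token is a number range like 97-98."""
    return NUMBER_RANGE_RE.fullmatch(token) is not None
```

Tokens like "97-98" and "127-28" (with either kind of dash) would match and could go under "Number", while "step-father" would not.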
BUGS:
"self-" is definitely missing from your hyphenations (your current program marked words starting with it as "Missing spaces"). So adding those prefixes should help fix many of these "Missing spaces" errors.
Your program said "thought1" was misspelled:
Actual Code:
Code:
<p>Marshall in this regard makes his own thought<sup>1</sup> entirely clear:</p>
Code as it appears in your program:
Code:
<p>Marshall in this regard makes his own thought1</sup> entirely clear:</p>
Your program said "p8" was misspelled:
Actual Code:
Code:
<p>And further, in a note on the same pages: “Then p<sub>1</sub> p<sub>2</sub> . . . p<sub>8</sub> are points on his demand curve for tea; . . .” [...]
Code as it appears in your program:
Code:
<p>And further, in a note on the same pages: “Then p<sub>1</sub> p<sub>2</sub> . . . p8</sub> are points on his demand curve for tea; . . .” [...]
Perhaps superscript and subscript errors could be treated slightly differently.
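One possible approach is to drop the contents of <sup> and <sub> before spell-checking, so that footnote markers and indices never get glued onto the preceding word. Here is a sketch using Python's standard html.parser; the class and function names are hypothetical, and I am only guessing that text extraction is where the merging happens:

```python
from html.parser import HTMLParser

class TextWithoutScripts(HTMLParser):
    """Collect text content, skipping anything inside <sup> or <sub>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.depth = 0  # nesting depth inside <sup>/<sub>

    def handle_starttag(self, tag, attrs):
        if tag in ("sup", "sub"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("sup", "sub") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every <sup>/<sub>.
        if self.depth == 0:
            self.parts.append(data)

def visible_text(html):
    """Return the text of an HTML fragment without super/subscripts."""
    parser = TextWithoutScripts()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For the first example above this would yield "Marshall in this regard makes his own thought entirely clear:", so "thought" gets checked on its own. The downside is that a lone "p" would be checked in the second example, so single letters followed by a subscript might need their own rule.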
I will definitely be posting more errors as I find them.
Side Note: Should this be in the EPUB forum instead of "Reading and Management"?