10-24-2013, 10:15 PM #9
Tex2002ans
I downloaded your program the day it was posted, and I just used it for the first time today on one EPUB.

Quote:
Originally Posted by drake7707
Not being a native English speaker myself, I was hoping that hyphenated words were included in the dictionary.txt file and thus not flagged as 'Unnecessary hyphens'. This is one I have difficulty with when correcting books, because I don't know the spelling of most of those hyphenated words (and they also seem to vary on a book-by-book basis).
SUGGESTIONS:

ONE

I wanted to point out something that might help with hyphenation:

In English, there are many Prefixes: https://en.wikipedia.org/wiki/English_prefixes

Currently, your program marks all of these as "Unneeded hyphen". Perhaps hyphenated words that start with one of these prefixes could be marked with a "Prefix" class instead.
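A rough Python sketch of the idea, assuming words are checked one at a time. The PREFIXES set is only a small hypothetical sample (not the full Wikipedia list), and classify_hyphenated is a made-up name, not part of your program:

```python
# Hypothetical sketch: label a hyphenated word "Prefix" when its first
# component is a known English prefix. PREFIXES is a small sample only.
PREFIXES = {"anti", "co", "counter", "ex", "mis", "non", "post", "pre",
            "re", "self", "semi", "step", "sub", "un"}

def classify_hyphenated(word: str) -> str:
    """Return "Prefix" if the word is <known prefix>-<rest>,
    otherwise fall back to the existing "Unneeded hyphen" label."""
    head, sep, _ = word.lower().partition("-")
    if sep and head in PREFIXES:
        return "Prefix"
    return "Unneeded hyphen"

for w in ["self-contained", "mis-information", "business-man"]:
    print(w, "->", classify_hyphenated(w))
```

With this, "self-contained" and "mis-information" land under "Prefix", while "business-man" stays an "Unneeded hyphen".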

TWO

Since you already track word frequencies, you should DEFINITELY flag (in a different color, if possible) any word where both the hyphenated and non-hyphenated versions appear in the same book:

"step-father" + "stepfather"
"mis-information" + "misinformation"
"business-man" + "businessman"
"life-like" + "lifelike"
[...]

As you said, each book may or may not hyphenate these words, but mixing and matching the two forms within one book is almost always an error.
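Since the counts are already there, catching mixed forms could be as simple as checking whether the "closed up" version of each hyphenated word also occurs. This is only a Python sketch: the function name mixed_hyphenation is hypothetical, and the simple regex tokenizer is an assumption, not how your program actually splits words:

```python
import re
from collections import Counter

def mixed_hyphenation(text: str) -> list[tuple[str, str, int, int]]:
    """Find words that appear both hyphenated and closed up in one text.

    Returns sorted (hyphenated, joined, hyphenated_count, joined_count) tuples.
    """
    # Assumed tokenizer: letters, optionally joined by single hyphens.
    counts = Counter(w.lower() for w in re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text))
    pairs = []
    for word, count in counts.items():
        if "-" in word:
            joined = word.replace("-", "")
            if joined in counts:
                pairs.append((word, joined, count, counts[joined]))
    return sorted(pairs)

sample = ("His step-father was a business-man. Later the stepfather retired; "
          "the businessman did not. A life-like statue.")
print(mixed_hyphenation(sample))
```

Here "life-like" is not reported because "lifelike" never appears, so only genuine mix-and-match pairs show up.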

THREE

My EPUBs contain a massive number of page references (not to mention indexes). Your program flags every one of these hyphenated number ranges, which clutters the list:

"97-98" -> "p. 97-98"
"127-28" -> "pp. 121, 127-28, 185"

Also, number ranges might be separated by an en dash instead of a hyphen.

Perhaps these can be marked under the "Number" category as well.
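A minimal Python check for this, assuming tokens reach the checker one at a time. It accepts either a hyphen or an en dash (U+2013) between the digits. The names NUMBER_RANGE and is_number_range are hypothetical:

```python
import re

# A page range like "97-98" or "127–28": digits, a hyphen or an
# en dash (U+2013), then more digits.
NUMBER_RANGE = re.compile(r"\d+[-\u2013]\d+")

def is_number_range(token: str) -> bool:
    """True if the whole token is a numeric range that could be filed
    under a "Number" category instead of the hyphen lists."""
    return NUMBER_RANGE.fullmatch(token) is not None

for token in ["97-98", "127\u201328", "step-father", "p8"]:
    print(token, is_number_range(token))
```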

BUGS:

The "self-" prefix is definitely missing from your hyphenation list (the current version flags words like these as "Missing spaces"). Adding the prefixes above should eliminate many of these false "Missing spaces" errors.

Your program said "thought1" was misspelled:

Actual Code:

Code:
  <p>Marshall in this regard makes his own thought<sup>1</sup> entirely clear:</p>
Code as it appears in your program:

Code:
  <p>Marshall in this regard makes his own thought1</sup> entirely clear:</p>
Your program said "p8" was misspelled:

Actual Code:

Code:
  <p>And further, in a note on the same pages: “Then p<sub>1</sub> p<sub>2</sub> . . . p<sub>8</sub> are points on his demand curve for tea; . . .” [...]
Code as it appears in your program:

Code:
  <p>And further, in a note on the same pages: “Then p<sub>1</sub> p<sub>2</sub> . . . p8</sub> are points on his demand curve for tea; . . .” [...]
Perhaps words followed by a superscript or subscript could be handled slightly differently.
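One possible approach, assuming the checker extracts text from the XHTML before splitting it into words: treat every &lt;sup&gt; and &lt;sub&gt; tag as a word boundary, so "thought<sup>1</sup>" becomes "thought" plus "1" instead of "thought1". This is a Python sketch using the standard library's html.parser; WordExtractor and extract_words are hypothetical names, and I don't know how your program actually tokenizes:

```python
import re
from html.parser import HTMLParser

class WordExtractor(HTMLParser):
    """Collect the text of an XHTML fragment, inserting a space at every
    <sup>/<sub> boundary so superscripts never fuse with the word before."""

    BOUNDARY_TAGS = {"sup", "sub"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BOUNDARY_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.BOUNDARY_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)

def extract_words(xhtml: str) -> list[str]:
    parser = WordExtractor()
    parser.feed(xhtml)
    parser.close()
    return re.findall(r"\w+", "".join(parser.parts))

print(extract_words("<p>Marshall in this regard makes his own thought<sup>1</sup> entirely clear:</p>"))
```

The lone "1" or "8" left over could then be skipped like any other number, instead of being glued onto "thought" or "p".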

I will definitely be posting more errors as I find them.

Side Note: Should this be in the EPUB forum instead of "Reading and Management"?

Last edited by Tex2002ans; 10-24-2013 at 10:17 PM.