07-15-2020, 04:42 PM   #194
democrite
Evangelist
Posts: 441
Karma: 77256
Join Date: Sep 2011
Device: none
Quote:
Originally Posted by CalibUser View Post
The code for this plugin has many different search/replace terms for correcting errors. It would require a really large number of checkboxes to implement your suggestion, and then the code would need to examine each checkbox to determine which corrections to implement.
What I mean is that perhaps all the changes could be grouped into a small number of categories, such as Common OCR errors and so forth. Does that seem possible?
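
To illustrate the grouping idea, here is a rough sketch in Python. I haven't seen the plugin's code, so the category names, the rules in `RULES`, and the function `apply_categories` are all made up for the example; the point is only that one checkbox per category could switch a whole batch of search/replace terms on or off.

```python
import re

# Hypothetical categories and rules, invented for illustration only.
# Each category maps to a list of (regex pattern, replacement) pairs.
RULES = {
    "common_ocr_errors": [
        (r"\btbe\b", "the"),
        (r"\bwitb\b", "with"),
    ],
    "spacing": [
        (r" {2,}", " "),          # collapse runs of spaces
        (r" +([,.;:])", r"\1"),   # drop spaces before punctuation
    ],
}

def apply_categories(text, enabled):
    """Apply every rule belonging to the enabled categories, in order."""
    for category in enabled:
        for pattern, replacement in RULES[category]:
            text = re.sub(pattern, replacement, text)
    return text

sample = "tbe cat sat  witb me ."
print(apply_categories(sample, ["common_ocr_errors"]))
print(apply_categories(sample, ["common_ocr_errors", "spacing"]))
```

With this layout the code only has to check one setting per category rather than one per search/replace term.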

Quote:
Originally Posted by CalibUser View Post
What does Calibre do with the dictionary it has compiled? Does this dictionary consist only of hyphenated words?
As I haven't looked at the code, I'm not sure exactly what calibre does. It certainly compiles a word list to fix words that are hyphenated at line breaks in the PDF, e.g. "read- ing", as your plugin does. That is invaluable for certain works, such as the one I made recently from a scientific work containing countless Latin terms and specialized vocabulary. Perhaps it also fixes genuinely hyphenated words such as "yellow- green". I would guess that could be a fair amount of work, but simpler than what you suggest. It might also be useful to keep track of the number of occurrences of each word, so that where the source may contain typos the more common form can be picked.
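
A minimal sketch of what I have in mind, in Python. This is my guess at the approach, not what calibre or the plugin actually does; the names `build_word_counts` and `fix_line_break_hyphens` are hypothetical. It builds a word list with occurrence counts from the book itself, then for each broken word such as "read- ing" picks whichever of the joined or hyphenated form is more common elsewhere in the text, and leaves the word alone if neither form is known.

```python
import re
from collections import Counter

# A word, optionally with internal hyphens (e.g. "yellow-green").
WORD_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")
# A word broken at a line end: hyphen followed by whitespace (e.g. "read- ing").
BROKEN_RE = re.compile(r"\b([A-Za-z]+)-\s+([A-Za-z]+)\b")

def build_word_counts(text):
    """Count intact words, ignoring the broken fragments themselves."""
    cleaned = BROKEN_RE.sub(" ", text)
    return Counter(w.lower() for w in WORD_RE.findall(cleaned))

def fix_line_break_hyphens(text):
    """Rejoin broken words using the book's own word list."""
    counts = build_word_counts(text)

    def repl(match):
        left, right = match.group(1), match.group(2)
        joined = left + right
        hyphenated = left + "-" + right
        if counts[joined.lower()] > counts[hyphenated.lower()]:
            return joined
        if counts[hyphenated.lower()] > 0:
            return hyphenated
        return match.group(0)  # neither form seen elsewhere: leave as is

    return BROKEN_RE.sub(repl, text)

sample = ("She was reading a book. He kept read- ing. "
          "The yellow-green leaf was yellow- green. An un- known term.")
print(fix_line_break_hyphens(sample))
```

Because the word list comes from the book itself, Latin terms and specialized vocabulary are handled without needing an external dictionary, as long as each word appears intact at least once.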