Quote:
Originally Posted by CalibUser
The code for this plugin has many different search/replace terms for correcting errors. It would require a really large number of checkboxes to implement your suggestion, and then the code would need to examine each checkbox to determine which corrections to implement.
What I mean is that perhaps all the changes could be grouped into a small number of categories, such as common OCR errors and so forth, with one checkbox per category rather than one per correction. Does that seem possible?
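To make the idea concrete, here is a rough sketch of the grouping I have in mind. I haven't seen the plugin's code, so the category names, the rules, and the `apply_categories` function are all made up for illustration; the point is only that the code checks one flag per category instead of one per search/replace term.

```python
import re

# Hypothetical categories and rules, for illustration only;
# these are NOT the plugin's actual search/replace terms.
RULE_CATEGORIES = {
    "Common OCR errors": [
        (r"\btbe\b", "the"),
        (r"\bwbich\b", "which"),
    ],
    "Spacing and punctuation": [
        (r" +([,.;:!?])", r"\1"),  # drop spaces before punctuation
        (r" {2,}", " "),           # collapse runs of spaces
    ],
}


def apply_categories(text, enabled):
    """Apply every rule whose category name is in `enabled`.

    `enabled` would be the set of ticked checkboxes, one per category.
    """
    for name, rules in RULE_CATEGORIES.items():
        if name not in enabled:
            continue  # whole category switched off
        for pattern, replacement in rules:
            text = re.sub(pattern, replacement, text)
    return text


sample = "tbe cat , wbich  sat"
print(apply_categories(sample, {"Common OCR errors"}))
print(apply_categories(sample, set(RULE_CATEGORIES)))
```

With a layout like this, adding a new correction means adding one tuple to an existing category, and the dialog only needs as many checkboxes as there are categories.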
Quote:
Originally Posted by CalibUser
What does Calibre do with the dictionary it has compiled? Does this dictionary consist only of hyphenated words?
As I haven't looked at the code, I'm not sure exactly what calibre does. It certainly compiles a word list in order to fix words that are hyphenated at line breaks in the PDF, e.g. "read- ing", as your plugin does. That is invaluable for certain works, such as the one I made recently of a scientific work containing countless Latin terms and specialized vocabulary. Perhaps it also fixes genuinely hyphenated words such as "yellow- green". I would guess that could be a fair amount of work, but simpler than what you suggest. It might also be useful to keep track of the number of occurrences of each word, so that where the source may contain typos the more common form can be picked.
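Here is a minimal sketch of the kind of thing I mean. To be clear, this is my guess at the general approach, not calibre's actual code: the names `build_counts` and `fix_breaks` are hypothetical, and it only handles plain ASCII letters. It counts the words found in the document itself, then resolves each "left- right" break to whichever form (joined or hyphenated) occurs more often elsewhere, and leaves the break alone if neither form is seen.

```python
import re
from collections import Counter

# A word, optionally containing internal hyphens ("yellow-green").
WORD_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")
# A line-break remnant: "read- ing", "yellow- green".
BROKEN_RE = re.compile(r"\b([A-Za-z]+)- ([A-Za-z]+)\b")


def build_counts(text):
    """Count intact words, ignoring the broken fragments themselves."""
    cleaned = BROKEN_RE.sub(" ", text)
    return Counter(w.lower() for w in WORD_RE.findall(cleaned))


def fix_breaks(text):
    """Rejoin broken words using the document's own word counts."""
    counts = build_counts(text)

    def repl(match):
        left, right = match.group(1), match.group(2)
        joined = left + right
        hyphenated = left + "-" + right
        n_joined = counts[joined.lower()]
        n_hyphenated = counts[hyphenated.lower()]
        if n_joined == 0 and n_hyphenated == 0:
            return match.group(0)  # no evidence either way: leave as-is
        # Pick the more common form; ties go to the joined word.
        return joined if n_joined >= n_hyphenated else hyphenated

    return BROKEN_RE.sub(repl, text)


sample = ("She was reading. He kept read- ing. "
          "A yellow-green leaf, a yellow- green stem. An un- known case.")
print(fix_breaks(sample))
```

The same counts could in principle be used to flag likely source typos, by comparing a rare spelling against a much more common near-identical one, but that part is not shown here.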