Quote:
Originally Posted by democrite
What I mean is perhaps all the changes can be grouped into a minimum number of categories such as Common OCR errors, and so forth. That seems possible?
|
Hmm...apart from the amount of coding involved, I can see numerous different ways of grouping the corrections that the plugin makes. Some of these groupings may be OK for some users, but then other users may prefer a different set of groupings. What do other people think?
Quote:
Originally Posted by democrite
As I haven't looked at the code I'm not sure exactly what calibre does. It certainly compiles a word list to fix words, as your plugin does, that are line-break hyphenated in the PDF, e.g. "read- ing". This is invaluable for certain works, such as the one I made recently of a scientific work containing countless Latin terms and specialized vocabulary. Perhaps it also fixes hyphenated words such as "yellow- green". I would guess that could be a fair amount of work, but simpler than what you suggest. I would guess it might also be useful to keep track of the number of word occurrences in case of possible source typos, picking the more common one.
|
PDF readers frequently produce the same typos when converting different documents, including in specialised words. My plugin enables you to set up a customised list of words that contain these typos, together with the correct word for each. These typos are then corrected automatically when the plugin runs. Although the plugin does not scan the ePub to find misspelt words, you can add these manually to the plugin's list. Please see "Using a customised list of words that are corrected automatically" in the manual for the plugin.
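To give a rough idea of what a customised list like this amounts to, here is a minimal Python sketch. It is not the plugin's actual code: the names CORRECTIONS and apply_corrections and the sample typos are made up for illustration, and it assumes whole-word, case-sensitive matching.

```python
import re

# Hypothetical customised list: typo -> correct word.
# These entries are examples only, not the plugin's real list.
CORRECTIONS = {
    "tbe": "the",
    "wbich": "which",
    "speciaHsed": "specialised",
}

def apply_corrections(text, corrections):
    """Replace each listed typo with its correction, whole words only."""
    if not corrections:
        return text
    # Longest entries first so a longer typo is never shadowed by a shorter one.
    words = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    # Look up the matched typo in the list and substitute its correction.
    return pattern.sub(lambda m: corrections[m.group(1)], text)

print(apply_corrections("tbe word wbich is speciaHsed", CORRECTIONS))
```

Because the match is restricted to whole words, a listed typo that happens to appear inside a longer word is left alone, which is probably what you want for a list that is applied automatically.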