Recommended Sigil plugin: Epub Tidy Tool
The Sigil plugin Epub Tidy Tool does a decent job of fixing incorrect line breaks. If you install the text file "IncorrectWords.txt" provided by the author, it will also fix a lot of common OCR errors.
It is best used early in the process, before the thorough proofreading.
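To give an idea of what this kind of cleanup involves, here is a rough Python sketch. It is not the plugin's actual code: the function names, the punctuation list and the wrong-to-right word mapping are all my own assumptions for illustration, and I am not reproducing the real format of "IncorrectWords.txt".

```python
import re

# Characters that plausibly end a paragraph (an assumption; tune per book).
PARAGRAPH_END = (".", "!", "?", ":", '"', "\u201d", "'", ")", "]")

def join_broken_paragraphs(paragraphs):
    """Merge a paragraph into the previous one when the previous one does not
    end in closing punctuation and this one starts with a lowercase letter."""
    merged = []
    for text in paragraphs:
        text = text.strip()
        if not text:
            continue
        if merged and not merged[-1].endswith(PARAGRAPH_END) and text[0].islower():
            if merged[-1].endswith("-"):
                # Rejoin a word split across the break. Note that this also
                # swallows genuine hyphens, so check compounds by hand.
                merged[-1] = merged[-1][:-1] + text
            else:
                merged[-1] = merged[-1] + " " + text
        else:
            merged.append(text)
    return merged

def fix_words(text, replacements):
    """Replace whole-word OCR misreadings using a wrong -> right mapping
    (hypothetical mapping, not the plugin's word list)."""
    for wrong, right in replacements.items():
        text = re.sub(r"\b" + re.escape(wrong) + r"\b", right, text)
    return text

if __name__ == "__main__":
    print(join_broken_paragraphs(["The cat sat on the", "mat and purred.", "Then it left."]))
    print(fix_words("tbe cat and tbe dog", {"tbe": "the"}))
```

Heuristics like these make mistakes (a paragraph that really does end without punctuation gets merged), which is another reason to run them before proofreading rather than after.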
Other Tips:
- think about what quality you want or need in the end. The 80/20 rule applies to OCRing: you can spend way more than 80% of your time finding the last spelling or formatting errors, which don't really make a big difference to the reader. For books that I might read more than once, I tend to go with a fairly rough first version, highlight problems on my Kindle (and fix them later in Sigil), then do another iteration before reading the book again a few years later.
- FineReader works well for me. It is worth exploring the options: good settings (e.g. removing headers/footers) save a lot of fixing later.
- think about what formatting you want to keep. OCR does a pretty lousy job if asked to preserve all formatting: you'll end up with lots of text boxes, italics and superscripts that should not be there and that make a mess of conversions.
- for fiction with no footnotes and little or no bold or italics, you could even consider converting to .txt, formatting chapter headings in Word or LibreOffice, and fine-tuning the ToC and page breaks in Sigil after a conversion in Calibre. That way you can be done in a few hours.
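If you prefer a script to a word processor for the plain-text route, the chapter-heading step can be roughed out in Python. This is only a sketch under assumptions of mine: paragraphs are separated by blank lines, headings look like "Chapter 1" or "CHAPTER IV" and are shorter than 60 characters, and the function name is made up. The idea is just to get heading tags into the markup so that Sigil's ToC generator has something to pick up.

```python
import html
import re

# Assumed heading pattern: "Chapter 12", "CHAPTER IV", optionally with a title.
CHAPTER = re.compile(r"^chapter\s+(\d+|[ivxlc]+)\b.*$", re.IGNORECASE)
MAX_HEADING_LENGTH = 60  # longer matches are treated as ordinary paragraphs

def text_to_html_body(text):
    """Turn blank-line-separated plain text into <h1>/<p> markup."""
    parts = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Collapse hard line breaks and repeated spaces inside a paragraph.
        line = " ".join(block.split())
        if not line:
            continue
        is_heading = len(line) < MAX_HEADING_LENGTH and CHAPTER.match(line)
        tag = "h1" if is_heading else "p"
        parts.append(f"<{tag}>{html.escape(line)}</{tag}>")
    return "\n".join(parts)

if __name__ == "__main__":
    sample = "Chapter 1\n\nIt was a dark\nand stormy night.\n\nChapter 2: Dawn\n\nMorning came."
    print(text_to_html_body(sample))
```

The output is only the body markup, so it still has to be pasted into an XHTML file in Sigil or wrapped in a full HTML page before converting in Calibre.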