You can use a regex to search for lines that end with a space, a digit or a letter. Then you have to look at the source to see whether a ” ’ ! . or ? is missing. A : or — may also be missing. I suppose a missing … , ; or even ] is possible at the end of a line too.
In the source, lines (paragraphs) should end with </p> or <br/> (or similar).
Headings and some kinds of paragraphs (preambles, marginalia, lists) may not end with punctuation. So you need to check.
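As a rough sketch of that search, here it is in Python on text exported from the document. The pattern and the function name suspect_lines are my own illustration, and the letter class only covers unaccented ASCII letters, so widen it if your text has accented characters. The same pattern should also work in Writer's Find & Replace with regular expressions enabled, but test it on a copy first.

```python
import re

# Last character is a space, a digit or an ASCII letter,
# so the line has no closing punctuation.
SUSPECT_END = re.compile(r"[ 0-9A-Za-z]$")

def suspect_lines(text):
    """Return (line_number, line) pairs for non-empty lines that end without punctuation."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line and SUSPECT_END.search(line):
            hits.append((number, line))
    return hits

sample = "He opened the door\n“Who is there?”\nChapter 2\nIt was late. \n"
for number, line in suspect_lines(sample):
    print(number, repr(line))
```

The heading "Chapter 2" is reported as well, which is why every hit still has to be checked by eye.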
Always use a word processor and convert the docx to epub in calibre.
An OCRed text is best proofread on an e-ink reader, with the annotations copied back to the PC.
Then edit the word processor source (odt format for LO Writer). Do an extra Save As to docx, import that into calibre and convert it to epub2. Then convert the epub2 to any other format.
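I do these steps in the GUI, but if you would rather script them, something like the sketch below might work. It assumes LibreOffice's soffice and calibre's ebook-convert are both on your PATH; the function name conversion_commands and the file name book.odt are only placeholders. It only builds the two command lines and does not run anything.

```python
from pathlib import Path

def conversion_commands(odt_file):
    """Build the command lines for odt -> docx (LibreOffice) and docx -> epub2 (calibre)."""
    odt = Path(odt_file)
    docx = odt.with_suffix(".docx")
    epub = odt.with_suffix(".epub")
    return [
        # Headless LibreOffice writes the docx next to the odt.
        ["soffice", "--headless", "--convert-to", "docx",
         "--outdir", str(odt.parent), str(odt)],
        # calibre's command-line converter, asked for EPUB version 2.
        ["ebook-convert", str(docx), str(epub), "--epub-version", "2"],
    ]

for command in conversion_commands("book.odt"):
    print(" ".join(command))
```

To actually run them, pass each list to subprocess.run(command, check=True) in order. The odt stays the edit source either way.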
I do the regex searching in LO Writer as it's the definitive edit source. I only edit image CSS in Calibre unless the source is an ebook.