11-03-2022, 09:29 AM   #2
Quoth
You can use a regex to search for lines that end with a space, digit or letter. Then you have to look at the source to see whether a ” ’ ! . or ? is missing. A : or — could also be missing, and I suppose a missing … , ; or even ] is possible at the end of a line too.
Lines (paragraphs) should end with </p> or <br/> (or similar).

Headings and some kinds of paragraphs (preambles, marginalia, lists) may not end with punctuation. So you need to check.
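As a rough illustration of that search, here is a minimal Python sketch. The pattern `[ 0-9A-Za-z]$` is only an assumption about what such a regex might look like (ASCII letters only, so accented letters would need adding), and the sample text is made up. Headings get flagged too, which is why each hit still needs checking by eye.

```python
import re

# Assumed pattern: the last character of the line is a space, digit or ASCII letter.
SUSPECT_END = re.compile(r"[ 0-9A-Za-z]$")

def suspect_lines(text):
    """Return (line_number, line) pairs for lines that may lack closing punctuation."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if SUSPECT_END.search(line):
            hits.append((number, line))
    return hits

# Hypothetical sample: a heading, a good line, a line missing its full stop,
# and a line with a stray trailing space.
sample = (
    "Chapter One\n"
    "“Where did he go?” she asked.\n"
    "He ran to the door\n"
    "It was locked. \n"
)

for number, line in suspect_lines(sample):
    print(number, repr(line))
```

The same pattern should also work in LO Writer's Find & Replace with "Regular expressions" ticked, where `$` matches the end of a paragraph.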

Always use a word processor and convert the docx to epub in calibre.

An OCRed text is best proofread on eink, with the annotations copied back to the PC.
Then edit the word processor source (odt format for LO Writer). Do an extra Save As to docx, import that into calibre and convert to epub2. Then convert the epub2 to any other format.
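
If you would rather script the docx to epub2 step than use the calibre GUI, something like the sketch below might do. It assumes calibre's `ebook-convert` command-line tool is installed and on the PATH, and that its `--epub-version` option is available in your calibre version; the file names are hypothetical.

```python
import shutil
import subprocess

def build_convert_command(source, target, epub_version="2"):
    """Build the ebook-convert argument list for a docx -> epub conversion."""
    return ["ebook-convert", source, target, "--epub-version", epub_version]

def convert(source, target):
    """Run the conversion; raises if calibre's CLI is missing or the run fails."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH (is calibre installed?)")
    subprocess.run(build_convert_command(source, target), check=True)

# Hypothetical file names, shown without running the conversion.
print(" ".join(build_convert_command("book.docx", "book.epub")))
```

Calling `convert("book.docx", "book.epub")` would then produce the epub2 to use as the source for any other format.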

I do the regex searching in LO Writer as it's the definitive edit source. I only edit image CSS in Calibre unless the source is an ebook.