11-03-2022, 05:44 AM | #1 |
Enthusiast
Posts: 31
Karma: 10
Join Date: Aug 2020
Device: Tablet
|
Check for missing Punctuation
Hi,
I have created an ebook from a scanned book. The OCR was done with FineReader and the result was converted to ePub. After that I use the Calibre editor. I have noticed that a lot of punctuation at the ends of sentences is missing. Is there something like a "grammar check" that can find missing punctuation?
11-03-2022, 09:29 AM | #2 |
the rook, bossing Never.
Posts: 11,173
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
You can use a regex to search for lines that end with spaces, digits or letters. Then you have to look at the source to see whether ” ’ ! . or ? is missing. A : or — could also be missing, and I suppose a missing … , ; or even ] is possible at the end of a line too.
Lines (paragraphs) should end with </p> or <br/> (or similar). Headings and some kinds of paragraphs (preambles, marginalia, lists) may not end with punctuation, so you need to check each hit by eye.
Always use a wordprocessor and convert docx to epub in calibre. An OCRed text is best proofread on eink, with the annotations copied back to the PC. Then edit the wordprocessor source (odt format for LO Writer), do an extra Save As in docx, import that into calibre and convert it to epub2. Then convert the epub2 to any other format. I do the regex searching in LO Writer as it's the definitive edit source. I only edit image CSS in Calibre unless the source is an ebook.
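For anyone who wants to try the "ends with spaces, digits or letters" search outside a wordprocessor, here is a minimal Python sketch of the idea. The function name and the sample text are made up for illustration, and it assumes the book has been exported as plain text with one paragraph per line. It only lists candidates; headings and list items will show up too, so every hit still needs checking by eye.

```python
import re

# Assumption: a line "looks unfinished" if, ignoring trailing whitespace,
# its last character is a letter or digit. [^\W_] means "word character
# that is not an underscore", i.e. a letter or digit.
UNFINISHED = re.compile(r"[^\W_]\s*$")


def lines_missing_punctuation(text):
    """Return (line_number, line) pairs for non-empty lines ending in a letter or digit."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() and UNFINISHED.search(line):
            hits.append((number, line))
    return hits


# Hypothetical sample: a heading, a broken sentence and two good lines.
sample = "Chapter One\nIt was a dark night\nThe rain fell.\n“Who is there?”\n"
for number, line in lines_missing_punctuation(sample):
    print(number, line)
```

Note that the heading "Chapter One" is reported along with the genuinely broken sentence, which is exactly why the results need a manual pass.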
11-04-2022, 03:40 PM | #3 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
If you recognize some sort of basic pattern, like:
then you can always create Regular Expressions to deal with that.
If you are talking about punctuation always missing at the end of lines/paragraphs:
Code:
<p>This is an example paragraph</p>
<p>This is another sentence</p>
<p>And this one has a correct period.</p>
If you have missing punctuation in the middle of paragraphs:
Sometimes grammar checkers can catch this. Antidote is the best one I've run across for this type, but it will only catch a very small subset of all the missing punctuation. For more info on that, see my posts:
Nothing beats going back to the original and figuring it out. And it's impossible to come up with a general solution, because every single book is going to have different patterns of "missing punctuation".
Last edited by Tex2002ans; 11-04-2022 at 03:46 PM.
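For the end-of-paragraph case, a rough Python sketch like the one below could list the suspect paragraphs in an XHTML file. Everything in it is an assumption for illustration: the function name, the set of characters counted as valid endings, and the simplification that paragraphs are plain <p>...</p> elements without nesting. It strips inline tags before looking at the last character, and it does not decode entities such as &nbsp;.

```python
import re

# Assumed sets, adjust per book: characters that may end a paragraph,
# and closing quotes/brackets that may follow them.
TERMINAL = ".!?…:;,—]"
CLOSERS = "”’\"')"

PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL)
TAG = re.compile(r"<[^>]+>")
ENDS_OK = re.compile("[" + re.escape(TERMINAL) + "][" + re.escape(CLOSERS) + "]*$")


def paragraphs_missing_end_punctuation(xhtml):
    """Return the visible text of each <p> that does not end in punctuation."""
    flagged = []
    for match in PARAGRAPH.finditer(xhtml):
        # Drop inline tags such as <i> or <span> so only visible text remains.
        text = TAG.sub("", match.group(1)).strip()
        if text and not ENDS_OK.search(text):
            flagged.append(text)
    return flagged


# The three example paragraphs from the Code block in the post.
example = (
    "<p>This is an example paragraph</p>\n"
    "<p>This is another sentence</p>\n"
    "<p>And this one has a correct period.</p>\n"
)
for text in paragraphs_missing_end_punctuation(example):
    print(text)
```

This only reports candidates. Deciding which punctuation mark actually belongs there still means going back to the scan, and a paragraph that ends in a closing quote with no period before it is flagged as well.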