11-03-2022, 05:44 AM   #1
abraum
Check for missing Punctuation

Hi,

I have created an ebook from a scanned book. The OCR was done with FineReader and the result converted to ePub. After that I use the Calibre editor. I have noticed that a lot of the punctuation at the ends of sentences is missing. Is there something like a "grammar check" that can find missing punctuation?
11-03-2022, 09:29 AM   #2
Quoth
You can use a regex to search for lines that end with a space, a digit or a letter. Then you have to look at the source to see whether a ” ’ ! . or ? is missing. A : or — could also be missing, and I suppose a missing … , ; or even ] is possible at the end of a line too.
In the HTML, lines (paragraphs) end with </p> or <br/> (or similar), so that is what the regex should anchor on.

Headings and some kinds of paragraphs (preambles, marginalia, lists) may not end with punctuation. So you need to check.
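
A rough sketch of that kind of search, written in Python rather than typed into a search box. The function name and the sample text are made up for illustration, and it only looks at text sitting directly before </p> or <br/>, so a paragraph ending in </i></p> is not checked. Headings are skipped because only </p> and <br/> are matched. Every hit still needs a look by eye.

```python
import re

# Text that ends in a letter, digit, underscore or space immediately before
# a closing </p> or a <br/> tag, i.e. with no closing punctuation.
MISSING_END = re.compile(r"([^<>]*[\w ])(?:</p>|<br\s*/?>)")

def find_suspect_line_ends(html):
    """Return the text runs that end a paragraph or line without punctuation."""
    return [m.group(1) for m in MISSING_END.finditer(html)]

# Hypothetical sample, not taken from any real book.
sample = (
    "<h2>Chapter One</h2>\n"
    "<p>It was late</p>\n"
    "<p>He left at 9<br/>and came back.</p>\n"
    "<p>All was well.</p>"
)

for text in find_suspect_line_ends(sample):
    print(repr(text))
```

The same pattern, minus the Python wrapper, should be usable in any regex-capable search dialog, though the exact syntax accepted there is something to check rather than assume.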

I always work in a word processor and convert the docx to epub in Calibre.

An OCRed text is best proofread on an eink device, with the annotations copied back to the PC.
Then edit the word processor source (odt format for LO Writer). Do an extra Save As to docx, import that into Calibre and convert it to epub2. The epub2 can then be converted to any other format.

I do the regex searching in LO Writer as it's the definitive edit source. I only edit image CSS in Calibre unless the source is an ebook.
11-04-2022, 03:40 PM   #3
Tex2002ans
Quote:
Originally Posted by abraum
Is there something like a "grammar check" that can find missing punctuation?
No. But there are some ways to speed this along.

Quote:
Originally Posted by abraum
I have created an ebook from a scanned book. The OCR was done with FineReader and the result converted to ePub. After that I use the Calibre editor. I have noticed that a lot of the punctuation at the ends of sentences is missing.
The best way is to go back to FineReader and see exactly what was in the original. It's the only way you'd be able to tell what the missing character actually is, instead of blindly guessing.

If you recognize some sort of basic pattern, like:
  • vol 1 -> vol. 1
  • pp 123–130 -> pp. 123–130
  • From 19992002 -> From 1999–2002

then you can always create Regular Expressions to deal with that.
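
As a minimal sketch of what such expressions could look like for the three examples above (Python here; the rule list and the function name are hypothetical, and the patterns are assumptions that would need tuning per book):

```python
import re

# Each rule is (compiled pattern, replacement). Illustrative only.
FIXES = [
    # "vol 1" -> "vol. 1", "pp 123" -> "pp. 123":
    # abbreviation directly followed by a space and a digit.
    (re.compile(r"\b(vol|pp)(?= \d)", re.IGNORECASE), r"\1."),
    # "19992002" -> "1999–2002": two four-digit years (1800-2099) run together.
    (re.compile(r"\b((?:1[89]|20)\d\d)((?:1[89]|20)\d\d)\b"), r"\1–\2"),
]

def apply_fixes(text):
    """Apply every rule in order and return the corrected text."""
    for pattern, replacement in FIXES:
        text = pattern.sub(replacement, text)
    return text

for line in ["vol 1", "pp 123–130", "From 19992002"]:
    print(line, "->", apply_fixes(line))
```

The year rule will also hit any other eight-digit number that happens to look like two years, so it is safer to step through the matches one at a time than to replace all in one go.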

If you are talking about punctuation always missing at the ends of lines/paragraphs:

Code:
<p>This is an example paragraph</p>

<p>This is another sentence</p>

<p>And this one has a correct period.</p>
then again, you can use Regular Expressions for that. See my recent posts in:

If you have missing punctuation in the middle of paragraphs:
  • This is an example And a second sentence And a third sentence.

sometimes grammar checkers can catch this.
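
A crude regex can also flag some of these for manual review. The sketch below is only a heuristic under a loud assumption: that a lowercase word followed by a capitalised common sentence opener is suspicious. The opener list and the function name are invented for illustration, and it will miss most cases and misfire on others.

```python
import re

# Hypothetical short list of words that often start a sentence.
SENTENCE_OPENERS = ("And", "But", "The", "Then", "He", "She", "It", "They", "We")

# A lowercase word, one space, then one of the capitalised openers.
RUN_ON = re.compile(r"\b[a-z]+ (?:" + "|".join(SENTENCE_OPENERS) + r")\b")

def flag_run_ons(text):
    """Return each 'word Opener' pair that may hide a missing full stop."""
    return [m.group(0) for m in RUN_ON.finditer(text)]

example = "This is an example And a second sentence And a third sentence."
print(flag_run_ons(example))
```

Proper nouns and capitalised titles are not handled at all, so treat the output as a list of places to look, not a list of errors.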

Antidote is the best one I've run across to catch this type, but this will only catch a very small subset of all the missing punctuation.

For more info on that, see my posts:

- - -

Nothing beats going back to the original and figuring it out.

And it's impossible to come up with a general solution, because every single book is going to have different patterns of "missing punctuation".

Last edited by Tex2002ans; 11-04-2022 at 03:46 PM.