06-18-2014, 06:12 PM | #1 |
Member
Posts: 15
Karma: 12678
Join Date: Apr 2013
Device: none
|
finding l's OCRed as I's
Using Calibre's edit book function, I would like to search for any letter l (ell) that has been incorrectly OCRed as an I (capital i), and then replace it with an l. Any easy way to do this?
I thought if I could search for "(any letter) + I", I would be able to weed out any legitimate uses of "I". Is this a good method? How would I do this in Calibre? Thanks! |
06-18-2014, 07:53 PM | #2 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
use regex mode (and lookbehinds, which do NOT capture):
Code:
(?<=[a-z])I
This won't catch the illegitimate word "Iegitimate", though. Hmm,
Code:
(?<![."]\s*)I
I'd still eyeball each edit first -- or scan each change using "See what's changed". Last edited by eschwartz; 06-18-2014 at 08:27 PM. |
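A minimal sketch of the first lookbehind pattern, run outside calibre with Python's standard `re` module. The sample sentence is invented for illustration. The second pattern uses a variable-width lookbehind (`\s*`), which the standard `re` module rejects, so only the first pattern is shown here.

```python
import re

# Hypothetical sample text: a standalone "I" and a quoted "I" are legitimate.
sample = 'I think the fIoor is fulI of hoIes. "I agree," said Iarry.'

# Match a capital I only when the character before it is a lowercase letter.
# The lookbehind is not part of the match, so only the "I" itself is replaced.
mid_word_I = re.compile(r"(?<=[a-z])I")

fixed = mid_word_I.sub("l", sample)
print(fixed)
```

Note that "Iarry" is left alone: a word-initial I has no lowercase letter before it, which is the same limitation as the "Iegitimate" case.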
|
06-18-2014, 08:21 PM | #3 |
Addict
Posts: 243
Karma: 359054
Join Date: Nov 2012
Device: default
|
Assuming it is English (where only proper nouns are capitalised), then eschwartz's suggestion will catch many.
Also, for words starting with ell: PHP Code:
|
06-18-2014, 08:36 PM | #4 | |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
Code:
I |
|
06-18-2014, 09:36 PM | #5 |
Addict
Posts: 243
Karma: 359054
Join Date: Nov 2012
Device: default
|
It might just be easier to switch to a serif font (for the purposes
of seeing the wrong 'uns) and just run the spellchecker. |
|
06-18-2014, 11:35 PM | #6 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
I mean, it already lists every single word in the book, why not take advantage of that? Just type a capital letter 'I' in the search, and you should be able to see every single word with the letter 'I' in it:
Here is the Calibre Manual on the Spell Check functionality: http://manual.calibre-ebook.com/edit...ds-in-the-book
Side Note: The Spell Check tool is also extremely helpful to catch hyphens + accented words + lots of other stuff:
https://www.mobileread.com/forums/sho...84&postcount=4
https://www.mobileread.com/forums/sho...08&postcount=7
Potential Calibre Enhancement: It would be nice to also be able to toggle a Case Sensitive SEARCH. This could definitely make the Spell Check tool much more useful than Sigil's implementation. (I daresay, this might just be useful enough to make me jump ship.... maybe.)
For example, here is a book in Sigil showing off all of the misspelled (not in a dictionary) words with a lowercase 'i' and an uppercase 'I':
It would be nice to see the words with ONLY the uppercase version. (For example, finding "Ice-Cream" but not "ice-cream") Last edited by Tex2002ans; 06-18-2014 at 11:47 PM. |
|
06-19-2014, 10:51 AM | #7 | |
Well trained by Cats
Posts: 29,802
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
A lower case word with a capital in the middle sticks out like my thumb after a misaimed hammer blow |
|
06-19-2014, 02:23 PM | #8 | |
Member
Posts: 15
Karma: 12678
Join Date: Apr 2013
Device: none
|
Quote:
|
|
06-19-2014, 02:43 PM | #9 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
|
06-19-2014, 03:35 PM | #10 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
I typically do at least two passes when using that tool during my error checking phase. One with case sensitive sort checked, and one without. Also, you can sort by frequency. Typically an OCR error only occurs a handful of times (1-3), so you can rule out many of the more common words. |
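The frequency idea can be sketched outside the editor as well. This is only an illustration, assuming the book text has already been extracted to a plain string; the sample text and the variable names are hypothetical, and this is not how the Spell Check tool itself is implemented.

```python
import re
from collections import Counter

# Hypothetical extracted text.
text = "I saw Ian. I saw the fIoor. Ian saw Iittle. I left."

# Case-sensitive filter: keep only words containing a capital I.
counts = Counter(w for w in re.findall(r"[A-Za-z']+", text) if "I" in w)

# The standalone pronoun is always legitimate, so drop it.
counts.pop("I", None)

# Rarest first (ties broken alphabetically): OCR errors tend to be rare.
suspects = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
for word, n in suspects:
    print(n, word)
```

Words that appear many times, such as a character's name, sink to the bottom of the list and can be skipped quickly.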
|
06-19-2014, 07:36 PM | #11 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
search: I[a-z]
replace: l\1
I have found that for this to work in the calibre editor, the search has to be I([a-z]) in order to capture the following letter. AND even though you are indicating small letters with [a-z], it will pick up both capital and lowercase letters unless case sensitive is also checked. In this way it is not exactly what you would expect. |
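A small sketch of the capture-group version and of the case-sensitivity pitfall, using Python's standard `re` module rather than the calibre editor. The sample text is made up, and `re.IGNORECASE` stands in here for leaving "case sensitive" unchecked.

```python
import re

# Hypothetical sample text.
sample = "Iet the Iamp stay Iit in IBM"

# Case-sensitive: "I" must be a capital and [a-z] must be lowercase.
# \1 puts the captured following letter back after the new "l".
case_sensitive = re.sub(r"I([a-z])", r"l\1", sample)

# Case-insensitive: "I" now also matches "i", and [a-z] also matches capitals.
case_insensitive = re.sub(r"I([a-z])", r"l\1", sample, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```

The case-insensitive run also rewrites "in" and "IBM", which is why the case-sensitive option matters for this search.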
06-19-2014, 08:09 PM | #12 | |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
If you leave off the captured "([a-z])" then you would get the same results (except for the word "I"), and could replace with "l". Much better to look for anything-but-punctuation before the "I"... as I did in post #2. And negative lookbehind means you don't have to capture anything, you can just replace the letter itself. Last edited by eschwartz; 06-19-2014 at 08:12 PM. |
|
06-20-2014, 03:08 AM | #13 |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
An initial L is probably followed by a vowel, an initial I is probably followed by a consonant. Try searching for "I[aeiouy]".
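A rough sketch of the vowel heuristic in Python. The word boundary `\b` and the lookahead are additions for this example (so that only word-initial I is touched and only the I is replaced); they are not part of the suggestion above. The sample sentence is invented.

```python
import re

# Hypothetical sample text.
sample = "Iast night Iittle Io saw an Indian Iynx. It Iooked Iost."

# List the candidate words first, so they can be reviewed before replacing.
words = re.findall(r"\bI[aeiouy]\w*", sample)
print(words)

# Replace only the initial I; the lookahead checks the vowel without consuming it.
fixed = re.sub(r"\bI(?=[aeiouy])", "l", sample)
print(fixed)
```

"Indian" and "It" are skipped because a consonant follows the I, while a legitimate "Io" would be changed to "lo", so the candidate list is worth a look before replacing.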
|
06-20-2014, 06:59 AM | #14 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
That is a very good suggestion. Io is probably the only one on which it would pick up something unnecessary and unless the book is about chemistry or space, it should not be a big deal.
|
06-23-2014, 03:38 PM | #15 |
Member
Posts: 15
Karma: 12678
Join Date: Apr 2013
Device: none
|
|
|