Old 06-18-2014, 06:12 PM   #1
jlocicero
Member
Posts: 15
Karma: 12678
Join Date: Apr 2013
Device: none
finding l's OCRed as I's

Using Calibre's edit book function, I would like to search for any letter l (ell) that has been incorrectly OCRed as an I (capital i), and then replace it with an l. Any easy way to do this?

I thought if I could search for "(any letter) + I", I would be able to weed out any legitimate uses of "I". Is this a good method? How would I do this in Calibre?

Thanks!
Old 06-18-2014, 07:53 PM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Use regex mode with a lookbehind, which checks the preceding text but does NOT capture it:

Code:
(?<=[a-z])I
Replace with "l"
This won't catch the illegitimate word "Iegitimate", though.
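To see what the lookbehind does outside of Calibre, here is a minimal Python sketch using the built-in `re` module. The function name and the sample sentence are made up for illustration; the pattern is the one from the post above.

```python
import re

# Lookbehind: a lowercase letter must come right before the "I", but it is
# not part of the match, so the replacement is simply "l".
MID_WORD_I = re.compile(r"(?<=[a-z])I")


def fix_mid_word(text: str) -> str:
    """Replace a capital I that directly follows a lowercase letter."""
    return MID_WORD_I.sub("l", text)


sample = "I fiIed the fiIe. It was Iegitimate."
print(fix_mid_word(sample))
```

The pronoun "I" and the word "It" are left alone because nothing lowercase precedes them, and "Iegitimate" is missed for the same reason.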

Hmm,

Code:
(?<![."]\s*)I
That should match any "I" that does NOT follow a period or a double quote (even if there is intervening whitespace).

I'd still eyeball each edit first -- or scan each change using "See what's changed".
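A rough way to preview what the negative lookbehind flags, sketched in Python. One assumption to note: Python's built-in `re` module only accepts fixed-width lookbehind, so the `\s*` form is approximated here by two chained lookbehinds covering "no whitespace" and "exactly one whitespace character". The helper name and sample text are hypothetical.

```python
import re

# Approximation of (?<![."]\s*)I for Python's built-in `re`, which rejects
# variable-width lookbehind: cover zero or exactly one whitespace character.
NOT_AFTER_PUNCT = re.compile(r'(?<![."])(?<![."]\s)I')


def suspect_words(text: str) -> list[str]:
    """List the space-delimited word around each flagged capital I."""
    words = []
    for m in NOT_AFTER_PUNCT.finditer(text):
        start = text.rfind(" ", 0, m.start()) + 1
        end = text.find(" ", m.end())
        if end == -1:
            end = len(text)
        words.append(text[start:end].strip('.,;:!?"'))
    return words


sample = 'He was Iate. It is Iegitimate, and I agree. "I know."'
print(suspect_words(sample))
```

In this sample the mid-sentence pronoun "I" is flagged along with the real errors, which is a good reason to review each hit rather than use Replace All.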

Last edited by eschwartz; 06-18-2014 at 08:27 PM.
Old 06-18-2014, 08:21 PM   #3
DomesticExtremis
Addict
Posts: 243
Karma: 359054
Join Date: Nov 2012
Device: default
Assuming it is English (where only proper nouns are capitalised), then eschwartz's suggestion will catch many.

Also, for words starting with ell:

PHP Code:
search: I[a-z]
replace: l\1
It means two searches.
Old 06-18-2014, 08:36 PM   #4
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by DomesticExtremis View Post
Assuming it is English (where only proper nouns are capitalised), then eschwartz's suggestion will catch many.

Also, for words starting with ell:

PHP Code:
search: I[a-z]
replace: l\1
It means two searches.
I have updated my regex to match "I" when not following punctuation, which is better than doing two searches that together equal
Code:
I
Old 06-18-2014, 09:36 PM   #5
DomesticExtremis
Addict
Posts: 243
Karma: 359054
Join Date: Nov 2012
Device: default
It might just be easier to switch to a serif font (for the purposes
of seeing the wrong 'uns) and just run the spellchecker.
Old 06-18-2014, 11:35 PM   #6
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by jlocicero View Post
I thought if I could search for "(any letter) + I", I would be able to weed out any legitimate uses of "I". Is this a good method? How would I do this in Calibre?
I think a much more efficient way to do it would just be to take advantage of the "Spell Check" functionality. (I use this ALL THE TIME in Sigil).

I mean, it already lists every single word in the book, why not take advantage of that?

Just type a capital letter 'I' in the search, and you should be able to see every single word with the letter 'I' in it:

Here is the Calibre Manual on the Spell Check functionality:

http://manual.calibre-ebook.com/edit...ds-in-the-book

Side Note: The Spell Check tool is also extremely helpful to catch hyphens + accented words + lots of other stuff:

https://www.mobileread.com/forums/sho...84&postcount=4
https://www.mobileread.com/forums/sho...08&postcount=7

Potential Calibre Enhancement: It would be nice to also be able to toggle a Case Sensitive SEARCH. This could definitely make the Spell Check tool much more useful than Sigil's implementation. (I daresay, this might just be useful enough to make me jump ship.... maybe. )

For example, here is a book in Sigil showing off all of the misspelled (not in a dictionary) words with a lowercase 'i' and an uppercase 'I':

[Attached image: SpellcheckI.png]

It would be nice to see the words with ONLY the uppercase version. (For example, finding "Ice-Cream" but not "ice-cream")

Last edited by Tex2002ans; 06-18-2014 at 11:47 PM.
Old 06-19-2014, 10:51 AM   #7
theducks
Well trained by Cats
Posts: 29,802
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Tex2002ans View Post
I think a much more efficient way to do it would just be to take advantage of the "Spell Check" functionality. (I use this ALL THE TIME in Sigil).

I mean, it already lists every single word in the book, why not take advantage of that?

Just type a capital letter 'I' in the search, and you should be able to see every single word with the letter 'I' in it:

Here is the Calibre Manual on the Spell Check functionality:

http://manual.calibre-ebook.com/edit...ds-in-the-book

Side Note: The Spell Check tool is also extremely helpful to catch hyphens + accented words + lots of other stuff:

https://www.mobileread.com/forums/sho...84&postcount=4
https://www.mobileread.com/forums/sho...08&postcount=7

Potential Calibre Enhancement: It would be nice to also be able to toggle a Case Sensitive SEARCH. This could definitely make the Spell Check tool much more useful than Sigil's implementation. (I daresay, this might just be useful enough to make me jump ship.... maybe. )

For example, here is a book in Sigil showing off all of the misspelled (not in a dictionary) words with a lowercase 'i' and an uppercase 'I':

Attachment 124343

It would be nice to see the words with ONLY the uppercase version. (For example, finding "Ice-Cream" but not "ice-cream")
I just tick the case sensitive sort (Sigil and Calibre)
A lower case word with a capital in the middle sticks out like my thumb after a misaimed hammer blow
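The same "capital in the middle" check can be sketched in a few lines of Python for anyone working on extracted text. This is not how Sigil or Calibre build their word lists; the function name and sample sentence are assumptions for illustration only.

```python
import re


def mid_word_capitals(text: str) -> list[str]:
    """Return the distinct words in which a capital follows a lowercase letter."""
    words = set(re.findall(r"[A-Za-z]+", text))
    return sorted(w for w in words if re.search(r"[a-z][A-Z]", w))


sample = "The fiIe was smaIl but the Island was large."
print(mid_word_capitals(sample))
```

Words like "Island" are not reported, since their only capital is at the start.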
Old 06-19-2014, 02:23 PM   #8
jlocicero
Member
Posts: 15
Karma: 12678
Join Date: Apr 2013
Device: none
Quote:
Originally Posted by eschwartz View Post
use regex mode (and lookbehinds, which do NOT capture):

Code:
(?<=[a-z])I
Replace with "l"
This won't catch the illegitimate word "Iegitimate", though.

Hmm,

Code:
(?<![."]\s*)I
That should match any "I" that does NOT follow punctuation (even if there is an intervening space).

I'd still eyeball each edit first -- or scan each change using "See what's changed".
I just tried both of these, and they find all i's in the text. I made sure Regex mode was selected. What have I done wrong?
Old 06-19-2014, 02:43 PM   #9
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by jlocicero View Post
I just tried both of these, and they find all i's in the text. I made sure Regex mode was selected. What have I done wrong?
I am guessing "case sensitive" is not checked.

By default it seems to be off.
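Why the unchecked option makes the pattern hit every "i" can be reproduced with Python's `re` module, where the `re.IGNORECASE` flag stands in for leaving "case sensitive" off. The helper name and sample text are made up for this sketch.

```python
import re

PATTERN = r"(?<=[a-z])I"


def match_offsets(text: str, case_sensitive: bool) -> list[int]:
    """Return the offset of every match, with or without case sensitivity."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.start() for m in re.finditer(PATTERN, text, flags)]


text = "The fiIe is invalid."
# Case-insensitive: "I" also matches "i", and [a-z] also matches capitals.
print(match_offsets(text, case_sensitive=False))
# Case-sensitive: only the stray capital inside "fiIe" is found.
print(match_offsets(text, case_sensitive=True))
```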
Old 06-19-2014, 03:35 PM   #10
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by theducks View Post
I just tick the case sensitive sort (Sigil and Calibre)
A lower case word with a capital in the middle sticks out like my thumb after a misaimed hammer blow
Just visualizing the words in a different way helps you spot errors! (sort ascending, sort descending, sort by frequency, sort by case, etc. etc.)

I typically do at least two passes when using that tool during my error checking phase. One with case sensitive sort checked, and one without.

Also, you can sort by frequency. Typically an OCR error only occurs a handful of times (1-3), so you can rule out many of the more common words.
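The frequency idea can be approximated outside the editor too. Below is a small Python sketch, assuming the book text is already available as a plain string; the function name is hypothetical and the default cutoff of 3 is taken from the "(1-3)" estimate above.

```python
import re
from collections import Counter


def rare_words_with_capital_i(text: str, max_count: int = 3) -> list[tuple[str, int]]:
    """List words containing a capital I that occur at most max_count times."""
    counts = Counter(re.findall(r"[A-Za-z']+", text))
    rare = [(w, n) for w, n in counts.items() if "I" in w and n <= max_count]
    # Rarest first, then alphabetical (case-sensitive) within the same count.
    return sorted(rare, key=lambda item: (item[1], item[0]))


sample = "I saw It. I saw the fiIe. I did. I am. Iegal fiIe."
print(rare_words_with_capital_i(sample))
```

The pronoun "I" drops out because it appears four times, while legitimate rare words such as "It" still need a manual look.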
Old 06-19-2014, 07:36 PM   #11
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
search: I[a-z]
replace:l\1

I have found that for this to work in the calibre editor, the search has to be I([a-z]), with parentheses,
so that the following letter is captured for \1.

AND even though [a-z] specifies lowercase letters, it will pick up both capitals and lowercase unless case sensitive is also checked, which is not what you would expect.
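The capture-group version behaves the same way in Python's `re` module, so it can be tried there first. The function name and test strings below are only for illustration.

```python
import re


def fix_initial(text: str) -> str:
    """Replace a capital I that is followed by a lowercase letter."""
    # The parentheses make the following letter group 1, so \1 in the
    # replacement puts that letter back after the corrected "l".
    return re.sub(r"I([a-z])", r"l\1", text)


print(fix_initial("Iegitimate fiIe, but I am here"))
print(fix_initial("It is Iong"))
```

The second call shows the risk: a correct word such as "It" is changed as well, so each replacement should be checked.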
Old 06-19-2014, 08:09 PM   #12
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by mrmikel View Post
search: I[a-z]
replace:l\1

I have found for this to work in calibre editor it has to be search: I([a-z])
in order to pick up the following letter.

AND even though you are indicating small letters [a-z], it will pickup capital and lowercase unless case sensitive is also checked. In this way it is not exactly as you would expect.
That will still pick up every single instance of a capital "I" in the book, as long as it isn't the actual word "I".

If you leave off the captured "([a-z])" then you would get the same results (except for the word "I"), and could replace with "l".

Much better to look for anything-but-punctuation before the "I"... as I did in post #2.

And negative lookbehind means you don't have to capture anything, you can just replace the letter itself.

Last edited by eschwartz; 06-19-2014 at 08:12 PM.
Old 06-20-2014, 03:08 AM   #13
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
An initial L is probably followed by a vowel, an initial I is probably followed by a consonant. Try searching for "I[aeiouy]".
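A quick Python sketch of this vowel heuristic, for trying it on sample text. The `\b` word boundary and the trailing `\w*` are additions made here so that only word-initial cases are listed and the whole word is shown; the search suggested above is just `I[aeiouy]`.

```python
import re

# Capital I at the start of a word, followed by a vowel (or y).
INITIAL_I_VOWEL = re.compile(r"\bI[aeiouy]\w*")


def vowel_suspects(text: str) -> list[str]:
    """Return every word that starts with a capital I plus a vowel."""
    return INITIAL_I_VOWEL.findall(text)


sample = "Iook at the Iittle Iamp in Iowa, said Io. It is Inside."
print(vowel_suspects(sample))
```

Real names that start with "I" plus a vowel show up in the list too, so it works best as a list of candidates to review.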
Old 06-20-2014, 06:59 AM   #14
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
That is a very good suggestion. Io is probably the only one on which it would pick up something unnecessary and unless the book is about chemistry or space, it should not be a big deal.
Old 06-23-2014, 03:38 PM   #15
jlocicero
Member
Posts: 15
Karma: 12678
Join Date: Apr 2013
Device: none
Quote:
Originally Posted by eschwartz View Post
I am guessing "case sensitive" is not checked.

By default it seems to be off.
Yes, my mistake, and now it works.

Thanks for all the suggestions! I've gotten better results than I hoped for.