02-13-2015, 10:05 AM   #6
theducks
Quote:
Originally Posted by Rob557
One approach to separating out ebooks with potential problems would be to use the Search ePub option under the Quality Check add-on for Calibre. You can search your library (using Quality Check's "search scope" setting, and specifying that it look only at the text contents) for any ePubs that contain the OCR warning indicator "�". You can also search for any ePubs that contain the OCR warning caret "^", but make sure that you use the search criterion "\^", or else all your books will be identified.

Having done that, and using a temporary column to label the selected books that contain those OCR warnings, the number of occurrences of those characters within any one book can be determined using the "search - Count All" feature in Sigil or Calibre's book editor. But does anyone know of a Calibre feature that could perform a bulk count of the occurrences of such character strings across that selected subset of books, so that the number for each book can be stored in a temporary sort column in Calibre and the most problematic books found more easily?

I believe that character is substituted by the rendering engine to stand for any one of many possible characters that are missing from the current font's character set.
Some OSes use a square; others display a box containing the (UTF?) code digits.
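
For the bulk count Rob557 asks about, something along these lines might work outside Calibre. This is only a rough sketch, not a Calibre or Quality Check feature: it treats each ePub as a zip archive, assumes the text lives in .xhtml/.html/.htm files encoded as UTF-8, and counts the "�" (U+FFFD) and "^" characters per book. The names `count_markers` and `rank_books` are made up for illustration, and getting the numbers into a Calibre custom column is not covered here.

```python
import re
import zipfile

# File extensions treated as text content inside an ePub (an assumption).
TEXT_EXTENSIONS = (".xhtml", ".html", ".htm")

# Patterns to count. The caret is a regex anchor, so it must be escaped;
# an unescaped "^" matches the start of every string, hence every book.
PATTERNS = {
    "replacement_char": re.compile("\ufffd"),
    "caret": re.compile(r"\^"),
}


def count_markers(epub_path):
    """Return {pattern_name: occurrences} summed over the text files of one ePub."""
    counts = {name: 0 for name in PATTERNS}
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(TEXT_EXTENSIONS):
                continue
            # Assumes UTF-8; undecodable bytes become U+FFFD and are counted too.
            text = archive.read(name).decode("utf-8", errors="replace")
            for key, pattern in PATTERNS.items():
                counts[key] += len(pattern.findall(text))
    return counts


def rank_books(epub_paths):
    """Sort books so those with the most markers come first."""
    results = [(path, count_markers(path)) for path in epub_paths]
    return sorted(results, key=lambda item: sum(item[1].values()), reverse=True)
```

Calling `rank_books` on a list of ePub file paths returns (path, counts) pairs with the most problematic books at the top, which is roughly what the temporary sort column would have given you.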