02-13-2015, 08:42 AM   #5
Rob557
Zealot
Posts: 108
Karma: 810
Join Date: Jul 2012
Device: Kobo
Bulk Library Search for OCR Warning Indicators

Quote:
Originally Posted by ardeur
Is there a feature in Calibre that can search for messed up files like these so I can delete them?
One approach to separating out ebooks with potential problems would be to use the Search ePub option in the Quality Check plugin for Calibre. You can search your library (setting Quality Check's "search scope" and also specifying that only the text contents be searched) for any ePubs that contain the OCR warning indicator "�". You can also search for any ePubs that contain the OCR warning caret "^", but make sure that you enter the search criteria as "\^". An unescaped "^" is a regular-expression anchor that matches the start of any line, so without the backslash every book in your library will be identified.
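The need for "\^" follows from the search criteria being treated as a regular expression. The short sketch below shows the difference using Python's re module rather than Quality Check itself; `has_ocr_warning` and `OCR_WARNINGS` are hypothetical names used only for illustration.

```python
import re

# Illustration only: uses Python's re module, assuming the search behaves
# like an ordinary regular expression (which the need for "\^" implies).
text = "An ordinary line of text with no OCR problems."

# An unescaped "^" anchors to the start of the string, so it matches any text.
print(bool(re.search("^", text)))    # True
# An escaped "\^" matches only a literal caret character.
print(bool(re.search(r"\^", text)))  # False

# The two OCR warning indicators: U+FFFD ("�") or a literal caret.
OCR_WARNINGS = re.compile("\ufffd|\\^")

def has_ocr_warning(s: str) -> bool:
    """Return True if s contains either OCR warning indicator (hypothetical helper)."""
    return OCR_WARNINGS.search(s) is not None

print(has_ocr_warning("Caf\ufffd society"))  # True
print(has_ocr_warning("x^2 as printed"))     # True
print(has_ocr_warning(text))                 # False
```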

Having done that, and using a temporary column to label the selected books that contain those OCR warnings, you can determine the number of occurrences of those characters in any one book with the "search - Count All" feature in Sigil or in Calibre's book editor. But does anyone know of a Calibre feature that can count the occurrences of such character strings in bulk across that selected subset of books, so that each book's count can be stored in a temporary sortable column in Calibre? That would make it much easier to find the most problematic books.
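Outside Calibre, a bulk count is possible with a short script, since an ePub is a zip archive of (X)HTML files. The following is a minimal sketch, assuming the books are stored as .epub files somewhere under the Calibre library folder and that a crude regex tag strip is an acceptable stand-in for "text contents only". `count_ocr_warnings` and `rank_library` are hypothetical helper names, not part of Calibre or Quality Check.

```python
import re
import sys
import zipfile
from pathlib import Path

# Hypothetical helpers: a sketch, not a Calibre or Quality Check feature.
TEXT_SUFFIXES = (".xhtml", ".html", ".htm")
TAG = re.compile(r"<[^>]*>")  # crude tag stripper, assumed good enough for counting

def count_ocr_warnings(epub_path):
    """Return (replacement_chars, carets) counted across the ePub's (X)HTML text."""
    replacement = caret = 0
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(TEXT_SUFFIXES):
                continue  # skip CSS, images, OPF, etc.
            # Undecodable bytes also become U+FFFD here, so they are counted too.
            html = zf.read(name).decode("utf-8", errors="replace")
            text = TAG.sub("", html)
            replacement += text.count("\ufffd")
            caret += text.count("^")  # str.count is literal, no escaping needed
    return replacement, caret

def rank_library(root):
    """Count warnings in every .epub under root, most problematic first."""
    rows = []
    for path in Path(root).rglob("*.epub"):
        replacement, caret = count_ocr_warnings(path)
        if replacement or caret:
            rows.append((replacement + caret, replacement, caret, path))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for total, replacement, caret, path in rank_library(root):
        print(f"{total:6d} {replacement:6d} {caret:6d}  {path}")
```

Pointed at the library folder, this prints one line per affected book (total, "�" count, "^" count, path), worst first. Getting the numbers into a Calibre custom column would be a separate step (calibredb has a set_custom command), and mapping each file back to its book id is not shown here.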