#1 |
Member
Posts: 15
Karma: 10
Join Date: Apr 2011
Location: CA
Device: Nook Simple Touch, 4th Gen Kindle Basic
|
Searching for corrupted epubs?
I have about 5000 epub books, and I've begun to notice that a few of them are "corrupted" (all weird characters and partial words) when I put them on my Nook and open them. I'm pretty sure they were bad files that I downloaded from Gutenberg or some other place.
Is there a feature in Calibre that can search for messed up files like these so I can delete them? |
#2 | |
Well trained by Cats
Posts: 30,967
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Bad OCR or wrong character encoding. Neither is normal fare for a completed PG book. Are you sure you did not get a 'Proofing Project' file? (But aren't those normally given out as partials?) PG books all have a PG scan notes page up front. |
|
#3 |
Member
Posts: 15
Karma: 10
Join Date: Apr 2011
Location: CA
Device: Nook Simple Touch, 4th Gen Kindle Basic
|
I downloaded from archive, ManyBooks, Google Books, and PG, and that was years ago. I have no idea what I might have done to get these weird files. To be honest, I'm thinking some of them may have come from Google eBooks.
|
#4 | |
Fanatic
Posts: 515
Karma: 1470724
Join Date: Jul 2013
Location: Quebec CA
Device: android 4 (samsung tablet and asus tablet)
|
Quote:
|
|
#5 | |
Zealot
Posts: 108
Karma: 810
Join Date: Jul 2012
Device: Kobo
|
Bulk Library Search for OCR Warning Indicators
Quote:
Having done that, and using a temporary column to label the selected books that contain those OCR warnings, the number of occurrences of those characters within any one book can be determined with the "search - Count All" feature in Sigil or in Calibre's book editor. But does anyone know of a Calibre feature that could count the occurrences of such character strings in bulk across that selected subset of books, so that the count for each book can be stored in a temporary sort column in Calibre and the most problematic books found more easily? |
|
#6 | |
Well trained by Cats
Posts: 30,967
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Some operating systems show a square; others display a box containing the (UTF?) code digits. |
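For illustration, here is a minimal Python sketch of one way the � character (U+FFFD, the Unicode replacement character) can end up in text: bytes saved in one encoding get decoded as another. The sample word and the choice of Latin-1 are assumptions made for the demo, not something taken from the files discussed in this thread.

```python
# Hypothetical sample: text saved as Latin-1 but read back as UTF-8.
raw = "café".encode("latin-1")  # the accented letter becomes the single byte 0xE9

# 0xE9 on its own is not valid UTF-8, so with errors="replace"
# the decoder substitutes U+FFFD for it.
garbled = raw.decode("utf-8", errors="replace")

print(garbled)
print("\ufffd" in garbled)
```

Whether a reader then shows that character as �, a square, or a box with digits depends on the device and font.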
|
#7 | |
Zealot
Posts: 108
Karma: 810
Join Date: Jul 2012
Device: Kobo
|
Quote:
Theducks, in light of your experience with Calibre, I guess a second takeaway from your response is that no current functionality in Calibre or its add-ons seems to allow a user to determine in bulk, as per my prior post, the number of occurrences of such character strings in the selected subset of problem books. |
|
#8 |
Zealot
Posts: 108
Karma: 810
Join Date: Jul 2012
Device: Kobo
|
Bulk Library Search for OCR Warning Indicators (partially solved)
It turns out that the Search ePub feature under the Quality Check add-on for Calibre includes an option to "show all occurrences". That option at least indirectly provides a bulk search capability to identify the number of times a special OCR-error character like � appears in the various ePubs.
If the "show all occurrences" option is selected (it may be best at first to limit this to a subset of the books identified as having at least one occurrence), every occurrence within that subset of ePubs is listed on a separate line in the results log. That list can then be copied into something like Excel and analyzed to identify the most problematic books. |
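For anyone comfortable working outside Calibre, the same per-book tally can be sketched in Python. This is not a Calibre or Quality Check feature, only a rough standalone script under some assumptions: each ePub is an ordinary zip archive, its text lives in .xhtml/.html/.htm files encoded as UTF-8, and the markers of interest are "�" and "^". The function names are made up for this example.

```python
import zipfile
from collections import Counter
from pathlib import Path

# Markers mentioned in this thread; edit the tuple to search for others.
MARKERS = ("\ufffd", "^")
# Assumption: the book text is stored in files with these extensions.
TEXT_SUFFIXES = (".xhtml", ".html", ".htm")


def count_markers(epub_path, markers=MARKERS):
    """Count each marker across the text files inside one ePub."""
    counts = Counter()
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(TEXT_SUFFIXES):
                continue
            # Bytes that are not valid UTF-8 also decode to U+FFFD here,
            # so they are included in the U+FFFD count.
            text = zf.read(name).decode("utf-8", errors="replace")
            for marker in markers:
                counts[marker] += text.count(marker)
    return counts


def rank_books(root, markers=MARKERS):
    """Return (total, path) pairs for every .epub under root, worst first."""
    ranked = []
    for path in Path(root).rglob("*.epub"):
        try:
            total = sum(count_markers(path, markers).values())
        except (zipfile.BadZipFile, OSError):
            total = -1  # archive could not be opened at all
        ranked.append((total, str(path)))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

Calling `rank_books` on the library folder gives a list that can be printed or written to CSV and sorted in Excel; the totals could then be typed or imported into a temporary custom column. Books with a total of -1 could not be opened as zip files and are worth checking by hand.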
#9 |
Zealot
Posts: 108
Karma: 810
Join Date: Jul 2012
Device: Kobo
|
small black square too
As per the approach described above, here is another OCR error warning symbol that, in addition to "^" and "�", can be used as a search criterion in Search ePub: "■".
As noted above, "show all occurrences" can be used to determine in bulk which ePubs may have the worst problems. It is not yet clear which version of the code-embedded white square, e.g. "", might be used as a broad search criterion. |
#10 |
Zealot
Posts: 108
Karma: 810
Join Date: Jul 2012
Device: Kobo
|
small black square - NO
My mistake.
I found that, at least for the ePubs I have encountered, searching for that small black square "■" does not really help much in finding ePubs with extensive OCR errors. When I checked ePubs with a lot of occurrences (whether or not there were also "^" warnings), that black square was almost invariably being used for other display reasons and not as an OCR error indicator.
Last edited by Rob557; 02-16-2015 at 11:51 AM. |