12-14-2022, 08:18 PM   #1691
The Holy
Enthusiast
The Holy began at the beginning.
 
Posts: 25
Karma: 10
Join Date: Aug 2021
Device: none
Quote:
Originally Posted by theducks View Post
took me less than 30 seconds to come up with an example that would fail:


It clearly meets your tests, but the book/movie is in English (could be either)
There will be false positives, like I said. To be clear, what I'm suggesting is basically an advanced search that displays the books matching the search criteria. It would never change the file or metadata on its own. Thus, false positives are acceptable so long as the plugin returns enough accurate results per false positive.

I would guess most people don't have more than five different languages in their library, and many have only one or two, so the user could select the expected languages in the plugin. Each language would in turn be tied to a set of words that are common in that language and less universal across other languages.

If a library should only consist of English and German (because that is all the person thinks exists and has been getting), the user selects English and German. That way the plugin never tries to match Italian, for example, which avoids false matches on words Italian shares with English or German and makes the search simpler and faster. But if none of the English or German words are found, or they are not found often enough, the book could be in Italian or any other language while set as English, and thus it shows up in the results.

Better yet, it could check how common both languages are in Das Boot.

Basically, the user tells the plugin which languages to expect by selecting language presets, each containing some of the most common (or common and unique) words from that language ("The" for English and "Das" for German, for example). If far more of the English words are found and the language is set to English, it is assumed to be correct and does not show up in the search.

There would need to be min/max values for the number of occurrences of words from each language preset, to decide whether a book shows up as a result or not. Let's say the book is set to English in Calibre. If the English words don't occur often enough (below the min) or the German words occur too often (above the max), it shows up in the results as a possible German book/translation. If it's a 50/50 split, it's probably an English-German dictionary.
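To make the min/max idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the word lists in `PRESETS`, the function names, the threshold defaults, and the choice to count hits per 1000 words (so long and short books are comparable) are all made up, and none of it is Calibre's plugin API.

```python
import re
from collections import Counter

# Hypothetical presets: a handful of common, fairly language-specific words.
# A real preset would need to be longer and chosen more carefully.
PRESETS = {
    "eng": {"the", "and", "of", "with", "that"},
    "deu": {"der", "das", "und", "nicht", "ist"},
}


def preset_rates(text, languages):
    """Return preset-word hits per 1000 words for each selected language."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    total = len(words) or 1  # avoid dividing by zero on empty text
    counts = Counter(words)
    return {
        lang: 1000 * sum(counts[w] for w in PRESETS[lang]) / total
        for lang in languages
    }


def is_suspect(text, declared, expected, min_own=50.0, max_other=50.0):
    """True if the book should show up in the search results.

    declared: language currently set in the metadata, e.g. "eng"
    expected: languages the user selected, e.g. ["eng", "deu"]
    min_own / max_other: made-up thresholds in hits per 1000 words
    """
    rates = preset_rates(text, expected)
    # Too few words from the declared language: possibly another language.
    if rates.get(declared, 0.0) < min_own:
        return True
    # Too many words from another selected language: possibly a translation.
    return any(
        rate > max_other for lang, rate in rates.items() if lang != declared
    )
```

With these presets, a German text whose metadata says `"eng"` gets flagged because the English rate falls below `min_own`, while the same text declared as `"deu"` does not. A bilingual dictionary would be flagged either way, since the other language's rate goes over `max_other`.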



The title/author match would work for Das Boot, since the title and author in the metadata should be the same as those printed in the book.
I just added both the English and German versions to Calibre and ran a metadata search on both. The German one was changed to English, even though it started out correct. Looking at the images below, it's clear the function I'm suggesting would work. It would only show the German version, which the metadata search mismatched as English. The images also make it clear that the title and author match would have to run only on the first and last few pages, and the language match on the middle.
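The split between "first and last few pages" and "the middle" could look something like the sketch below. It assumes the book text is already available as a list of page strings; how to get that out of an e-book is not shown, and the function names and the `edge` parameter are hypothetical.

```python
def split_regions(pages, edge=3):
    """Split page texts into (first/last few pages, middle pages).

    For very short books there is no real middle, so both regions
    fall back to the whole book.
    """
    if len(pages) <= 2 * edge:
        return list(pages), list(pages)
    return pages[:edge] + pages[-edge:], pages[edge:-edge]


def mentions(pages, phrase):
    """Case-insensitive search that ignores line breaks and extra spaces."""
    haystack = " ".join(" ".join(pages).lower().split())
    needle = " ".join(phrase.lower().split())
    return needle in haystack


def title_author_found(pages, title, author, edge=3):
    """Return (title_found, author_found), looking only at the edge pages."""
    edges, _middle = split_regions(pages, edge)
    return mentions(edges, title), mentions(edges, author)
```

The middle pages returned by `split_regions` are what the language check would run on, so that title pages, copyright notices and publisher ads don't skew the word counts. An exact substring match is the simplest possible choice here; a real plugin would probably need something more forgiving for subtitles or "Lastname, Firstname" ordering.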

English version would correctly match title, author, and language:
[Attached images: 1.png, 2.png]

German version would correctly match title and author, but not the language, since metadata search set it to English:
[Attached images: 3.png, 4.png, 5.png]

Imagine bulk adding 100 books, running a metadata search and applying it. Wouldn't this be the fastest way to accurately find most of the books that were misidentified? And 100 may be low for a lot of people; imagine doing hundreds if not thousands at a time. I have a lot of books, many of which have the wrong title, author, comment and language. Aside from covers, for which we already have tools to identify bad ones, these four metadata values are the most important pieces of information in a book, to me anyway, which is why I think this plugin would be a great addition.

Let's say we combine it all into one plugin, here are a few advantages I can come up with:

It will show books which likely have the wrong basic (read: important) metadata!
This would in turn make using the metadata download on all books feel like less of a Hail Mary, since it will be much easier to find misidentified books.

It will show books which may not be the best copy of a book (the metadata in Calibre is correct, but the title and author aren't written anywhere in the book, which e-books normally should have, so their absence may indicate that it is not a good version/copy)

It will show books which are in an unwanted language (you only select English and German because that is all you think you have, but not enough English or German words were found in a book because it's written in some other language that is different enough)