12-14-2022, 10:58 AM   #1689
The Holy
I have two ideas for plugins to identify books with incorrect metadata.

Misidentified check:
A plugin that runs a full-text search on all books in text-based formats.
It searches each book for its title and the author's last name and checks that an exact match for both exists inside the book.
If a book has multiple authors, a last-name match for any one of them is enough, but the title must always match exactly. Matching is case-insensitive.
This will find many if not all misidentified books. Some false positives can be expected.
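
A minimal Python sketch of the matching rule described above, assuming the book's text has already been extracted to a plain string. The function names (`last_name`, `looks_misidentified`) are made up for illustration and are not part of calibre or any existing plugin, and the last-name guess is deliberately naive.

```python
import re


def _normalize(text):
    """Casefold and collapse whitespace so line breaks do not break a match."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def last_name(author):
    """Naive last-name guess (assumption): handles 'Last, First' and 'First Last'."""
    author = author.strip()
    if "," in author:
        return author.split(",", 1)[0].strip()
    parts = author.split()
    return parts[-1] if parts else ""


def looks_misidentified(book_text, title, authors):
    """Return True when the title or every author's last name is missing from the text.

    The title must appear exactly (ignoring case and whitespace runs);
    a last-name match for any single author is enough.
    This is a plain substring test, so short titles can match inside longer words.
    """
    haystack = _normalize(book_text)
    title_found = _normalize(title) in haystack
    author_found = any(
        name and _normalize(name) in haystack
        for name in (last_name(a) for a in authors)
    )
    return not (title_found and author_found)
```

For example, `looks_misidentified(text, "Pride and Prejudice", ["Jane Austen"])` would flag a book only if the title or the name "Austen" never appears in `text`. Books whose title page is an image would show up as false positives.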

Language check:
Compare the language set for each book to its actual contents (text-based formats only)
and
compare the language set for each book to the languages used in the title and comments.
For example, by looking for non-English characters and words in the title or comments when a book's language is set to English.
E.g. The, Der, Die, Das, La, Le, Il, Å, Ä, Ö, Æ, 诶, ēi, も, अ, ب. Perhaps only do the most common languages if it gets to be too complicated.
Perhaps include a setting for the minimum number of matches per page or per number of words, and/or the total matches per book, to avoid false positives.
And perhaps only check the first 10 pages, 10 in the middle, and the last 5.
Dictionaries may be a frequent false positive.
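
A rough Python sketch of the language check outlined above, assuming the book is available as a list of page strings. Everything here is hypothetical: the tiny `FUNCTION_WORDS` lists, the three-letter language codes, the thresholds, and the function names are placeholders for illustration, not an existing calibre API.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny marker lists; a real check would need far larger ones.
FUNCTION_WORDS = {
    "eng": {"the", "and", "of", "to", "is"},
    "deu": {"der", "die", "das", "und", "ist"},
    "fra": {"le", "la", "les", "et", "est"},
    "ita": {"il", "la", "di", "che", "e"},
}


def sample_pages(pages, first=10, middle=10, last=5):
    """Pick the first 10, 10 from the middle, and the last 5 pages (all pages if the book is short)."""
    n = len(pages)
    if n <= first + middle + last:
        return list(pages)
    mid_start = max(first, (n - middle) // 2)
    mid_end = min(mid_start + middle, n - last)
    return pages[:first] + pages[mid_start:mid_end] + pages[n - last:]


def non_latin_share(text):
    """Fraction of letters whose Unicode name does not contain 'LATIN'."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    non_latin = sum(1 for c in letters if "LATIN" not in unicodedata.name(c, ""))
    return non_latin / len(letters)


def guess_language(text, min_matches=5):
    """Return the language with the most function-word hits, or None below the threshold."""
    words = re.findall(r"[^\W\d_]+", text.casefold())
    counts = {
        lang: sum(w in markers for w in words)
        for lang, markers in FUNCTION_WORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] >= min_matches else None


def language_mismatch(pages, declared, min_matches=5, max_non_latin=0.3):
    """Flag a book whose sampled text does not look like its declared language."""
    text = " ".join(sample_pages(pages))
    # All languages in FUNCTION_WORDS use the Latin script, so lots of other script is suspicious.
    if declared in FUNCTION_WORDS and non_latin_share(text) > max_non_latin:
        return True
    guess = guess_language(text, min_matches)
    return guess is not None and guess != declared
```

The `min_matches` threshold plays the role of the "minimum matches" setting: a handful of foreign words in a quotation will not flag a book, but bilingual dictionaries would still be likely false positives. The same `guess_language` call could be run on the title and comments, though such short strings will rarely reach the threshold.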

Maybe these would be best combined into one plugin that checks that the language in the metadata matches the book's contents and that the author and title match as well.
It could be called "Misidentified check" or "Fix match", for example.
Or perhaps they could be added to a plugin like Quality Check?

Last edited by The Holy; 12-14-2022 at 11:05 AM.