#1681
Resident Curmudgeon
Posts: 80,069
Karma: 147983159
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
I have never seen a novel get a new ISBN. Either there is a version number or there isn't.
#1682
Resident Curmudgeon
Posts: 80,069
Karma: 147983159
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Code:
<p class="x01-FM-Copyright-Text-Space" id="release_identifier_line">btb_ppg_141035760_c0_r7</p>
#1683
Well trained by Cats
Posts: 31,141
Karma: 60406498
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:

The last character is a Check Digit (Mod 11).
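In code, that Mod 11 scheme is easy to verify, assuming the check digit being described is the standard ISBN-10 one. A minimal Python sketch:

Code:
def isbn10_check_digit(first_nine: str) -> str:
    """Return the Mod 11 check character for the first nine ISBN-10 digits."""
    # Weights run 10 down to 2 over the first nine digits.
    total = sum((10 - i) * int(d) for i, d in enumerate(first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

# Example: ISBN 0-306-40615-2
assert isbn10_check_digit("030640615") == "2"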
#1684
Enthusiast
Posts: 34
Karma: 10
Join Date: Aug 2022
Device: Windows 10
I'm wondering if someone could develop a plugin that converts books to MP3 or M4A.

I've been looking for a way to listen to books using read-aloud. Unfortunately, Windows 10's native system TTS voices are too artificial and uncomfortable to listen to. I tried the Kindle and Google Play Books read-aloud features, but I find Microsoft Edge's native read-aloud voices the most natural and acceptable. So currently I convert books to HTMLZ, unzip them, copy the files to an Android device, and have Edge on Android read the content of index.html. Edge often fails to read sentences in PDF files, possibly due to internal mark-up, and the Android version of Edge cannot read PDFs at all. The Android version of Edge also stalls while reading long TXT files, so converting books to HTML seems to be the way to go at the moment. But this is cumbersome.

I've researched a little: Microsoft Edge seems to use Microsoft Azure's text-to-speech technology, which is available via API for free. If somebody could develop a plugin that converts books to audio files using the Azure TTS API, it would be greatly appreciated.
#1685
Addict
Posts: 394
Karma: 6700000
Join Date: Jan 2012
Location: Gimel
Device: tablets
Quote:

https://www.mobileread.com/forums/sh...d.php?t=299727

Unfortunately, the developer is no longer able to provide ongoing support. The hope is that it will still work with Calibre 5.x. I don't do Windows, so I can't tell you how well it works.
#1686
Enthusiast
Posts: 34
Karma: 10
Join Date: Aug 2022
Device: Windows 10
Quote:
#1687
Connoisseur
Posts: 62
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet
Quote:
#1688
Enthusiast
Posts: 34
Karma: 10
Join Date: Aug 2022
Device: Windows 10
Idea: cascaded series-book thumbnails in the grid view

Books in a series take up a lot of space in the grid view and can slow down the display, since Calibre takes a bit of time to load each thumbnail. So it might be a good idea to group a series into a single thumbnail with a cascaded image (like overlapping Solitaire cards). Clicking that thumbnail would then show the search results for the series.
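The compositing part is simple enough to sketch with Pillow. This is purely illustrative, not Calibre's actual grid code, and the cover paths, size, and offset are placeholders:

Code:
from PIL import Image

def cascade_thumbnail(cover_paths, size=(120, 160), offset=12):
    """Stack series covers with a small diagonal offset, Solitaire-style."""
    n = len(cover_paths)
    canvas = Image.new("RGBA",
                       (size[0] + offset * (n - 1), size[1] + offset * (n - 1)),
                       (0, 0, 0, 0))
    for i, path in enumerate(cover_paths):
        cover = Image.open(path).convert("RGBA").resize(size)
        canvas.paste(cover, (i * offset, i * offset))
    return canvas

# cascade_thumbnail(["vol1.jpg", "vol2.jpg", "vol3.jpg"]).save("series.png")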
#1689
Enthusiast
Posts: 25
Karma: 10
Join Date: Aug 2021
Device: none
I have two ideas for plugins to identify books with incorrect metadata.

Misidentified check: a plugin that runs a full-text search on all books in text-based formats. It matches the title and the last name of the author and makes sure an exact match exists inside the book. If there are multiple authors, one last-name match from any of them is enough, but the title must always match exactly. Case-agnostic. This would find many, if not all, misidentified books; some false positives can be expected. A rough sketch of this idea follows below.

Language check: compare the language set for each book to its actual contents (only for text-based formats), and compare the language set for each book to the languages used in the title and comments. For example, look for non-English characters and words in the title or comments when a book is set to language: English, e.g. The, Der, Die, Das, La, Le, Il, Å, Ä, Ö, Æ, 诶, ēi, も, अ, ب. Perhaps only cover the most common languages if it gets too complicated. Perhaps include a setting for minimum matches per page/number of words and/or total matches per book to avoid false positives, and perhaps only check the first 10 pages, 10 in the middle, and the last 5. Dictionaries may be a frequent false positive.

Maybe these would best be combined into one plugin, so that it checks that the language in the metadata matches the book as well as matching the author and title. "Misidentified check" or "Fix match", for example. Or perhaps it could be added to a plugin like Quality Check?

Last edited by The Holy; 12-14-2022 at 11:05 AM.
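The misidentified check is straightforward to sketch, assuming the book text has already been extracted. Here get_book_text is a hypothetical stand-in, not an actual Calibre API:

Code:
def looks_misidentified(text: str, title: str, authors: list) -> bool:
    """True when the exact title, or every author's last name, is missing from the text."""
    haystack = text.casefold()  # case-agnostic comparison, as proposed
    title_found = title.casefold() in haystack
    last_names = [a.split()[-1].casefold() for a in authors if a.strip()]
    author_found = any(name in haystack for name in last_names)  # one match is enough
    return not (title_found and author_found)

# flag = looks_misidentified(get_book_text(book_id), book.title, book.authors)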
#1690
Well trained by Cats
Posts: 31,141
Karma: 60406498
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Quote:
#1691
Enthusiast
Posts: 25
Karma: 10
Join Date: Aug 2021
Device: none
Quote:

I would guess most people don't have more than five different languages in their library, if not just one or two, so the user could select languages in the plugin, each tied to words that are common in that language and less universal. If a library should only consist of English and German (because that is all the person thinks they have and have been getting), the user selects English and German. That way it wouldn't match Italian, for example, despite the words Italian may share with English or German, and it keeps the search simpler and faster. But if none of the English or German words were found, or not often enough, the book could be in Italian or any other language while being set to English, and would thus show up in the results. Better yet, it could check how common both languages are in Das Boot.

Basically, the user tells the plugin which languages to expect by selecting language presets containing some of the most common words (or common and unique words) from each expected language ("the" for English and "das" for German, for example). If a lot more of the English words are found and the language is set to English, it will be assumed correct and not show up in the search. There would need to be a min/max value for the number of occurrences of words from each language preset to decide whether a book shows up as a result. Say a book is set to English in Calibre: if the English words don't occur often enough, or the German words occur too often, it will show up in the results as a possible German book or translation. If it's a 50/50 split, it's an English-German dictionary.

The title/author match would work for Das Boot, since the title and author should be the same in the book. I just added both the English and German versions to Calibre and ran a metadata search on both. The German one was changed to English, even though it started out correct. Looking at the screenshots below, it's clear the function I'm suggesting would work: it would only show the German version, which the metadata search mismatched as English. The screenshots also make it clear that the title and author match would have to run only on the first and last few pages, and the language match in the middle.

English version, correctly matching title, author, and language: [screenshot]

German version, correctly matching title and author but not the language, since the metadata search set it to English: [screenshot]

Imagine bulk-adding 100 books, running the metadata search, and applying it. Wouldn't this be the fastest way to accurately identify most of the books that were incorrectly identified? And 100 may be low for a lot of people; imagine doing hundreds if not thousands at a time. I have a lot of books, many of which have the wrong title, author, comment, and language. Aside from covers, for which we already have tools for identifying bad ones, these four metadata values are the most important pieces of information in a book, to me anyway, which is why I think this plugin would be a great addition.

If we combine it all into one plugin, here are a few advantages I can come up with:
- It will show books which likely have the wrong basic (read: important) metadata. This would in turn make using metadata download on all books feel like less of a Hail Mary, since it will be much easier to find misidentified books.
- It will show books which may not be the best copy of a book (the metadata in Calibre is correct, but the title and author aren't written anywhere in the book, which e-books normally should include; this may indicate it is not a good version/copy).
- It will show books which are in an unwanted language (you only selected English and German because that is all you think you have, but not enough English or German words were found because the book is written in some other language that is different enough).
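A minimal sketch of the preset idea, where the word lists and the min-hits threshold are illustrative placeholders rather than real presets:

Code:
import re

PRESETS = {  # common words per user-selected language (illustrative)
    "eng": {"the", "and", "with", "was", "have"},
    "deu": {"der", "und", "nicht", "ein", "ist"},
}

def flag_language(text: str, metadata_lang: str, min_hits: int = 50):
    words = re.findall(r"[^\W\d_]+", text.casefold())
    scores = {lang: sum(w in preset for w in words) for lang, preset in PRESETS.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_hits:
        return "possible unselected language"  # not enough hits for any preset
    if best != metadata_lang:
        return f"set to {metadata_lang}, looks like {best}"
    return None  # metadata language wins: assumed correct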
#1692
Addict
Posts: 394
Karma: 6700000
Join Date: Jan 2012
Location: Gimel
Device: tablets
Quote:

Cf. https://www.goodreads.com/book/show/...6-dead-letters

Quote:

The most common approaches utilize letter frequency tables. The least common utilize both word frequency and letter frequency tables. Depending upon how the program and database are structured, adding a new language can be as easy as dropping a new, language-specific database into a specific folder and telling the program what the language is, or as complicated as adding new fields to the database and replacing the old database with the new, updated one.

Quote:

a) It isn't uncommon for official documents from governments or NGOs to be in two or more languages.
b) Databases of word frequency tables can become very large, very quickly.

###

_Ethnologue_ claims that there are 7,151 spoken languages today, with 4,169 having a developed writing system and a further 151 languages that are exclusively signed. _Wycliffe Bible Translators_ claims that there are 7,388 spoken languages, of which the Bible has been fully translated into 724, with ongoing translation projects in 3,266 more. For various reasons, I put slightly greater credence in the _Wycliffe Bible Translators_ data than in the _Ethnologue_ data.

For a first-cut plug-in, I'd mandate UTF-8 glyphs and use them to break the book into writing systems, and from that, use letter frequencies for the specific language. The virtue of this approach is that it can guess the language of any document thrown at it, with an acceptable degree of inaccuracy.

Either a second plug-in, or a more advanced version, would use word frequencies, with an initial draft of English/not-English, then expanding to the ten most common languages; when that is bug-free, go to the 20 most common, and then jump to the 50, 100, and maybe 200 most common spoken languages. Third parties willing to provide letter and/or word frequency tables would enable faster inclusion of minority, endangered, extinct, and constructed languages than would otherwise be the case.
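The letter-frequency approach fits in a few lines. The profiles below are tiny and purely illustrative, where a real table would cover the full alphabet per language:

Code:
from collections import Counter

PROFILES = {  # approximate top letter frequencies (illustrative)
    "eng": {"e": 0.127, "t": 0.091, "a": 0.082, "o": 0.075, "n": 0.067},
    "deu": {"e": 0.174, "n": 0.098, "i": 0.076, "s": 0.073, "r": 0.070},
}

def closest_language(text: str) -> str:
    letters = [c for c in text.casefold() if c.isalpha()]
    total = len(letters) or 1
    counts = Counter(letters)

    def distance(profile):
        # squared error between observed and reference frequencies
        return sum((counts[ch] / total - freq) ** 2 for ch, freq in profile.items())

    return min(PROFILES, key=lambda lang: distance(PROFILES[lang]))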
#1693
Enthusiast
Posts: 25
Karma: 10
Join Date: Aug 2021
Device: none
Quote:
Quote:

Algorithms or a system that could identify any language out of the box would be interesting to test, if such a thing already exists. I do wonder, however, how feasible that approach would be in terms of complexity and compute intensity. I agree we should start small before expanding to multiple languages, perhaps just English and one other. A basic plugin would be great to start testing with.
#1694
Wizard
Posts: 1,017
Karma: 500000
Join Date: Jun 2015
Device: Rocketbook, kobo aura h2o, kobo forma, kobo libra color
> b) Databases of word frequency tables can become very large, very quickly.

I wouldn't think you need a complete dictionary to do this. I would expect that a dictionary of, say, the top 400 words in a language would be plenty to characterize it. If you were selective, you could probably even pick fewer than 50 "keystone" words that are not shared with other languages, or at least are very frequent in one language and very infrequent in the others, and come up with a correctly weighted answer. I'd even guess (i.e., without research or evidence) that given two languages, you could pick 10 words in each that would distinguish a text between the two, using a weighted frequency sample of a few pages randomly selected from the book (i.e., page 10, not page 1, and a page full of words, not pictures).

I'm sure that among the hundreds to thousands of potential languages, you could come up with a small number of words that would assign a book to a language family, and then go down a decision tree to narrow down which member of the family it is.

Even without having a database, it should be possible to analyze a book, generate a frequency table of its top ~1000 words, have the user supply the language, and build a database from that. After adding a handful of languages this way, you could start characterizing books, and for ones that are wrong, generate a differential between the two languages; see the sketch below. A user-guided selection of words might be useful and improve accuracy, but is likely not strictly necessary.

Last edited by compurandom; 12-15-2022 at 08:06 PM.
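That bootstrap loop is easy to sketch: learn a top-N word table from a book whose language the user supplies, then score unknown books against the stored tables. Everything here is illustrative, not an existing plugin API:

Code:
import re
from collections import Counter

tables = {}  # language -> set of its top-N words

def tokens(text):
    return re.findall(r"[^\W\d_]+", text.casefold())

def learn(text, language, top=1000):
    """Build a word table from a book whose language the user supplied."""
    tables[language] = {w for w, _ in Counter(tokens(text)).most_common(top)}

def score(text):
    """Hit counts per known language; the differential suggests a correction."""
    words = tokens(text)
    return {lang: sum(w in table for w in words) for lang, table in tables.items()}

# learn(english_text, "eng"); learn(german_text, "deu")
# score(unknown_text) -> e.g. {"eng": 4213, "deu": 317}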
#1695
Addict
Posts: 394
Karma: 6700000
Join Date: Jan 2012
Location: Gimel
Device: tablets
Quote:

What you don't want to happen is what happened with the Afrikaans dictionary for OpenOffice.org. The final, automated proofreading step ran it against the South African English dictionary and deleted the words found there. There was a list of words to be added back in --- "boer", "bakkie", other obvious Afrikaans words that English captured --- but the word "die" took almost a decade to migrate onto that add-back list. "Die" is Afrikaans for "the".

Quote:

I've forgotten where in the LibreOffice codebase their implementation resides. The algorithm LibreOffice uses is neither complex nor compute-intensive.

I learned to program using "If Then" & GoTo statements. (Standard library? What is that?) If the wanted algorithm wasn't in either Knuth's _The Art of Computer Programming_ or Sedgewick, brute-force a working solution, an approach that is guaranteed to produce umpteen bugs per line of code. Once a working version exists, throw it all away and write the program using procedures and functions.

Quote:

###

After thinking some more about it, I'd push for two plugins: one glyph/letter based, and one word based. The former for rough identification and the latter for precise identification.
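For the glyph/letter plugin, the rough pass needs no frequency tables at all: bucket each letter by the script portion of its Unicode name. A crude sketch:

Code:
import unicodedata
from collections import Counter

def dominant_script(text: str) -> str:
    """Guess the writing system from the first word of each letter's Unicode name."""
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            # "LATIN SMALL LETTER A" -> "LATIN", "CYRILLIC CAPITAL LETTER VE" -> "CYRILLIC"
            scripts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"

# dominant_script("Das Boot") -> "LATIN"
# dominant_script("Война и мир") -> "CYRILLIC"

The word-based plugin would then only need tables for languages written in the detected script.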