|
My apologies for such a delayed response. I didn’t know that I had to “follow” my own thread to get notifications; I watched the thread for a couple of days and then ended up working on other projects.
My knowledge of Calibre and ebook components is minuscule compared to you guys, but I “think” there are 3 different scenarios for duplicate covers. The following are notes that I wrote back when I created the thread.
1. Embedded in HTML???
A 2nd cover (bad cover) shows up when viewing the book, but it is not in the editor files.
Can be deleted using “Editing metadata for one book” >>> “Basic Metadata”:
   a. Select "Set from ebook files"
2. The first two xhtml files contain two different covers
A 2nd cover (bad cover) shows up when viewing the book. The two covers are contained within the first two xhtml files when editing the book.
Can be deleted by "Converting" the book:
   a. Select "Convert Books"
   b. Select "Structure Detection"
   c. Select "Remove First Image"
3. The two covers are located in the files "titlepage.xhtml" and "cover.xhtml". The cover name is always "cover.jpeg" (same cover) and is used in both files.
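For scenario 3, a rough way to check a book outside of Calibre is to open the EPUB as a zip file and list every page that displays "cover.jpeg". This is only a sketch using the Python standard library, not Calibre's own code; the function name pages_showing_image and the regular expression are my own assumptions, and the regex is a simple heuristic rather than a real HTML parser.

```python
import posixpath
import re
import zipfile

# Heuristic: capture the src of <img> tags and the href / xlink:href
# of SVG <image> tags. Not a full HTML parser.
IMAGE_REF = re.compile(
    r'<(?:img|image)\b[^>]*?(?:src|href)\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def pages_showing_image(epub, image_name="cover.jpeg"):
    """Return the (X)HTML files inside the EPUB that display image_name.

    `epub` can be a path or a file-like object. Only the file name of
    each image reference is compared, so "../images/cover.jpeg" counts.
    """
    hits = []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            refs = {posixpath.basename(ref) for ref in IMAGE_REF.findall(text)}
            if image_name in refs:
                hits.append(name)
    return hits
```

If the returned list has more than one entry (for example both "titlepage.xhtml" and "cover.xhtml"), the book is a candidate for this scenario.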
"In Calibre, a duplicate cover in an EPUB typically appears as an extra HTML file containing only the cover image, which results in the cover being displayed twice when viewing the book. This occurs because the EPUB format already specifies a cover through its metadata, and a separate HTML file for the cover creates a redundant entry.
You can remove the duplicate cover by using the Edit Book feature in Calibre and applying a regular expression in the Search and replace tool.
How a duplicate cover is embedded
When an EPUB is created or converted, a cover image is defined in the book's package file (content.opf) with an id and marked with properties="cover-image". Some programs, including Calibre during certain conversions, will also generate a separate HTML file (cover.xhtml) with an <img> tag pointing to the same cover image. The ebook reader then displays both the cover specified in the metadata and the cover in the HTML file, showing the same image twice."
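To see which image the package file declares as the cover, the content.opf can be parsed directly. The sketch below is my own illustration (the function name declared_cover_href is made up). It checks the EPUB 3 style properties="cover-image" mentioned in the quote, and falls back to the older EPUB 2 convention of a <meta name="cover"> entry that points at a manifest id.

```python
import xml.etree.ElementTree as ET

# Namespace used by the elements of an OPF package file.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def declared_cover_href(opf_xml):
    """Return the href of the cover image declared in the OPF, or None."""
    root = ET.fromstring(opf_xml)
    items = root.findall("opf:manifest/opf:item", OPF_NS)

    # EPUB 3: a manifest item carries properties="cover-image".
    for item in items:
        if "cover-image" in item.get("properties", "").split():
            return item.get("href")

    # EPUB 2 convention: <meta name="cover" content="ID"/> names the item id.
    meta = root.find("opf:metadata/opf:meta[@name='cover']", OPF_NS)
    if meta is not None:
        cover_id = meta.get("content")
        for item in items:
            if item.get("id") == cover_id:
                return item.get("href")
    return None
```

The returned href can then be compared with the image used in cover.xhtml to confirm that the HTML page repeats the cover already declared in the metadata.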
During this time, I found out about the “Remove First Image” option in the conversion process. I didn’t do a lot of sampling, but the option seemed to fix most, if not all, of the duplicate cover issues. Having to convert all of the files is definitely not the preferred option, but more importantly, some really smart person has already identified a methodology that can select ebooks with duplicate covers. My next step is to try to find that algorithm and see if it could be used in a plugin.
Yesterday I glanced at Calibre’s Polish feature and noticed the option to “update the cover in the book files”. I believe this option just updates the cover from what is stored in the metadata? How can the cover listed in the html file differ from the one in the metadata?
Anyway, that is where I am at. I’m not sure when I’ll get back to it, but my plan was to develop a plugin that would at a minimum identify books with duplicate covers. If I am able to figure out how to fix the issues along the way, I would work towards that too. As mentioned before, I don’t have a lot of background in Calibre and ebook structure, so if anyone feels like taking on this type of project, I would be ecstatic!
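As a possible starting point for the "identify" part, here is a rough standalone sketch (plain Python, not a Calibre plugin and not Calibre's own detection logic). It flags a book when two or more of its pages contain an image and no visible text. All of the names are hypothetical, and the rule is only a guess at a heuristic: books with full-page illustrations would be flagged too, so the results would still need a manual look.

```python
import re
import zipfile

BODY = re.compile(r"<body\b[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
IMAGE_TAG = re.compile(r"<(?:img|image)\b", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def is_image_only_page(markup):
    """True if the page body has at least one image and no visible text."""
    match = BODY.search(markup)
    if match is None:
        return False
    body = match.group(1)
    has_image = IMAGE_TAG.search(body) is not None
    # Strip all tags; whatever is left is treated as visible text.
    visible_text = ANY_TAG.sub("", body).strip()
    return has_image and not visible_text


def image_only_pages(epub):
    """List the (X)HTML files in the EPUB that are image-only pages."""
    pages = []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                markup = zf.read(name).decode("utf-8", errors="replace")
                if is_image_only_page(markup):
                    pages.append(name)
    return pages


def looks_like_duplicate_cover(epub):
    """Assumed heuristic: two or more image-only pages suggest a duplicate."""
    return len(image_only_pages(epub)) >= 2
```

Running looks_like_duplicate_cover over each EPUB in a folder would give a shortlist of books to inspect by hand.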
|