|
|
#1 |
|
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
|
Removing Image File References from OPF
For the last 4 weeks I have spent an incredible number of hours working with AI to develop a calibre plugin that identifies books with duplicate covers and gives the user an easy means of deleting them. The plugin code has been completed and I am currently in testing/debug mode.
The program really is amazing; there simply is nothing that can duplicate its capabilities. Having said that, as an engineer I am really hoping for a better solution than what the AI has developed for removing the duplicate cover. From the beginning, the AI has insisted that removing the code from the OPF that references the duplicate image is too problematic. The solution it has implemented is to simply delete the image file itself and leave the reference in the OPF. This method does work, and I am not aware of any negative consequences of this strategy. But I have to admit, I would much rather find a solution that performs a clean removal of the duplicate cover. This plugin is not for people who have small libraries; it is intended for users who have large or very large collections. At that scale it simply is not an option to use the book editor to clean up the image and the associated reference one book at a time. I know that for my collection, if I have to choose between duplicate covers and some stray code, I will choose to remove the duplicate covers. I do know that converting the book can fix the OPF file, but converting has potential issues of its own. I've looked at the capabilities of Polish and at the available plugins, and I don't see a solution from that side. So I am hoping that someone can provide suggestions for achieving a clean removal.
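To make "clean removal" concrete, here is a minimal sketch of what dropping the OPF reference amounts to, using only the Python standard library. The function name `remove_manifest_item`, the sample OPF and the file name `images/cover1.jpeg` are invented for illustration. It only touches the manifest, and it assumes the `href` is written exactly as it appears in the OPF; it does not look at links inside the HTML files.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def remove_manifest_item(opf_xml: str, href: str) -> str:
    """Return the OPF text with the manifest <item> for `href` removed.

    Hypothetical helper: hrefs are compared as plain strings, exactly as
    they are written in the OPF (relative to the OPF file's location).
    """
    ET.register_namespace("", OPF_NS)  # keep the default namespace on output
    root = ET.fromstring(opf_xml)
    manifest = root.find(f"{{{OPF_NS}}}manifest")
    if manifest is None:
        return opf_xml
    for item in list(manifest):  # copy the list, we mutate while iterating
        if item.get("href") == href:
            manifest.remove(item)
    return ET.tostring(root, encoding="unicode")


# Invented sample data, not taken from a real book
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="cover" href="cover.jpeg" media-type="image/jpeg"/>
    <item id="dup" href="images/cover1.jpeg" media-type="image/jpeg"/>
  </manifest>
</package>"""

cleaned = remove_manifest_item(SAMPLE_OPF, "images/cover1.jpeg")
```

In a real plugin it would probably be safer to go through calibre's own book container API than to edit the OPF text by hand, since the same file can also be referenced from the spine, the guide and the content documents.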
|
|
|
|
|
#2 |
|
null operator (he/him)
Posts: 22,092
Karma: 30277960
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Did you look at the code for the "Remove missing file entries from manifest" option in the Modify EPUB plugin?
BR |
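As a rough illustration of the idea behind such an option (this is not the Modify EPUB plugin's code, just a stdlib sketch), the following lists manifest entries whose target file is no longer in the EPUB. The names `missing_manifest_hrefs` and `build_demo_epub`, and the `OEBPS/content.opf` path, are assumptions made for the example; real code would locate the OPF through `META-INF/container.xml`.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_TAG_PREFIX = "{http://www.idpf.org/2007/opf}"


def missing_manifest_hrefs(epub_bytes: bytes, opf_name: str) -> list[str]:
    """Return manifest hrefs that point at files absent from the archive."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        names = set(zf.namelist())
        root = ET.fromstring(zf.read(opf_name))
    base = posixpath.dirname(opf_name)  # hrefs are relative to the OPF
    missing = []
    for item in root.iter(OPF_TAG_PREFIX + "item"):
        href = item.get("href", "")
        target = posixpath.normpath(posixpath.join(base, unquote(href)))
        if target not in names:
            missing.append(href)
    return missing


def build_demo_epub() -> bytes:
    """Build a tiny in-memory archive whose manifest lists a deleted image."""
    opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        "<manifest>"
        '<item id="c" href="cover.jpeg" media-type="image/jpeg"/>'
        '<item id="t" href="text.html" media-type="application/xhtml+xml"/>'
        "</manifest></package>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/text.html", "<html><body/></html>")
        # cover.jpeg is deliberately left out, as if it had been deleted
    return buf.getvalue()


missing = missing_manifest_hrefs(build_demo_epub(), "OEBPS/content.opf")
```

Run after the image file has been deleted, a check like this would report exactly the entries that still need to be removed from the manifest.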
|
|
|
|
|
|
|
#3 |
|
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
|
|
|
|
|
|
|
#4 |
|
Resident Curmudgeon
Posts: 81,315
Karma: 150263711
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Kindle eBooks in KF8 do have a duplicate cover. One is the cover and the other is a smaller version, which is most likely the thumbnail. That's intentional. I've seen very few ePubs with a duplicate cover.
|
|
|
|
|
|
#5 | |
|
null operator (he/him)
Posts: 22,092
Karma: 30277960
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
My thinking was that you could "copy/paste/adjust" the code from Modify EPUB that implements the "Remove missing file entries from manifest" option, so that it gets executed after the AI-generated code removes the duplicate image.
BR
|
|
|
|
|
|
|
|
#6 |
|
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
|
If we are talking about retail EPUBs, I 100% agree with you! But most duplicate covers are caused by books going through a conversion. I have no idea what program is causing it, but that program creates a "titlepage.html" file that it places first in the spine, and a "Cover.html" file that is second in the spine. Both HTML files include the same image file, "cover.jpeg", which of course results in a duplicate cover. In the plugin, the goal was to get as many duplicate covers as possible into a category that the user could auto-delete. We call that category "Identical" and it represents 80% of the total duplicates. What is really fascinating is that every duplicate image is visually identical to the cover, though the analytics that define the characteristics of the image can vary somewhat. The only explanation is the conversion process.
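The two-pages-one-image pattern described here can be spotted with a few lines of standard-library Python. This is only a sketch: the class and function names are invented, the sample markup is made up (including the guess that one page uses an SVG `<image>` wrapper), and image references are compared as raw strings without resolving relative paths.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ImageRefCollector(HTMLParser):
    """Collect image references from <img src> and SVG <image xlink:href>."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.refs.append(attrs["src"])
        elif tag == "image":
            href = attrs.get("xlink:href") or attrs.get("href")
            if href:
                self.refs.append(href)


def shared_images(spine_docs):
    """Map each image used by more than one document to those documents.

    spine_docs is a list of (document name, markup) pairs in spine order.
    """
    users = defaultdict(list)
    for name, markup in spine_docs:
        parser = ImageRefCollector()
        parser.feed(markup)
        parser.close()
        for ref in set(parser.refs):  # count each document once per image
            users[ref].append(name)
    return {img: docs for img, docs in users.items() if len(docs) > 1}


# Invented sample spine, modelled on the file names mentioned above
spine = [
    ("titlepage.html",
     '<html><body><svg xmlns:xlink="http://www.w3.org/1999/xlink">'
     '<image xlink:href="cover.jpeg"/></svg></body></html>'),
    ("Cover.html", '<html><body><img src="cover.jpeg" alt="cover"/></body></html>'),
    ("chapter1.html", "<html><body><p>Text</p></body></html>"),
]
dupes = shared_images(spine)
```

Note that in this pattern the image file itself must stay in the book; it is the second HTML page that is redundant, not the image.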
|
|
|
|
|
|
#7 | |
|
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
|
Quote:
Thanks BR!
|
|
|
|
|
|
|
#8 |
|
creator of calibre
Posts: 45,724
Karma: 28549306
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Not sure what AI you were using but after getting the file name, this is literally 5 lines of code.
Code:
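from calibre.ebooks.oeb.polish.container import get_container
from calibre.ebooks.oeb.polish.replace import remove_links_to

# Open the book in the mode the Edit Book tool uses
c = get_container(path, tweak_mode=True)

def predicate(name, href, fragment=None):
    # True for every link that points at the file being removed
    return name == name_to_remove

remove_links_to(c, predicate)  # strip links to the file from the book's content
c.remove_item(name_to_remove)  # delete the file and its OPF entries
c.commit(path_to_save_to)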
|
|
|
|
|
#9 | |
|
Resident Curmudgeon
Posts: 81,315
Karma: 150263711
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
|
|
|
|
|
#10 | |
|
Grand Sorcerer
Posts: 6,283
Karma: 16800000
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
|
Quote:
I've been using the calibre get_container function for many years and I'm slightly concerned by this tweak_mode=True parameter. I'm pretty sure I've never used it (at least not knowingly). Yet I have, over the years, written various plugins (User and Editor) which appeared to work OK. What have I missed? Can you explain, in simple terms, when it should be used and when it's OK to leave it as the default tweak_mode=False? Many thanks.
|
|
|
|
|
|
#11 | |
|
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
|
Quote:
Up until recently, I had been using Gemini Pro and ChatGPT Pro interchangeably. Since Gemini upgraded to 3.0, I have switched to Gemini. Unfortunately, neither Gemini nor ChatGPT is ever successful at converting a script file to a plugin. For all of my previous projects I have relied on script files run through the calibre-debug program, which works fine. For this program, I tried Grok for the first time and it was able to create a plugin in about 10 minutes.
|
|
|
|
|
|
#12 |
|
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
|
It is two sets of code in the example above. But in the other duplicates, we do commonly see that "identical" image files are being created. If I remember correctly, the AI was using 6 different apps that assessed different characteristics of the images, and based on those analytics you could tell that the 2nd image originated from the cover file.
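For comparison, the simplest possible "identical" check is a byte-for-byte one. The sketch below is not what the plugin does; the function name and the sample data are invented. It groups files by SHA-256 digest, so it only catches exact copies and will miss images that were re-encoded or resized during conversion, which is presumably why several image-analysis measures were used instead.

```python
import hashlib
from collections import defaultdict


def group_identical(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests shared by more than one file
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Invented sample contents standing in for image files read from a book
images = {
    "cover.jpeg": b"\xff\xd8 same bytes",
    "images/cover1.jpeg": b"\xff\xd8 same bytes",
    "images/map.png": b"\x89PNG other bytes",
}
groups = group_identical(images)
```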
|
|
|
|
|
|
#13 |
|
creator of calibre
Posts: 45,724
Karma: 28549306
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@jackie_w: tweak_mode is what the Edit Book tool uses. Polish Books does not. tweak_mode is generally less forgiving of bad input and does less auto-correction of syntax errors and the like when parsing things. Whether you use it or not depends on whether you want auto-correction or not.
|
|
|
|
|
|
#14 | |
|
Grand Sorcerer
Posts: 6,283
Karma: 16800000
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
|
Quote:
Merry Christmas
|
|
|
|
|