Old 12-22-2025, 05:48 PM   #1
Trester99
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
Removing Image File References from OPF

For the last 4 weeks I have spent an incredible number of hours working with AI to develop a Calibre Plugin that will identify books with Duplicate Covers, and provide the user an easy means of deleting these duplicate covers. The plugin code has been completed and I am currently in testing/debug mode.

The program really is amazing; there simply is nothing that can duplicate its capabilities. Having said that, as an engineer I am really hoping for a better solution than what AI has developed for removing the duplicate cover. From the beginning, AI has insisted that removing the code from the OPF that references the duplicate image is too problematic. The solution it has implemented is to simply delete the image file itself and leave the reference in the OPF. This method does work, and I am not aware of any negative consequences to this strategy. But I have to admit, I would much rather find a solution that performs a clean removal of the duplicate cover.

This plugin is not for people who have small libraries; it is intended for users who have large or very large collections. At that scale it simply is not an option to use the book editor to clean up that image and the associated reference in each book. I know that for my collection, if I have to choose between duplicate covers and some stray code, I will choose the removal of duplicate covers.

I do know that converting the book can fix the OPF file, but converting has potential issues also. I've looked at the capabilities of Polish, and the available plugins and I don't see a solution from that side. So, I am hoping that someone can provide suggestions for achieving a clean removal.
Old 12-22-2025, 06:23 PM   #2
BetterRed
null operator (he/him)
Posts: 22,092
Karma: 30277960
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Did you look at the code for the "Remove missing file entries from manifest" option in the Modify EPUB plugin?

BR
Old 12-22-2025, 06:38 PM   #3
Trester99
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
Quote:
Originally Posted by BetterRed View Post
Did you look at the code for the "Remove missing file entries from manifest" option in the Modify EPUB plugin?

BR
I did try that but it doesn't remove the line referencing the duplicate image.
Old 12-22-2025, 06:50 PM   #4
JSWolf
Resident Curmudgeon
Posts: 81,315
Karma: 150263711
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Kindle eBooks in KF8 do have a duplicate cover. One is the cover and the other is a smaller version, which is most likely the thumbnail. That's intentional. I've seen very few ePubs with a duplicate cover.
Old 12-22-2025, 07:13 PM   #5
BetterRed
null operator (he/him)
Posts: 22,092
Karma: 30277960
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by Trester99 View Post
I did try that but it doesn't remove the line referencing the duplicate image.
FTR: I have not looked at the Modify EPUB code.

My thinking was that you could "copy/paste/adjust" the code from Modify EPUB that implements the "Remove missing file entries from manifest" option, such that it gets executed after the AI-generated code removes the duplicate image.

BR
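
For readers following along, the idea of pruning manifest entries whose files are gone can be sketched with the standard library alone. This is not the Modify EPUB plugin's code (nobody in this thread has quoted it); the function name and its arguments are hypothetical, and it assumes you already have the OPF text and the set of file paths present in the EPUB. ElementTree may also rewrite namespace prefixes in the metadata section, so treat it as an illustration only.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_NS = "http://www.idpf.org/2007/opf"


def drop_missing_manifest_items(opf_xml, opf_name, present_names):
    """Return (new_opf_xml, removed_hrefs).

    opf_xml: the OPF package document as text
    opf_name: path of the OPF inside the EPUB, e.g. "OEBPS/content.opf"
    present_names: set of file paths that actually exist in the EPUB
    (all three are hypothetical inputs for this sketch)
    """
    ET.register_namespace("", OPF_NS)  # keep OPF as the default namespace
    root = ET.fromstring(opf_xml)
    manifest = root.find(f"{{{OPF_NS}}}manifest")
    spine = root.find(f"{{{OPF_NS}}}spine")
    base = posixpath.dirname(opf_name)
    removed_ids, removed_hrefs = set(), []
    if manifest is not None:
        for item in list(manifest):
            href = unquote(item.get("href", ""))
            # Manifest hrefs are relative to the folder holding the OPF
            name = posixpath.normpath(posixpath.join(base, href))
            if name not in present_names:
                manifest.remove(item)
                removed_ids.add(item.get("id"))
                removed_hrefs.append(href)
    if spine is not None:
        # A spine itemref pointing at a removed manifest id is now dangling
        for ref in list(spine):
            if ref.get("idref") in removed_ids:
                spine.remove(ref)
    return ET.tostring(root, encoding="unicode"), removed_hrefs
```

Note that this only touches the OPF. Any `<img>` tag in an HTML file that still points at the deleted image would need to be removed separately.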
Old 12-22-2025, 07:48 PM   #6
Trester99
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
Quote:
Originally Posted by JSWolf View Post
Kindle eBooks in KF8 do have a duplicate cover. One is the cover and the other is a smaller version, which is most likely the thumbnail. That's intentional. I've seen very few ePubs with a duplicate cover.
If we are talking about retail EPUBs, I 100% agree with you! But most duplicate covers are caused by books going through a conversion. I have no idea what program is causing it, but that program creates a "titlepage.html" file that it places first in the spine, and a "Cover.html" that is second in the spine. Both HTML files include the same image file, "cover.jpeg", which of course results in a duplicate cover. In the program, the goal was to get as many duplicate covers as possible into a category that the user could auto-delete. We call that category "Identical" and it represents 80% of the total duplicates. What is really fascinating is that every duplicate image is visually identical to the cover, though the analytics that define the characteristics of the image can vary somewhat. The only explanation I can see is the conversion process.
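
To make the "titlepage.html plus Cover.html" case concrete, here is a minimal standard-library sketch of how one might spot an image that is shown by more than one spine document. It is not the plugin's actual detection code; the function name is hypothetical, and it assumes the spine documents have already been read into a dict of name to HTML source.

```python
import posixpath
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlsplit


class _ImageRefs(HTMLParser):
    """Collect image hrefs from <img src> and SVG <image href>."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.refs.append(attrs["src"])
        elif tag == "image":  # SVG wrapper often used for cover pages
            href = attrs.get("xlink:href") or attrs.get("href")
            if href:
                self.refs.append(href)


def images_shared_by_spine_docs(spine_docs):
    """spine_docs: mapping of document name -> HTML source, in spine order.

    Returns {image name: [documents that display it]} for every image
    displayed by more than one document.
    """
    users = defaultdict(list)
    for doc_name, source in spine_docs.items():
        parser = _ImageRefs()
        parser.feed(source)
        parser.close()
        base = posixpath.dirname(doc_name)
        for href in parser.refs:
            if urlsplit(href).scheme:  # skip http:, data: and similar
                continue
            path = unquote(urldefrag(href)[0])
            target = posixpath.normpath(posixpath.join(base, path))
            if doc_name not in users[target]:
                users[target].append(doc_name)
    return {img: docs for img, docs in users.items() if len(docs) > 1}
```

In the case described above, the result would map "cover.jpeg" to both "titlepage.html" and "Cover.html". The clean fix there is to drop one of the two HTML files rather than the image, since the image is still needed by the remaining cover page.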
Old 12-22-2025, 08:02 PM   #7
Trester99
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
Quote:
Originally Posted by BetterRed View Post
FTR: I have not looked at the Modify EPUB code.

My thinking was that you could "copy/paste/adjust" the code from Modify EPUB that implements the "Remove missing file entries from manifest" option, such that it gets executed after the AI-generated code removes the duplicate image.

BR
Your question made me take another look at AI's process. Turns out that they stopped deleting the image file, and are in fact removing the entry from the source html file. I tried using the Modify Epub "Remove unused image files" but that didn't work, I'm assuming because it is still listed in the manifest? But if I had AI go back and delete the image file again that may be the start of the solution. And you are absolutely right that if I could get permission from Kiwidude to use that part of his code, the problem is solved. Hey Kiwi, I've donated twice to your plugins! Thanks BR!
Old 12-22-2025, 10:56 PM   #8
kovidgoyal
creator of calibre
Posts: 45,724
Karma: 28549306
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Not sure what AI you were using but after getting the file name, this is literally 5 lines of code.

Code:
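# Run inside calibre (for example via calibre-debug); path, name_to_remove
# and path_to_save_to must be defined beforehand.
from calibre.ebooks.oeb.polish.container import get_container
from calibre.ebooks.oeb.polish.replace import remove_links_to
c = get_container(path, tweak_mode=True)
# Match every link whose target is the file being removed
def predicate(name, href, fragment=None): return name == name_to_remove
remove_links_to(c, predicate)
# Delete the file itself along with its OPF entries
c.remove_item(name_to_remove)
c.commit(path_to_save_to)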
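
For use from a plugin, the same steps could be wrapped in a function along these lines. This is only a sketch: `make_predicate` and `remove_file_and_references` are hypothetical names, the calibre imports are done lazily so the module still loads outside calibre, and `name_to_remove` is assumed to be the file's canonical name inside the book (its path relative to the book root, with forward slashes).

```python
def make_predicate(name_to_remove):
    """Build the callback that remove_links_to() calls for each link."""
    def predicate(name, href, fragment=None):
        # name is the canonical name of the file the link points to
        return name == name_to_remove
    return predicate


def remove_file_and_references(path, name_to_remove, path_to_save_to):
    """Remove one file from a book plus the links and OPF entries for it.

    Returns False (and writes nothing) if the book has no such file.
    """
    # Imported here so this module can be loaded without calibre installed
    from calibre.ebooks.oeb.polish.container import get_container
    from calibre.ebooks.oeb.polish.replace import remove_links_to

    c = get_container(path, tweak_mode=True)
    if not c.has_name(name_to_remove):
        return False
    remove_links_to(c, make_predicate(name_to_remove))
    c.remove_item(name_to_remove)
    c.commit(path_to_save_to)
    return True
```

Building the predicate through a small factory avoids relying on a global variable, which matters when the function is called in a loop over many books.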
Old 12-23-2025, 06:37 AM   #9
JSWolf
Resident Curmudgeon
Posts: 81,315
Karma: 150263711
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by Trester99 View Post
If we are talking about retail EPUBs, I 100% agree with you! But most duplicate covers are caused by books going through a conversion. I have no idea what program is causing it, but that program creates a "titlepage.html" file that it places first in the spine, and a "Cover.html" that is second in the spine. Both HTML files include the same image file, "cover.jpeg", which of course results in a duplicate cover. In the program, the goal was to get as many duplicate covers as possible into a category that the user could auto-delete. We call that category "Identical" and it represents 80% of the total duplicates. What is really fascinating is that every duplicate image is visually identical to the cover, though the analytics that define the characteristics of the image can vary somewhat. The only explanation I can see is the conversion process.
Is the cover image being duplicated, or are there two sets of code to display the image?
Old 12-23-2025, 10:47 AM   #10
jackie_w
Grand Sorcerer
Posts: 6,283
Karma: 16800000
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Quote:
Originally Posted by kovidgoyal View Post
Not sure what AI you were using but after getting the file name, this is literally 5 lines of code.

Code:
from calibre.ebooks.oeb.polish.container import get_container
from calibre.ebooks.oeb.polish.replace import remove_links_to
c = get_container(path, tweak_mode=True)
def predicate(name, href, fragment=None): return name == name_to_remove
remove_links_to(c, predicate)
c.remove_item(name_to_remove)
c.commit(path_to_save_to)
@Kovid, if you're still reading this thread...

I've been using the calibre get_container function for many years and I'm slightly concerned by this tweak_mode=True parameter. I'm pretty sure I've never used it (at least not knowingly). Yet I have, over the years, written various plugins (User and Editor) which appeared to work OK.

What have I missed? Can you explain, in simple terms, when it should be used and when it's OK to leave it as default tweak_mode=False. Many thanks.
Old 12-23-2025, 01:12 PM   #11
Trester99
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
Quote:
Originally Posted by kovidgoyal View Post
Not sure what AI you were using but after getting the file name, this is literally 5 lines of code.

Code:
from calibre.ebooks.oeb.polish.container import get_container
from calibre.ebooks.oeb.polish.replace import remove_links_to
c = get_container(path, tweak_mode=True)
def predicate(name, href, fragment=None): return name == name_to_remove
remove_links_to(c, predicate)
c.remove_item(name_to_remove)
c.commit(path_to_save_to)
Thanks Kovid!

Up until recently, I had been using Gemini Pro and ChatGPT Pro interchangeably. Since Gemini upgraded to 3.0, I have switched to Gemini. Unfortunately, neither Gemini nor ChatGPT has ever been successful at converting a script file to a plugin. For all of my previous projects I have relied on script files run through the Calibre Debug program, which works fine. For this program, I tried Grok for the first time and it was able to create a plugin in about 10 minutes.
Old 12-23-2025, 01:26 PM   #12
Trester99
Member
Posts: 23
Karma: 10
Join Date: Jun 2024
Device: Kindle Paperwhite
Quote:
Originally Posted by JSWolf View Post
Is the cover image being duplicated, or are there two sets of code to display the image?
In the example above, it is two sets of code. But in the other duplicates, we do commonly see "identical" image files being created. If I remember correctly, AI was using 6 different apps that assessed different characteristics of the images, and based on those analytics you could tell that the 2nd image originated from the cover file.
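
As an illustration of how two re-encoded copies of a cover can be recognised as the same picture even when their bytes differ, here is a tiny "average hash" sketch. I am not claiming this is what the plugin does; it is one common perceptual-hash technique, the function names and the `max_distance` threshold are made up for the example, and it assumes each image has already been decoded and shrunk to a small grayscale grid of the same size (for instance 8x8) by some image library.

```python
def average_hash(gray):
    """gray: rows of 0-255 brightness values, already downscaled.

    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def probably_same_image(gray_a, gray_b, max_distance=1):
    """True if the two grids hash to (nearly) the same bit pattern.

    Both grids must have the same dimensions. max_distance is an
    illustrative threshold, not a tuned value.
    """
    distance = hamming_distance(average_hash(gray_a), average_hash(gray_b))
    return distance <= max_distance
```

Small brightness changes from recompression usually leave the bit pattern unchanged, which is why a copy that "originated from the cover file" can still match the original.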
Old 12-23-2025, 01:28 PM   #13
kovidgoyal
creator of calibre
Posts: 45,724
Karma: 28549306
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@jackie_w: tweak_mode is what the Edit book tool uses. Polish books does not. tweak_mode is generally less forgiving of bad input and does less auto-correction of syntax errors and the like when parsing things. Whether you use it or not depends on whether you want auto-correction or not.
Old 12-23-2025, 01:40 PM   #14
jackie_w
Grand Sorcerer
Posts: 6,283
Karma: 16800000
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Quote:
Originally Posted by kovidgoyal View Post
@jackie_w: tweak_mode is what the Edit book tool uses. Polish books does not. tweak_mode is generally less forgiving of bad input and does less auto-correction of syntax errors and the like when parsing things. Whether you use it or not depends on whether you want auto-correction or not.
Thanks! That's a relief! I don't use my personal plugins to do bulk updates on any epubs which haven't already passed both calibre CheckBook and EpubCheck, so I think I'm OK.

Merry Christmas