Old 08-08-2023, 09:04 AM   #61
DiapDealer
Grand Sorcerer
Posts: 28,848
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Vanguard3000 View Post
Hi, thanks for creating this plugin - it really helps me clean up retail epubs that use a lot of unnecessary ids. However, I've noticed it isn't able to clean out all unused ids. For example:

Code:
<h1 id="bm4">TITLE</h1>

<h2 id="bm4-s01">SECTION</h2>
In this case, "bm4" is unused, but "bm4-s01" is used. If I rename "bm4" to, say, "banana", your plugin will catch it, so it seems that because the string "bm4" is present in the used id "bm4-s01", the plugin thinks both are in use.

Anyway, I know this plugin is a bit on the old side but is there a chance this could be addressed? Alternatively, is there another similar plugin that will find these ids? Thanks in advance.
Are you absolutely certain that "bm4" is unused? It's not used in the ncx file, or anything? I ask because I'm not seeing anything in the plugin developer's code that would suggest "bm4" would get a pass simply because it matches a portion of the string of another used id.

I'm attaching a simple test epub that demonstrates that both the "bm4" and "bm4-s01" ids get properly removed by the plugin when they are both truly unused.

Last edited by DiapDealer; 08-08-2023 at 09:07 AM.
Old 08-08-2023, 09:06 AM   #62
DiapDealer
Grand Sorcerer
Whoops! May have spoken too soon. I can reproduce your results.

Last edited by DiapDealer; 08-08-2023 at 09:33 AM.
Old 08-08-2023, 09:33 AM   #63
DiapDealer
Grand Sorcerer
Simple fix. There's no reason to join the Python list of potential ids into a string before checking whether an id is in that list. In fact, doing so causes the problem: "bm4" will ALWAYS be in a string that contains "bm4-s01". The in operator works on a Python list directly, without concatenating the list's elements into a string first, and it treats each element of the list as an individual item.
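To illustrate the difference, here is a minimal sketch. It is not the plugin's actual code; the variable names and the sample ids are made up for the example.

```python
# Hypothetical list of ids that are referenced somewhere in the book.
used_ids = ["bm4-s01", "toc-ch1"]

# Buggy approach: joining the list first turns `in` into a substring search.
joined = " ".join(used_ids)
substring_hit = "bm4" in joined      # "bm4" is found inside "bm4-s01"

# Fixed approach: `in` on a list compares against whole elements only.
list_hit = "bm4" in used_ids         # no element is exactly "bm4"
exact_hit = "bm4-s01" in used_ids    # this element is present

print(substring_hit, list_hit, exact_hit)
```

With the joined string, "bm4" is reported as used even though only "bm4-s01" is referenced. A set would behave like the list here and is usually faster for large books.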

I'm attaching a test plugin with an updated cutils.py file (lines 146-149). The plugin dev can do with it what they will.
Attached Files
File Type: zip RemoveUnusedBookmarks_v018a.zip (21.9 KB, 745 views)

Last edited by DiapDealer; 08-08-2023 at 09:32 PM.
Old 08-08-2023, 12:13 PM   #64
Vanguard3000
Groupie
Posts: 169
Karma: 474196
Join Date: Jan 2011
Location: Canada
Device: Kobo Libra 2
Hi, your guerilla update works well - I applied it to my previously-cleaned epub and it removed an additional three unused IDs, without removing used ones (i.e. "bm4-s01" remained).

Thanks for your help, DiapDealer, and thanks again for the great plugin, Slowsmile!
Old 08-08-2023, 08:09 PM   #65
democrite
Evangelist
Posts: 441
Karma: 77256
Join Date: Sep 2011
Device: none
I mentioned an issue a while back that, as far as I know, is not resolved. If an id, for example “bm4”, appears in multiple files (or in every file) yet is used in only one or a few of them, the unused instances are not removed. In my case, some publishers number paragraphs sequentially, so that after using the plugin an epub can still have thousands of unused ids.
Old 08-09-2023, 01:08 AM   #66
DNSB
Bibliophagist
Posts: 47,940
Karma: 174315098
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by democrite View Post
I mentioned an issue a while back that, as far as I know, is not resolved. If an id, for example “bm4”, appears in multiple files (or in every file) yet is used in only one or a few of them, the unused instances are not removed. In my case, some publishers number paragraphs sequentially, so that after using the plugin an epub can still have thousands of unused ids.
I haven't seen that all that many times other than with Kobo's kepub spans and a bit of regex search/replace has taken care of that issue since the IDs tend to have the same structure.
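As a rough sketch of that kind of regex clean-up in Python: the id pattern below (ids of the form "kobo.N.N") is an assumption about how such spans typically look, not something taken from a specific book, and the function name is made up for the example.

```python
import re

# Assumed id structure, e.g. id="kobo.12.3"; adjust the pattern to the book.
STRUCTURED_ID = re.compile(r'\s+id="kobo\.\d+\.\d+"')

def strip_structured_ids(xhtml: str) -> str:
    """Remove id attributes whose value matches the assumed pattern."""
    return STRUCTURED_ID.sub("", xhtml)

sample = '<p id="c1"><span class="koboSpan" id="kobo.1.1">Hello</span></p>'
cleaned = strip_structured_ids(sample)
print(cleaned)
```

This only drops the matching id attributes and leaves the spans in place. It assumes nothing links to those ids, so check that first.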
Old 08-09-2023, 07:39 PM   #67
democrite
Evangelist
Elsevier epubs do that. Penguin epubs also often add ids to all paragraphs, though I haven’t checked whether the numbering restarts from the same value in each file.

Yes, I can fix them with sed, adding the file name as a prefix to all ids and then removing the unused ones, but something easier someday would be nice.
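For comparison, a Python sketch of the same prefixing idea might look like the following. It is only an illustration, not the sed commands actually used: the function name and the in-memory dict of files are hypothetical, and it rewrites only id and href attributes written with double quotes (not the src attributes in an ncx, for instance).

```python
import re
from pathlib import PurePosixPath

ID_ATTR = re.compile(r'\bid="([^"]+)"')
HREF_ATTR = re.compile(r'\bhref="([^"#]*)#([^"]+)"')

def prefix_ids(files: dict) -> dict:
    """Prefix every id with its file's stem and update href fragments to match."""
    result = {}
    for name, text in files.items():
        own = PurePosixPath(name).stem
        text = ID_ATTR.sub(lambda m: f'id="{own}-{m.group(1)}"', text)

        def fix_href(m, own=own):
            target, frag = m.group(1), m.group(2)
            if "://" in target:          # leave external links untouched
                return m.group(0)
            stem = PurePosixPath(target).stem if target else own
            return f'href="{target}#{stem}-{frag}"'

        result[name] = HREF_ATTR.sub(fix_href, text)
    return result

book = {
    "ch1.xhtml": '<p id="p1">a</p><a href="ch2.xhtml#p1">x</a><a href="#p1">y</a>',
    "ch2.xhtml": '<p id="p1">b</p>',
}
prefixed = prefix_ids(book)
print(prefixed["ch1.xhtml"])
```

After this step every id is unique across the book, so an id that is used in one file no longer shields same-named ids in other files.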
Old 08-10-2023, 10:21 AM   #68
KevinH
Sigil Developer
Posts: 9,069
Karma: 6361556
Join Date: Nov 2009
Device: many
FWIW ...

In epub2 the OPF guide section can include ids (fragments in the url) which point into xhtml files that should not be removed.

And under epub2, the Adobe pagemap.xml can, and typically does, point to ids in xhtml files; those links will break if the ids are removed.

And, technically the same id can be re-used as long as they are in different xhtml files, so determining if used or not should really keep track of filenames.

And, technically under epub3, EPUB Canonical Fragment Identifiers (cfis) can use ids to point to specific spots in xhtml files for either internal or external cfi links, bookmarks, annotation points that may exist outside the epub itself (from cloud based web cfi links).

And technically, under epub3 that supports javascript, those ids could be referenced for dynamic searching or popup footnotes or by the js code itself.

So you really need to take care of all of the potential use points, or you can never know if an id is used or not.
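As a minimal sketch of the "keep track of filenames" point: the function below records used ids as (file name, id) pairs. It is an illustration only. The function name and the dict of files keyed by bare file name are assumptions, and it looks only at href and src fragments, so it says nothing about CFIs, JavaScript, or references from outside the epub.

```python
import re
from pathlib import PurePosixPath

REF = re.compile(r'\b(?:href|src)="([^"#]*)#([^"]+)"')
ID_ATTR = re.compile(r'\bid="([^"]+)"')

def unused_ids(files: dict) -> set:
    """Return (file name, id) pairs that no href/src fragment points at."""
    used = set()
    for name, text in files.items():
        for target, frag in REF.findall(text):
            if "://" in target:          # external link, not part of the book
                continue
            # An empty target means the link points into the current file.
            target_name = PurePosixPath(target).name if target else name
            used.add((target_name, frag))
    unused = set()
    for name, text in files.items():
        for id_value in ID_ATTR.findall(text):
            if (name, id_value) not in used:
                unused.add((name, id_value))
    return unused

book = {
    "ch1.xhtml": '<h1 id="bm4">T</h1><a href="ch2.xhtml#bm4">x</a>',
    "ch2.xhtml": '<h1 id="bm4">U</h1><h2 id="bm4-s01">S</h2>',
    "toc.ncx": '<content src="ch2.xhtml#bm4-s01"/>',
}
result = unused_ids(book)
print(result)
```

Here "bm4" in ch1.xhtml is reported as unused even though "bm4" in ch2.xhtml is linked, because the pair includes the file name.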

Therefore, without a really good reason, removing ids is probably not the best idea ... unless you truly know or control the epub's full production.

Even numbered paragraph ids are useful for reflowable epub locations used in printed academic citations and are more correct than page numbers in many cases.

The overhead of parsing even a thousand ids in a single xhtml file is minuscule compared to the time it takes the parser to parse and create the initial DOM tree itself. So removing them is rarely, if ever, necessary from a performance perspective.


My 2 cents ...

Last edited by KevinH; 08-10-2023 at 11:50 AM.
Old 08-10-2023, 03:43 PM   #69
democrite
Evangelist
Such Elsevier epubs can have 15,000+ ids. Yes, I stopped using the plugin. All headers up to h6 can have ids, and I generally exclude anything past a third-level header in each chapter from a remade TOC. Since I may want to re-add those in the future, I no longer use the plugin.

An option to not remove ids from headers would be nice, but maybe that is not something others would often use.

Other times I may try to add bibliographic links to academic titles, since I may want to check references, by adding ids to paragraphs via a regex on the author's last name and then linking to them. That is not exactly accurate but good enough, yet in such cases there end up being duplicate ids. That is a different issue; I’ll need to figure out an easy way to remove them, maybe AppleScript with BBEdit.
Old 08-10-2023, 05:54 PM   #70
JSWolf
Resident Curmudgeon
Posts: 80,650
Karma: 150249619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Can someone please create a version of this plugin for calibre? Thanks.