04-10-2021, 12:03 PM   #45
KevinH
Sigil Developer
Okay, I have been working on how best to handle the potential duplication of ids across the files to be merged into one.

For this particular test case I see the following output:
Code:
Id duplicated:  "fn_14"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn_16"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fnb"  in  ("OEBPS/Text/Chapter61.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter91.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html", "OEBPS/Text/Chapter96.html")

Id duplicated:  "fn4"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn14"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn_3"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_10"  in  ("OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html")

Id duplicated:  "fn_7"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_20"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn10"  in  ("OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html")

Id duplicated:  "fnc"  in  ("OEBPS/Text/Chapter67.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html")

Id duplicated:  "fna"  in  ("OEBPS/Text/Chapter61.html", "OEBPS/Text/Chapter71.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter88.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter91.html", "OEBPS/Text/Chapter93.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html", "OEBPS/Text/Chapter96.html")

Id duplicated:  "fn15"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn_1"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter90.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn5"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_9"  in  ("OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_21"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fnd"  in  ("OEBPS/Text/Chapter67.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn2"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter90.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn8"  in  ("OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_13"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn20"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn_b"  in  ("OEBPS/Text/Chapter61.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter91.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html", "OEBPS/Text/Chapter96.html")

Id duplicated:  "fn21"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn22"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn18"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn3"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_11"  in  ("OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html")

Id duplicated:  "fn9"  in  ("OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_19"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn6"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_4"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn16"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn_d"  in  ("OEBPS/Text/Chapter67.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn_6"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_5"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_2"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter90.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_22"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn12"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn19"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn_8"  in  ("OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_17"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn_12"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn_c"  in  ("OEBPS/Text/Chapter67.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html")

Id duplicated:  "fn_15"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn1"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter90.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn_18"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn_a"  in  ("OEBPS/Text/Chapter61.html", "OEBPS/Text/Chapter71.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter88.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter91.html", "OEBPS/Text/Chapter93.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html", "OEBPS/Text/Chapter96.html")

Id duplicated:  "fn7"  in  ("OEBPS/Text/Chapter53.html", "OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter85.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html", "OEBPS/Text/Chapter95.html")

Id duplicated:  "fn17"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")

Id duplicated:  "fn11"  in  ("OEBPS/Text/Chapter79.html", "OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html", "OEBPS/Text/Chapter94.html")

Id duplicated:  "fn13"  in  ("OEBPS/Text/Chapter87.html", "OEBPS/Text/Chapter89.html")
This is a huge list, and given the names (fn1, fn_a and so on, presumably footnotes), it seems that many of these duplicate ids will be the targets of hrefs, so those links will fail miserably once the files are merged into one.
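
For anyone who wants to run the same kind of check on their own book, here is a rough sketch of how such a report could be gathered. It is Python for illustration only and is not Sigil's actual code: the function name find_duplicate_ids and the regex are my own assumptions, and a real implementation would use a proper XHTML parser rather than a regex.

```python
import re
from collections import defaultdict

# Simplified matcher for id="..." attributes (assumption: well-formed,
# quoted attributes). The lookbehind skips things like data-id or xml:id.
ID_ATTR = re.compile(r'''(?<![\w:.-])id\s*=\s*(["'])(.*?)\1''')


def find_duplicate_ids(files):
    """files: dict mapping book path -> XHTML source text.

    Returns a dict mapping each id that occurs in more than one file
    to the list of files that contain it.
    """
    seen = defaultdict(list)
    for path, text in files.items():
        for match in ID_ATTR.finditer(text):
            ident = match.group(2)
            if path not in seen[ident]:
                seen[ident].append(path)
    return {ident: paths for ident, paths in seen.items() if len(paths) > 1}


if __name__ == "__main__":
    sample = {
        "OEBPS/Text/Chapter87.html": '<p id="fn1">a</p>',
        "OEBPS/Text/Chapter89.html": '<p id="fn1">b</p><p id="fn2">c</p>',
    }
    for ident, paths in find_duplicate_ids(sample).items():
        print("Id duplicated:", ident, "in", tuple(paths))
```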

There are too many for people to handle manually.

So I think the only way to deal with this is to rename the duplicate ids to unique ones and then walk the entire set of html files, updating any links that pointed to them so that they use the new fragment.

So it appears we will have to use an approach much like calibre's and automate the renaming so that ids are unique at least among the set of files to be merged. For that we will have to add a SourceUpdater for Fragments to our codebase.
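
To make the idea concrete, here is a minimal sketch of the rename-then-relink pass, written in Python purely for illustration. It is not the planned SourceUpdater and not calibre's code. The function name make_ids_unique, the "_u1" suffix scheme, and the regexes are all hypothetical, and it assumes every file sits in the same folder (so hrefs can be matched by file name) and that attributes are quoted.

```python
import posixpath
import re

# Simplified attribute matchers (assumption: quoted, well-formed attributes).
ID_ATTR = re.compile(r'''(?<![\w:.-])id\s*=\s*(["'])(.*?)\1''')
HREF_ATTR = re.compile(r'''(?<![\w:.-])href\s*=\s*(["'])(.*?)\1''')


def make_ids_unique(merge_files, other_files=()):
    """merge_files / other_files: lists of (path, text), merge_files in merge order.

    Returns (new_merge_files, new_other_files, renames), where renames maps
    (path, old_id) -> new_id for every id that had to change.
    """
    all_ids = {m.group(2) for _, t in merge_files for m in ID_ATTR.finditer(t)}
    taken, renames = set(), {}
    # Pass 1: the first file to use an id keeps it; later files get a new one.
    for path, text in merge_files:
        for m in ID_ATTR.finditer(text):
            old = m.group(2)
            if old not in taken:
                taken.add(old)
                continue
            n = 1
            while f"{old}_u{n}" in taken or f"{old}_u{n}" in all_ids:
                n += 1
            renames[(path, old)] = f"{old}_u{n}"  # hypothetical naming scheme
            taken.add(f"{old}_u{n}")

    by_name = {posixpath.basename(p): p for p, _ in merge_files}

    # Pass 2: rewrite the id attributes and every href fragment that pointed at them.
    def rewrite(path, text):
        def fix_id(m):
            new = renames.get((path, m.group(2)))
            return m.group(0) if new is None else f"id={m.group(1)}{new}{m.group(1)}"

        def fix_href(m):
            target, sep, frag = m.group(2).partition("#")
            if not sep or "://" in target:
                return m.group(0)  # no fragment, or an external link
            # An empty target means a link within the same file.
            dest = path if target == "" else by_name.get(posixpath.basename(target))
            new = renames.get((dest, frag))
            if new is None:
                return m.group(0)
            return f"href={m.group(1)}{target}#{new}{m.group(1)}"

        return HREF_ATTR.sub(fix_href, ID_ATTR.sub(fix_id, text))

    new_merge = [(p, rewrite(p, t)) for p, t in merge_files]
    new_other = [(p, rewrite(p, t)) for p, t in other_files]
    return new_merge, new_other, renames
```

The links are resolved against the original file names, so in this sketch the pass would have to run before the merge itself rewrites hrefs to point at the merged file.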

Thoughts?

KevinH