In case anyone wants to try their hand at a python3lib script or standalone Python script that updates just the OPF, redoing *all* manifest ids to rely on filenames, as the basis of a new Sigil tool (call it "Fixup Manifest IDs"), here is where I am as a basic approach:
Approach:
1. Pass 1 - parse the OPF building up a dict of all_ids
- while parsing, store away any ncx id and cover id
2. create an empty changed_id dict
3. Pass 2 - parse the opf and walk the manifest
- create a potential new id based on current file name
- if different from current id:
- use the dict of all_ids to make sure the new_id is valid and unique
- create the new manifest entry, replacing the old id with the new one
- update all_ids dict to remove the old id and add in the new id
- update dict changed_id[old_id] = new_id
4. Still in Pass 2 - when you reach the spine, if the ncx id is in changed_id, parse and update the spine toc attribute
5. Still in Pass 2 - walk the spine entries updating idrefs from changed_id
6. Still in Pass 2 - when you reach bindings (if it exists) walk the mediaType entries in bindings, updating the "handler" attribute as needed
7. save the changed opf.
8. Pass 3 - parse the new opf; if there is a cover id, walk the metadata updating any cover id meta as needed
9. Still in Pass 3 - walk the manifest entries updating the "media-overlay" and "fallback" attribute values from changed_id
10. Save the final version of the OPF as it stands
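A minimal sketch of the id-generation part of step 3, in case it helps. The function names (id_from_href, plan_renames), the ASCII-only set of allowed id characters, the "x" prefix, and the "_N" suffix used to resolve clashes are all my assumptions for illustration, not anything Sigil does today. It takes the manifest as a plain list of (id, href) pairs, so it is independent of whichever parser is used.

```python
import posixpath
import re
from urllib.parse import unquote


def id_from_href(href):
    """Build a candidate manifest id from the file name part of href."""
    name = posixpath.basename(unquote(href))
    # Assumption: restrict ids to a conservative ASCII subset of XML name chars.
    base = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
    # An XML id may not start with a digit, '.' or '-'; the "x" prefix is arbitrary.
    if not re.match(r"[A-Za-z_]", base):
        base = "x" + base
    return base


def plan_renames(manifest_items, all_ids):
    """Return changed_id (old id -> new id); all_ids (a set) is updated in place."""
    changed_id = {}
    for old_id, href in manifest_items:
        base = id_from_href(href)
        # Free the old id first so an item may keep (or reclaim) its own id.
        all_ids.discard(old_id)
        new_id, n = base, 1
        while new_id in all_ids:
            # Hypothetical clash rule: append _1, _2, ... until unique.
            new_id = f"{base}_{n}"
            n += 1
        all_ids.add(new_id)
        if new_id != old_id:
            changed_id[old_id] = new_id
    return changed_id
```

Note that an id still held by a later manifest entry counts as taken at the moment an earlier entry asks for it, so the result depends on manifest order.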
Of course, if you parse to store more state in a changeable format then you can reduce the total number of passes at the expense of more data structures.
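To illustrate that trade-off, here is a rough sketch that holds the whole OPF in memory as a tree, so every reference can be updated after a single parse. It uses the standard library's xml.etree.ElementTree rather than any Sigil parser, the function name apply_id_changes is made up, and it assumes changed_id has already been built. Be aware that ElementTree re-serializes the document, so the original formatting and any XML declaration are not preserved.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
# Keep the OPF namespace as the default namespace when writing back out.
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")


def apply_id_changes(opf_text, changed_id):
    """Rewrite manifest ids and everything that refers to them in one parse."""
    root = ET.fromstring(opf_text)

    def q(tag):
        return f"{{{OPF_NS}}}{tag}"

    def swap(elem, attr):
        # Replace the attribute value only if it is a changed id.
        value = elem.get(attr)
        if value in changed_id:
            elem.set(attr, changed_id[value])

    # Cover meta in the metadata: <meta name="cover" content="..."/>
    for meta in root.iter(q("meta")):
        if meta.get("name") == "cover":
            swap(meta, "content")

    # Manifest items: the id itself plus fallback and media-overlay references.
    for item in root.iter(q("item")):
        for attr in ("id", "fallback", "media-overlay"):
            swap(item, attr)

    # Spine: the toc attribute and each itemref idref.
    spine = root.find(q("spine"))
    if spine is not None:
        swap(spine, "toc")
        for itemref in spine.iter(q("itemref")):
            swap(itemref, "idref")

    # Bindings, if present: mediaType handler attributes.
    for media_type in root.iter(q("mediaType")):
        swap(media_type, "handler")

    return ET.tostring(root, encoding="unicode")
```

The extra data structure here is the tree itself; the multi-pass approach avoids it and leaves untouched parts of the file byte-for-byte as they were.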
And FWIW - I think using the quickparser.py tool on the opf will do what is needed here in 3 passes without any additional state needed at all.
Last edited by KevinH; 06-22-2024 at 01:40 PM.