01-12-2023, 03:35 PM   #20
KevinH
Sigil Developer
The more I think about this, the more I believe we need to screen out tiny/inline images used in place of symbols, for drop caps, for scene breaks, and so on.

Here is the Standard Ebooks view on the LoI:

Quote:
The LoI only contains links to images that are major structural components of the work.

An illustration is a major structural component if, for example: it is an illustration of events in the book, like a full-page drawing or end-of-chapter decoration; it is essential to the plot, like a diagram of a murder scene or a map; or it is a component of the text, like photographs in a documentary narrative.

An illustration is not a major structural component if, for example: it is a drawing used to represent a person’s signature, like an X mark; it is an inline drawing representing text in alien languages; it is a drawing used as a layout element to illustrate forms, tables, or diagrams.
So, based on this, I think we should only include img tags that are direct children of figure or div tags
(and *not* children of inline tags or of tags like p, h1-h6, span, etc.).
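
To make the rule concrete, here is a rough sketch of the filter. It is only an illustration, not the plugin code: it uses the standard library's xml.etree.ElementTree instead of sigil_bs4 so it can be run outside Sigil, the name structural_images is made up, and it assumes the file is well-formed XHTML without named entities such as &nbsp; (which ElementTree cannot resolve on its own).

```python
import xml.etree.ElementTree as ET

# Assumption: only these direct parents qualify an img for the LoI.
BLOCK_PARENTS = {"figure", "div"}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def structural_images(xhtml_text):
    """Return the src of every img whose direct parent is figure or div,
    in document order."""
    root = ET.fromstring(xhtml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    found = []
    for el in root.iter():
        if local_name(el.tag) != "img":
            continue
        parent = parent_of.get(el)
        if parent is not None and local_name(parent.tag) in BLOCK_PARENTS:
            found.append(el.get("src"))
    return found
```

With a CSS-selector library the same test would presumably be a single selector along the lines of "figure > img, div > img"; the parent map above is just the stdlib way of asking the same question.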

Once I get that working, we can expand the search to include svg tags with image tags as children (or even all svg tags).

The list of files to be processed will be limited to those in the spine. The LoI will be built up in spine order, from top to bottom of each file, skipping the cover and NAV files themselves.
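
The spine walk could look roughly like the sketch below. Everything here is hypothetical scaffolding: build_loi, read_file, find_images and skip are made-up names, and the spine is passed in as a plain list of hrefs rather than read through the plugin interface, so the ordering logic can be tried on its own.

```python
def build_loi(spine_hrefs, read_file, find_images, skip=()):
    """Collect (href, image src) pairs for the LoI in reading order.

    spine_hrefs: hrefs of the spine files, already in spine order
    read_file:   callable that returns the XHTML text for an href
    find_images: callable that returns image srcs, top to bottom
    skip:        hrefs to leave out, e.g. the cover page and the NAV
    """
    skipped = set(skip)
    entries = []
    for href in spine_hrefs:
        if href in skipped:
            continue  # the cover and NAV never contribute entries
        for src in find_images(read_file(href)):
            entries.append((href, src))
    return entries
```

Because the outer loop follows the spine and the inner loop follows document order, the resulting list is already in the order the LoI should display.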

I think sigil_bs4 would be best, so we can use CSS selectors to identify the nodes and their parents, and to find all id attribute values already used in that file to prevent duplicates.
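
For the duplicate-id part, something like this sketch is what I have in mind. Again, this is an assumption-laden illustration: it uses the stdlib xml.etree.ElementTree in place of sigil_bs4, the helper names used_ids and next_free_id are invented, and the "loi-img-" prefix is an arbitrary choice.

```python
import xml.etree.ElementTree as ET


def used_ids(xhtml_text):
    """Return the set of all id attribute values already in the file."""
    root = ET.fromstring(xhtml_text)
    return {el.get("id") for el in root.iter() if el.get("id")}


def next_free_id(used, prefix="loi-img-"):
    """Return the first prefix+number id not in 'used' and reserve it."""
    n = 1
    while f"{prefix}{n}" in used:
        n += 1
    new_id = f"{prefix}{n}"
    used.add(new_id)  # reserve it so the next call cannot reuse it
    return new_id
```

The set is collected once per file and then updated as ids are handed out, so images that already carry an id can keep it and only the rest get a new one.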

When I get something working roughly, I will post it here for people to test with.