09-22-2025, 05:27 AM   #30
juanferna
Member
juanferna began at the beginning.
 
Posts: 16
Karma: 10
Join Date: May 2025
Device: sony PRS T3
Let me give a more detailed explanation.

Quote:
Originally Posted by kovidgoyal
I dont follow, nothing is modifying files once extraction is finished ..
I'm referring to the modification made by the f.write(shtml) call in src/calibre/srv/render_book.py.
This call completely rewrites each HTML file as a JSON-structured file, which is typically larger than the original.
In addition, just before f.write(shtml), the file is opened (with container.open(name, 'wb') as f:), and this open call truncates the file, setting its size to zero.
Therefore, each file successively goes through three sizes within the cache:
- the initial size of the epub
- zero
- the final size with JSON structure
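The three stages can be reproduced outside calibre with a minimal Python sketch. It assumes that container.open(name, 'wb') behaves like the built-in open() in 'wb' mode; a plain temporary file stands in for the cached book file, and sizes_during_rewrite is a hypothetical helper, not calibre code:

```python
import os
import tempfile

def sizes_during_rewrite(path: str, new_content: bytes) -> list[int]:
    """Hypothetical helper: file size before, during and after an in-place rewrite."""
    sizes = [os.path.getsize(path)]          # 1. initial size
    with open(path, 'wb') as f:              # 'wb' truncates the file at open time
        sizes.append(os.path.getsize(path))  # 2. zero: nothing has been written yet
        f.write(new_content)
    sizes.append(os.path.getsize(path))      # 3. final size, after the file is closed
    return sizes

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, 'article_2.html')
    with open(p, 'wb') as f:
        f.write(b'<html><body>short</body></html>')
    print(sizes_during_rewrite(p, b'{"tree": "a longer JSON-structured replacement"}'))
```

Any reader that looks at the file between the open and the write sees the middle value, zero.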

Several threads process the files of an EPUB (see num_workers in src/calibre/srv/render_book.py).
Suppose the file article_1 links to the file article_2. One thread (thread 1) may be processing article_1 while another (thread 2) is processing article_2.
When thread 1 checks whether article_2 exists (has_name_and_is_not_empty), it sees one of the three sizes listed above, depending on how far thread 2 has progressed.
The problem arises when it gets a size of zero.
In that case, the workaround reads the size a second time (sz = f.seek(0, os.SEEK_END)). I have found that this second read sometimes returns the final size, but sometimes still returns zero. When it still returns zero, the workaround is not enough, and the viewer displays an erroneous message saying that a file is missing.
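A minimal sketch of the race, using plain files and two Python threads in place of calibre's workers. has_name_and_is_not_empty_sketch is a hypothetical check modeled on the description above, not calibre's actual has_name_and_is_not_empty, and the two threading.Event objects force the unlucky interleaving where thread 1 checks while thread 2 sits between truncation and write:

```python
import os
import tempfile
import threading

def has_name_and_is_not_empty_sketch(path: str) -> bool:
    """Hypothetical check modeled on the description, not calibre's code."""
    if not os.path.exists(path):
        return False
    if os.path.getsize(path) > 0:
        return True
    # Workaround as described: read the size a second time by seeking to the end.
    with open(path, 'rb') as f:
        sz = f.seek(0, os.SEEK_END)
    return sz > 0

def demo() -> list[bool]:
    results = []
    truncated = threading.Event()  # set by thread 2 once the file is truncated
    checked = threading.Event()    # set by thread 1 once it has checked
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, 'article_2.html')
        with open(path, 'wb') as f:
            f.write(b'<html>original</html>')

        def thread_2() -> None:
            # Rewrite article_2 in place, pausing between truncation and write.
            with open(path, 'wb') as f:
                truncated.set()
                checked.wait(timeout=5)
                f.write(b'{"json": "rewritten"}')

        # The main thread plays the role of thread 1.
        results.append(has_name_and_is_not_empty_sketch(path))  # before the rewrite
        t = threading.Thread(target=thread_2)
        t.start()
        truncated.wait(timeout=5)
        results.append(has_name_and_is_not_empty_sketch(path))  # mid-rewrite: both reads see 0
        checked.set()
        t.join()
        results.append(has_name_and_is_not_empty_sketch(path))  # after the rewrite
    return results

print(demo())
```

Here demo() returns [True, False, True]: the check succeeds before and after the rewrite, but both size reads return zero in between. The second read can only succeed if thread 2 happens to write between the two reads, and nothing in this sketch (or in a retry of this kind generally) guarantees that ordering.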