Quote:
Originally Posted by chaley
I looked at the crash reports. In both cases it crashed in libxml2 with a SEGV. That raises the question: are lxml and libxml2 thread safe? This document says libxml2 is but with caveats. I don't know what versions are being used nor can I determine if xmlInitParser() is called as directed.
I also noticed that both crashes happened when serializing the tree for output. Some notes: - This happens when container.flush_cache() is called. flush_cache() is called from within a thread pool worker. That means that one thread pool is creating a second thread pool. I can easily imagine that this isn't a good thing.
- Because multiple threads can call flush_cache(), more than one thread can call commit_item() on the same tree. Does this write the file twice?
- Because multiple threads can call flush_cache on the same dirtied set, trees can be committed at the same time. I don't know what happens if the two threads attempt to write the same file.
Two suggestions: - Change flush_cache() to do the work directly without creating threads. This avoids nesting thread pools.
- Put generating the set of dirtied trees and calling commit_item() into a locked with. This avoids possibly committing the same item more than once at the same time.
Something like this:
Code:
# This goes in __init__
super(KEPubContainer, self).__init__(epub_path, log, *args, **kwargs)
self.flush_lock = threading.Lock() # New code
self.log = log
self.log.debug(f"Creating KePub Container for ePub at {epub_path}")
# This replaces flush_cache()
def flush_cache(self) -> None:
with self.flush_lock.acquire():
for name in list(self.dirtied):
self.commit_item(name, keep_parsed=True)
# def __flush_cache_impl(self, name: str) -> None:
# """Flush the cache, writing all cached values to disk."""
# self.commit_item(name, keep_parsed=True)
EDIT: Looking further in the code in calibre, it seems to me that committing all the dirty items while threads are running is very dangerous. The calibre ebook container frequently sets an item as dirty before doing its work. Some other thread could then attempt to commit changes that are concurrently being made.
Personally, I would choose either - to remove the threading, or
- ensure that:
- no threaded item calls flush_cache()
- the thread controller (__run_async()) runs all the threads, then after the "with" exits calls flush_cache().
The goal is to guarantee that no thread can affect the processing of another thread via the dirty settings.
My first choice would be to remove the threading to see if the problems go away. If they do then look at how threading can be put back safely. If they don't then back to the drawing board.
[ EDIT 2]@davidfor: see the PM I sent you.
|
Firstly, I'm not ignoring anyone, between work and being away, I haven't had the time or energy to look at this. Or the question of how to use templates for collections.
Having made my excuses...
I do agree with your analysis. The way the flush_cache was implemented and the commits were called did look to be prone to timing issues with the threading. I have attached a version of the driver with the your updated "container.py" in it. It is working with no issues here under Windows. Hopefully it solve the problems with Mac.