View Single Post
Old 04-20-2022, 05:28 AM   #2724
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.
 
Posts: 12,509
Karma: 8065348
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by fogice View Post
I'm testing the new beta, and I've crashed it twice. The first two transfers were fine (1 book, a handful of books), then crashed at (100 books, 20 books). I have attached the logs from Calibre, as well as the crash reports generated by MacOS. I'm back to only transferring 8 books at a time, which is working so far.

If there's anything more you would like me to test, let me know.
I looked at the crash reports. In both cases it crashed in libxml2 with a SEGV. That raises the question: are lxml and libxml2 thread safe? This document says libxml2 is but with caveats. I don't know what versions are being used nor can I determine if xmlInitParser() is called as directed.

I also noticed that both crashes happened when serializing the tree for output. Some notes:
  • This happens when container.flush_cache() is called. flush_cache() is called from within a thread pool worker. That means that one thread pool is creating a second thread pool. I can easily imagine that this isn't a good thing.
  • Because multiple threads can call flush_cache(), more than one thread can call commit_item() on the same tree. Does this write the file twice?
  • Because multiple threads can call flush_cache on the same dirtied set, trees can be committed at the same time. I don't know what happens if the two threads attempt to write the same file.
Two suggestions:
  • Change flush_cache() to do the work directly without creating threads. This avoids nesting thread pools.
  • Put generating the set of dirtied trees and calling commit_item() into a locked with. This avoids possibly committing the same item more than once at the same time.
Something like this:
Code:
# This goes in __init__
        super(KEPubContainer, self).__init__(epub_path, log, *args, **kwargs)
        self.flush_lock = threading.Lock() # New code
        self.log = log
        self.log.debug(f"Creating KePub Container for ePub at {epub_path}")

# This replaces flush_cache()
    def flush_cache(self) -> None:
        with self.flush_lock.acquire():
            for name in list(self.dirtied):
                self.commit_item(name, keep_parsed=True)

#    def __flush_cache_impl(self, name: str) -> None:
#       """Flush the cache, writing all cached values to disk."""
#       self.commit_item(name, keep_parsed=True)
EDIT: Looking further in the code in calibre, it seems to me that committing all the dirty items while threads are running is very dangerous. The calibre ebook container frequently sets an item as dirty before doing its work. Some other thread could then attempt to commit changes that are concurrently being made.

Personally, I would choose either
  • to remove the threading, or
  • ensure that:
    • no threaded item calls flush_cache()
    • the thread controller (__run_async()) runs all the threads, then after the "with" exits calls flush_cache().
The goal is to guarantee that no thread can affect the processing of another thread via the dirty settings.

My first choice would be to remove the threading to see if the problems go away. If they do then look at how threading can be put back safely. If they don't then back to the drawing board.

[EDIT 2]@davidfor: see the PM I sent you.

Last edited by chaley; 04-21-2022 at 10:31 AM. Reason: 1) Made the for loop use a copy of self.dirtied to avoid set changed size errors 2) added the EDIT section
chaley is offline   Reply With Quote