06-22-2011, 02:39 PM | #16 | ||
Grand Sorcerer
Posts: 11,734
Karma: 6690881
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Quote:
|
||
06-22-2011, 02:43 PM | #17 | |
Grand Sorcerer
Posts: 11,734
Karma: 6690881
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Both Windows and *nix treat pipes as normal file descriptors with subprocess inheritance. I have used pipe inheritance in native code on both systems. I would be astonished if Python doesn't support it. That said, I have been astonished before. |
|
06-22-2011, 03:00 PM | #18 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I've added two new functions to the db API:
format_metadata() returns size and last modified (as a datetime object)
format_hash()
These should be enough for kiwidude, and we can have them do something sensible with a cloud-based backend. |
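As a rough illustration of what these two calls provide, here is a minimal sketch that computes the same kind of information for a plain file on disk. The names `file_metadata` and `file_hash` are hypothetical, the SHA-256 choice is an assumption (the post does not name an algorithm), and none of this is calibre's actual implementation.

```python
import hashlib
import os
from datetime import datetime


def file_metadata(path):
    """Return size in bytes and last-modified time as a datetime object."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime": datetime.fromtimestamp(st.st_mtime)}


def file_hash(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so a large book is never fully in memory."""
    digest = hashlib.sha256()  # assumption: the thread does not name an algorithm
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A duplicate finder could compare sizes first and only hash the files whose sizes match.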
06-22-2011, 03:10 PM | #19 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Coming to the question of pipes: It seems to me that what we really need is some kind of proxy object that allows the calling of methods from the db API in one process from another process. Jobs running in child processes can then pretty much do anything to the db by calling the appropriate API.
The multiprocessing module has the necessary IPC plumbing to implement this, I believe. |
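A minimal sketch of the proxy idea, assuming a `multiprocessing.Pipe()` connection between the two ends. `FakeDB`, `serve` and `DBProxy` are hypothetical names made up for illustration. To keep the sketch self-contained, the serving end runs in a thread here; in the scenario described above, the proxy end would live in the child process.

```python
import threading
from multiprocessing import Pipe


class FakeDB:
    """Hypothetical stand-in for the db API; not calibre's real class."""

    def __init__(self):
        self._sizes = {(1, "EPUB"): 1234}

    def format_size(self, book_id, fmt):
        return self._sizes[(book_id, fmt)]


def serve(db, conn):
    """Answer (method_name, args) requests until the other end sends None."""
    while True:
        request = conn.recv()
        if request is None:
            break
        name, args = request
        try:
            conn.send(("ok", getattr(db, name)(*args)))
        except Exception as err:
            conn.send(("error", repr(err)))


class DBProxy:
    """Turn attribute access into method calls sent over a Connection."""

    def __init__(self, conn):
        self._conn = conn

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. remote methods.
        def call(*args):
            self._conn.send((name, args))
            status, value = self._conn.recv()
            if status == "error":
                raise RuntimeError(value)
            return value
        return call

    def close(self):
        self._conn.send(None)  # tells serve() to stop


def demo():
    db_end, job_end = Pipe()
    server = threading.Thread(target=serve, args=(FakeDB(), db_end))
    server.start()
    proxy = DBProxy(job_end)
    size = proxy.format_size(1, "EPUB")
    proxy.close()
    server.join()
    return size
```

Arguments and return values travel pickled over the connection, so this only suits calls whose data is picklable and reasonably small.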
06-22-2011, 03:37 PM | #20 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Hmm, multiprocessing has a Connection object that abstracts sockets on Unix and named pipes on Windows. Unfortunately, it only has poll(), not select(), which would have pretty bad performance implications, I imagine.
|
06-22-2011, 03:58 PM | #21 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
All in all, there doesn't seem to be any way to do this. I thought of using TCP/IP sockets, but then there will be problems on Windows with antivirus programs blocking the creation of the sockets.
I don't really see any way out for this except to do what calibre does in this kind of situation, which is first run in process, copy out the data that is being worked on, and then launch a worker process that works on the data and returns the results via the filesystem. This does have performance implications, but for the vast majority of ebook files, the performance hit should be very small.
For jobs that don't do a lot of work on the data, like duplicate finder, run them in process and use either a SpooledTemporaryFile or a special API like format_hash().
That leaves launching external editors on data. At the moment, the only thing I can see is monitoring the temp file for changes. We can add API to have the temp file created outside the normal calibre temp dir, so that if the user leaves the external editor running when quitting calibre, it will not affect temp file cleanup for the other temp files. |
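The copy out, work in a child, read results back pattern described above can be sketched as follows. This is only an illustration under stated assumptions: `run_job` and the inline `WORKER` script are hypothetical, and the "work" is a trivial byte count standing in for a real job.

```python
import json
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical worker script: reads the exported copy and writes its
# result to a JSON file, so results come back via the filesystem.
WORKER = (
    "import json, sys\n"
    "src, dst = sys.argv[1:3]\n"
    "with open(src, 'rb') as f:\n"
    "    data = f.read()\n"
    "with open(dst, 'w') as f:\n"
    "    json.dump({'size': len(data)}, f)\n"
)


def run_job(library_path):
    """Copy the file out in process, process the copy in a child process."""
    workdir = tempfile.mkdtemp()
    try:
        exported = os.path.join(workdir, "export.bin")
        result_path = os.path.join(workdir, "result.json")
        shutil.copyfile(library_path, exported)  # the one extra copy
        subprocess.check_call(
            [sys.executable, "-c", WORKER, exported, result_path]
        )
        with open(result_path) as f:
            return json.load(f)
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

The child never touches the library itself, only the exported copy, which is what keeps the db access in a single process.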
06-22-2011, 04:01 PM | #22 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Kovid - wrt the external editors: how is this thread going to know when you have stopped editing? It is fairly common to save often (particularly when using Sigil, which is so horribly buggy).
|
06-22-2011, 04:12 PM | #23 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It won't. It will wait a little while after each change and, if there are no more changes, it will update. On calibre shutdown, any waiting files will be updated.
Another possibility is to just disable this functionality for network backend dbs and continue to allow direct access for local dbs. The idea being that if you want to work on a set of files, you move them from the network to a local library, work on them, and once you're done, move them back. |
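The "wait a little while after each change" behaviour amounts to debouncing on the file's modification time. A minimal sketch, assuming the monitoring thread calls `check()` periodically; the class name `QuietFileWatcher` and the two second default are hypothetical.

```python
import os
import time


class QuietFileWatcher:
    """Report a file as ready once a change is followed by a quiet period."""

    def __init__(self, path, quiet_period=2.0):
        self.path = path
        self.quiet_period = quiet_period
        self.last_mtime = os.stat(path).st_mtime
        self.changed_at = None  # time at which the latest change was noticed

    def check(self, now=None):
        """Return True when the file changed and has since stayed untouched."""
        now = time.time() if now is None else now
        mtime = os.stat(self.path).st_mtime
        if mtime != self.last_mtime:
            # Another save happened: restart the quiet period.
            self.last_mtime = mtime
            self.changed_at = now
            return False
        if self.changed_at is not None and now - self.changed_at >= self.quiet_period:
            self.changed_at = None
            return True
        return False
```

On shutdown, any watcher whose `changed_at` is still set would correspond to a "waiting file" that gets updated immediately.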
06-22-2011, 04:42 PM | #24 | |||
Grand Sorcerer
Posts: 11,734
Karma: 6690881
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Quote:
Quote:
It might be possible to avoid polling loops by using pessimistic locking and letting the user indicate that s/he is finished. This is similar to what kiwidude didn't want, but might be acceptable because it is non-blocking. In effect, we provide an 'export', locking the book object for update. Whatever application is used does its thing. When finished, run something that imports the results and breaks the lock.
We could choose to go optimistic and not lock anything, with the problem that if the object is updated twice, one of the updates loses, but this also requires the user to indicate that s/he is finished. Optimistic locking works most of the time, but tends to fail spectacularly when it fails.
My thought is that there might be a local 'cache' of the library outside of the temp directory. Exports go there and imports come from there. Both export and import are explicit commands.
Detecting concurrent update isn't hard. The export would have a timestamp/signature. If at import time the object has a different signature, a choice would need to be made: which wins. This is fairly classic multi-user DB stuff. Consider airline seat reservation. I see a map, pick a seat, and say 'go', only to be told that someone else has already reserved that seat. Same thing with purchases of items with limited stock. |
|||
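The signature check at import time can be sketched like this. It shows the optimistic variant (no lock, conflict detected on import); `Library`, `export`, `import_` and `ConflictError` are hypothetical names, and using a SHA-256 of the content as the signature is an assumption.

```python
import hashlib


class ConflictError(Exception):
    """Raised when the book changed in the library after it was exported."""


def signature(data):
    return hashlib.sha256(data).hexdigest()


class Library:
    """Hypothetical in-memory stand-in for the library."""

    def __init__(self):
        self._books = {}

    def put(self, book_id, data):
        self._books[book_id] = data

    def get(self, book_id):
        return self._books[book_id]

    def export(self, book_id):
        """Hand out the data together with the signature it had at export."""
        data = self._books[book_id]
        return data, signature(data)

    def import_(self, book_id, new_data, exported_signature):
        """Accept the edit only if nobody else updated the book meanwhile."""
        if signature(self._books[book_id]) != exported_signature:
            raise ConflictError("book %r changed since export" % (book_id,))
        self._books[book_id] = new_data
```

Raising on conflict is only one possible policy; the caller could equally ask the user which version wins.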
06-22-2011, 05:04 PM | #25 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The problem with threads and sockets is performance: there's no select, only poll for each socket. This will be rather nasty in the thread that manages the connections from child processes, unless we launch a new thread per child. And given that only a single Python thread can run at a time...
Given all these complications, to me it's just more reasonable to export in process -> work out of process -> import in process rather than try to do everything out of process. The cost is one extra copy per file, which given the file sizes of typical books seems fairly reasonable to me. |
06-22-2011, 05:13 PM | #26 | ||
Grand Sorcerer
Posts: 11,734
Karma: 6690881
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Quote:
One thing I don't know is what percentage of operations are what. Readers are P/C, but modifiers are not. |
||
06-22-2011, 05:27 PM | #27 | ||
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
Code:
had_operation = False
for socket in connections:
    if socket.poll():
        # handle the read in a relatively non blocking manner
        had_operation = True
if not had_operation:
    # ensure release of the GIL
    time.sleep(0.01)
Quote:
|
||
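For readers on newer Python versions: since Python 3.3 the standard library has `multiprocessing.connection.wait()`, which blocks on several Connection objects at once and so avoids the sleep-and-poll loop shown above. It did not exist when this thread was written. A minimal sketch, with `ready_messages` and `demo` as hypothetical helper names:

```python
from multiprocessing import Pipe
from multiprocessing.connection import wait


def ready_messages(connections, timeout=0.1):
    """Block until some connection is readable, then read from each ready one."""
    messages = []
    for conn in wait(connections, timeout):
        messages.append(conn.recv())
    return messages


def demo():
    readers, writers = [], []
    for _ in range(3):
        reader, writer = Pipe(duplex=False)  # one-way: reader <- writer
        readers.append(reader)
        writers.append(writer)
    writers[1].send("job done")  # only the second connection has data
    return ready_messages(readers, timeout=1.0)
```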
06-22-2011, 05:45 PM | #28 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Look at it like this. A job needs to do some work on data from the library. There are two possibilities:
1) If that work is fast/simple it can be run in process like with find duplicates.
2) If it is not, then the time taken for an extra disk-to-disk copy is going to be pretty small compared to the time taken to do the actual work. |
06-22-2011, 05:53 PM | #29 | |
Grand Sorcerer
Posts: 11,734
Karma: 6690881
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Code:
try:
    with open(x) as f:
        d = f.read(someAmount)
        outpipe.write(d)
finally:
    outpipe.close()
|
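A runnable variant of the pipe idea, as a sketch under assumptions: the child is a throwaway Python one-liner that just counts the bytes it receives on stdin, and `stream_to_child` is a hypothetical name. The file is read in a loop, one chunk at a time, and written to the child's inherited stdin pipe.

```python
import subprocess
import sys

# Hypothetical child: reads everything from stdin and prints the byte count.
CHILD = "import sys; sys.stdout.write(str(len(sys.stdin.buffer.read())))"


def stream_to_child(path, chunk_size=64 * 1024):
    """Feed a file to a child process through a pipe, chunk by chunk."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    ) as proc:
        try:
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    proc.stdin.write(chunk)
        finally:
            proc.stdin.close()  # signals end of input to the child
        reported = proc.stdout.read()
    return int(reported)
```

This works without deadlock only because the child consumes all of its input before producing a small amount of output; a child that writes a lot while still reading would need `communicate()` or a separate reader thread.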
06-22-2011, 05:58 PM | #30 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Hmm, maybe. Let's table this for now. kiwidude can continue to use format_abspath for the moment. Once the new db backend is nearer completion, we can revisit and write some code so that the performance can actually be measured.
|
|