MobileRead Forums - View Single Post

JimmXinu · 05-17-2025, 03:50 PM

Experimental Version Attached

2025-05-17
- Single proc bg processing, optionally split by site & accumulate results -- experimental

2025-05-18
- Improve job 'reconsolidate' for failed jobs and setting changing.

Background Job Processing in Past
When first implementing this downloader as a plugin in 2011, I copied the background processing code from Extract ISBN. That code, after launching as a Calibre background job, spawns another level of sub-processes, one per book, running as many processes in parallel as Calibre's config allows--generally as many as there are CPUs/cores.

Back in 2021, we finally realized that was allowing stories from the same site to run simultaneously, causing some sites to block users for excessive traffic. I changed FFF to instead run one background process per site, i.e., only running one download from each site at a time, but still allowing parallel downloads from different sites.
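As a rough illustration of the per-site idea (a sketch only, not FFF's actual code), story URLs can be grouped by host name and each group worked through one story at a time. The names `group_by_site`, `run_site_queue` and the `fetch` callable are hypothetical, and treating `www.` hosts as the same site is an assumption made for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_site(urls):
    """Group story URLs by host name, keeping input order within each site."""
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).netloc.lower()
        # Assumption for this sketch: "www.example.com" and "example.com"
        # count as the same site.
        if host.startswith("www."):
            host = host[4:]
        groups[host].append(url)
    return dict(groups)


def run_site_queue(queue, fetch):
    """Download one site's stories strictly one after another.

    `fetch` is a hypothetical per-story download function. In the 2021
    design described above, each site's queue would run in its own
    background process, so different sites still proceed in parallel.
    """
    return [fetch(url) for url in queue]
```

Each site's list would then go to its own worker, so no site ever sees two requests from the same batch at once.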

Something that had been occasionally reported was that download jobs would not end when their parent Calibre job was stopped, or even when Calibre was closed. (At least on Windows.) I confess I didn't pay those reports much attention until use of open_pages_in_browser made it more obvious something was going on.

Background Job Processing in This Version
The attached version has three additional optional settings. For this first experimental version, the settings are on the 'Other' tab of FFF Config with some explanatory verbiage.

I'm not entirely decided on which of these should be hard coded, optional settings, grouped as one setting, or what the default values should be. Opinions welcome.

The options are:

Use New Single Process Background Jobs
On by default. Don't spawn additional sub-processes from the Calibre background job; instead, all downloads in the job are processed sequentially, more like how the CLI does it. With only this checked, the only difference from prior versions is some slowdown when downloading stories from more than one site together in one job. Uncheck this to use the old multi-process code.
Split downloads into separate background jobs by site
Off by default. Split different sites into separate processes that can run in parallel, each as its own Calibre Job. You can see the Jobs for the different sites in Calibre's Job list. Because Calibre limits the number of Jobs running simultaneously, only a few will run at a time. This was true before as well, but it wasn't visible to the user. As each Job finishes, you will be prompted to update your library for each site separately, unless you also set:
Reconsolidate split downloads before updating library
Off by default. Only useful when Split downloads is also checked. When checked, FFF keeps track of which BG Jobs launched together and will wait until they all finish before prompting you to update your library.
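The 'Reconsolidate' behaviour can be pictured with a small tracker like the one below. This is a minimal sketch under stated assumptions, not how FFF implements it: `BatchTracker`, `job_finished` and the `on_all_done` callback are hypothetical names, and no Calibre API is used.

```python
class BatchTracker:
    """Collect results from jobs launched together; report once when all are done."""

    def __init__(self, job_ids, on_all_done):
        self.pending = set(job_ids)      # jobs from this batch still running
        self.results = {}                # job id -> result, filled as jobs end
        self.on_all_done = on_all_done   # hypothetical "prompt to update" hook

    def job_finished(self, job_id, result):
        """Record one job's outcome; a failed job can report None or an error."""
        if job_id not in self.pending:
            return  # not part of this batch, or already reported: ignore
        self.pending.discard(job_id)
        self.results[job_id] = result
        if not self.pending:
            # Every job in the batch has ended, so prompt the user once.
            self.on_all_done(self.results)
```

Counting a failed job as finished is a design choice in this sketch: it means one failed site cannot hold up the prompt for the rest of the batch.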

Disadvantages of new version

Downloads from different sites are only done in parallel if you also check 'Split downloads...'.
If split, you will get a separate 'Proceed to update library' question for each site, unless you also check 'Reconsolidate split downloads...'.

Advantages of new version

Download job actually stops when job is stopped or Calibre quits. No more open_pages_in_browser calls after you've quit Calibre.
Job Details (aka Job log) updates in real time, so you can watch downloads in progress.
Job start is quicker by several seconds.
'Split' without 'Reconsolidate' allows library updates sooner for sites that finish sooner.

Related Questions

Why not skip the 'Proceed with updating your library' question?
The Count Pages and Extract ISBN plugins, for example, allow the user to optionally skip the 'Proceed' question. I did revisit that while working on this and even tried it for a while.

The problem is that FFF's post-processing can be significant, and it uses a progress bar dialog. When multiple jobs finished, they would start updating simultaneously and stacking nearly identical progbars--and it could happen with manually started parallel updates, not just 'Split' jobs. Visually, it looked like one progbar jumping back and forth unexpectedly.

Rather than discovering what in FFF is and isn't safe for parallel access, I decided to keep the Proceed question, which also follows recommended Calibre practice.
Why not use threads in the background process to allow parallel downloads?
I did try this as well. Basic testing seemed to work, but the log output for the different threads came out all interspersed. As someone who needs to read those logs a lot for troubleshooting, that was a deal breaker.

And again, FFF has been running basically single threaded this whole time. I don't think it would be a matter of if it caused problems, but when.
How does this affect Anthologies?
Anthology stories will download/update sequentially in one Job, ignoring the 'Split' option.

For anthologies of stories all from one site (I assume the vast majority of anthologies), it will actually be slightly faster, since the sub-proc launch time is saved.

For rarer anthologies of stories split between different sites, it will be slower, since there won't be any parallel downloads.

UPDATE:
2025-05-18
- Improve job 'reconsolidate' for failed jobs and setting changing.

Post #10766 · Last edited by JimmXinu; 05-19-2025 at 04:42 PM. Reason: Remove experimental version, test version posted