I think I figured out why it sometimes takes a long time to update columns. The time it takes depends directly on the following.
1. The columns being updated shouldn't already be filled with data. The more data the import has to copy over, the longer it takes. If you are updating a tags column, it is better either to delete all the current tags first, or to fill a blank tags column and then copy it over to the desired column. The same goes for date columns: it is better to update only the cells that get actual dates rather than doing a blanket update of the whole column when it already contains data. What makes the biggest difference is pushing a lot of new information into a column that already holds a lot of information, like a tags column.
2. Lists that require a lot of books to be matched manually should be imported as follows:
Delete any unmatched books from the list, update the metadata, and import the matched books. Depending on how many unmatched books there were, you can do a couple of things -
1 - Rerun the list using different matching parameters, hoping to match what you didn't match before, i.e. using an ID instead of title/author. Only update the matched books that haven't already been matched. I keep an extra yes/no column, added to every imported list with "Yes" right down the line, so that when I import new books I can see whether they were already imported previously. Delete any previously imported book from the list (that yes/no column will let you know), delete any unmatched book, and update the new matches.
2 - Import the list, delete the unmatched books, and update the newly matched books. Then split the .csv into smaller files with a CSV file chunker, sized by how many unmatched books there were; the goal is to keep each chunk down to only a few unmatched books (a minimal chunking sketch follows this list). In each imported chunk, delete all the matched books (they were already matched in the first run), match up each unmatched book, update the metadata, and run it. With luck there aren't many books to update per chunk, so it won't take as long.
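If you don't already have a chunker handy, a few lines of Python will do it. This is only a minimal sketch: the filename and chunk size are placeholders I made up, and it simply repeats the header row in every chunk so each file can be imported on its own.

```python
import csv

def write_chunk(src, n, header, rows):
    # Name each chunk after the source file: booklist_chunk001.csv, etc.
    out = src.replace(".csv", f"_chunk{n:03d}.csv")
    with open(out, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # repeat the header so each chunk imports alone
        writer.writerows(rows)

def chunk_csv(src, rows_per_chunk=200):
    """Split src into numbered chunk files of at most rows_per_chunk rows."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        chunk, n = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                n += 1
                write_chunk(src, n, header, chunk)
                chunk = []
        if chunk:  # leftover rows that didn't fill a whole chunk
            n += 1
            write_chunk(src, n, header, chunk)

# Placeholder filename and chunk size - adjust to your list.
chunk_csv("booklist.csv", rows_per_chunk=200)
```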
For a long list with a lot of different ways to match books, I tend to do both one and two. Step one means updating the metadata for books matched in different ways across several imports of the same list. Once I have imported every possible matched book, I chunk up the list and reimport each chunk, deleting the already matched books from each chunk and updating the unmatched ones (the filtering sketch below automates that deletion step). Done right, you will only be updating a couple of books per chunk, and that is a lot faster than trying to import the whole list at once.
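Here is a rough sketch of the "delete the previously imported books" step done in Python instead of by hand. It assumes you have a plain-text file of identifiers for books already in the library (one per line, e.g. exported from calibre) and that your list has a matching identifier column; the file names and the ISBN column name are made up for illustration, so swap in whatever your list actually matches on.

```python
import csv

LIST_CSV = "booklist.csv"               # the list you are importing (placeholder)
ALREADY_IMPORTED = "imported_ids.txt"   # one identifier per line (placeholder)
ID_COLUMN = "ISBN"                      # hypothetical identifier column name

# Load the identifiers of books already imported on previous runs.
with open(ALREADY_IMPORTED, encoding="utf-8") as f:
    seen = {line.strip() for line in f if line.strip()}

with open(LIST_CSV, newline="", encoding="utf-8") as src, \
     open("booklist_new_only.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = dropped = 0
    for row in reader:
        if (row.get(ID_COLUMN) or "").strip() in seen:
            dropped += 1          # already imported - leave it out of the new file
        else:
            writer.writerow(row)  # still needs matching/updating
            kept += 1

print(f"kept {kept} new rows, dropped {dropped} already-imported rows")
```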
The ultimate goal, to get importing faster than three or four seconds per book, is to make sure the books match without needing manual work, and to make sure the columns being updated with new metadata are blank or hold only small bits of data - and don't just mass update a column when you can see that only a couple of books need updating. The less you copy over existing data, and the less you blanket-update a whole column when only a couple of books have anything new (a date column, for example), the faster it goes. Copying over a yes/no or single-value column is nowhere near as bad as updating a tags column with a ton of new data. A sketch of this list-trimming idea follows below.
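The same filtering idea works for the "only update the books that need it" point: trim the list down to just the rows that actually carry a value in the column you are updating, so blank cells never trigger a blanket update. Again, the file and column names here are placeholders, assuming a date column is what you are updating.

```python
import csv

SRC = "booklist.csv"         # placeholder input file
UPDATE_COLUMN = "Date Read"  # hypothetical name of the column being updated

with open(SRC, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    fields = reader.fieldnames
    # Keep only rows that actually have a value to write into the column.
    rows = [r for r in reader if (r.get(UPDATE_COLUMN) or "").strip()]

with open("booklist_dates_only.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)

print(f"{len(rows)} rows actually have a {UPDATE_COLUMN} value")
```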
This seems like a lot of work for the casual list importer, but if you are like me and import calibre catalogs from other libraries, the Goodreads CSV export, or other lists, these steps will save a lot of time in the long run. Again, this only matters if you find yourself sitting around forever while a list imports - and that happens a lot.