02-14-2024, 05:26 PM   #7
chaley
This calibre Python script might help. It writes tab-separated CSV output with one line per format for each book id provided on the command line. Each line contains 4 columns:
  • book id
  • extension
  • size in bytes
  • full path to file

To use it, save this script somewhere convenient:
Spoiler:
Code:
import sys
from calibre.db.legacy import LibraryDatabase

'''
argument 1: path to the library

argument 2: list of book ids, e.g., 208,209. Quote the argument if there
            are spaces between the ids. calibredb search can be used to
            produce a list of ids. For example,
            calibredb search "size:>100000"
            finds all books with size > 100,000, printing the list of ids

output: a tab-separated csv list written to standard output
'''

# open the library database
db = LibraryDatabase(sys.argv[1]).new_api

# convert the book ids to a list of integers
book_ids = [int(bid.strip()) for bid in sys.argv[2].split(',')]

# get the library path from calibre. This removes symlinks and the like
library_path = db.backend.library_path

# print the header for the CSV output
print('\t'.join(('book_id', 'extension', 'size', 'path')))

# loop over the books generating the output for each format
for book_id in book_ids:
    # get the list of formats for the book
    formats = db.formats(book_id)

    # loop over the formats, generating the csv line
    for ext in formats:
        # get the metadata for the format: the extension, size, path, and modtime
        fmt_data = db.format_metadata(book_id, ext)
        # write the csv line for the format
        print('\t'.join((str(book_id), ext, str(fmt_data['size']), fmt_data['path'])))


Execute the script with
Code:
calibre-debug -e script_file_path library_path id1,id2,id3 > output.csv
Example using one of my test libraries:
  1. I saved the script as formats_sizes.py
  2. I used calibredb to get a list of ids, in this case books with 163 in the title
    Code:
    calibredb search "title:163"
    that produced the output
    Code:
    1337,1343,1361,1434
    You might want to use something like
    Code:
    calibredb search "size:>100000"
  3. I ran the script using
    Code:
    calibre-debug -e formats_sizes.py C:\CBH_Data\calibre.git\Library.test_small 1337,1343,1361,1434 > aaaaa.csv
The output is
Spoiler:
Code:
book_id	extension	size	path
1337	EPUB	790778	C:\CBH_Data\calibre.git\Library.test_small\Eric Flintt\16332 (1337)\16332 - Eric Flintt.epub
1337	ORIGINAL_EPUB	786992	C:\CBH_Data\calibre.git\Library.test_small\Eric Flintt\16332 (1337)\16332 - Eric Flintt.original_epub
1343	EPUB	736765	C:\CBH_Data\calibre.git\Library.test_small\Eric Flint\1632 (1343)\1632 - Eric Flint.epub
1343	MOBI	1029315	C:\CBH_Data\calibre.git\Library.test_small\Eric Flint\1632 (1343)\1632 - Eric Flint.mobi
1343	PRC	1093048	C:\CBH_Data\calibre.git\Library.test_small\Eric Flint\1632 (1343)\1632 - Eric Flint.prc
1361	EPUB	888796	C:\CBH_Data\calibre.git\Library.test_small\Weber, David\1633 (1361)\1633 - Weber, David.epub
1434	EPUB	990171	C:\CBH_Data\calibre.git\Library.test_small\A. B. C. Personn\1632 - Flint Eric (1434)\1632 - Flint Eric - A. B. C. Personn.epub


LibreOffice Calc shows this file as
[Attached image: Clipboard01.jpg]
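
If you would rather process the file in Python than in a spreadsheet, the standard csv module can read it with a tab delimiter. The following is only a minimal sketch: the function name total_size_per_book is made up for illustration, and it assumes the column names from the header line the script prints. It adds up the sizes of all formats for each book id.

```python
import csv
from collections import defaultdict


def total_size_per_book(lines):
    """Sum the 'size' column per book id from the tab-separated output.

    'lines' is any iterable of text lines, for example an open file.
    The function name is hypothetical; the column names are the ones
    printed in the script's header line.
    """
    totals = defaultdict(int)
    reader = csv.DictReader(lines, delimiter='\t')
    for row in reader:
        totals[int(row['book_id'])] += int(row['size'])
    return dict(totals)


# Usage, assuming the output was redirected to aaaaa.csv as in step 3.
# The file encoding depends on how the shell wrote the redirect, so
# utf-8 here is an assumption:
#
# with open('aaaaa.csv', newline='', encoding='utf-8') as f:
#     for book_id, size in sorted(total_size_per_book(f).items()):
#         print(book_id, size)
```

Passing the open file straight to the function keeps the parsing separate from where the data comes from, so the same code works on a list of lines.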

Edit: it would be easy for the script to use a search expression instead of a list of ids. Let me know if you want an example script.
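
A rough sketch of what that search-based variant might look like is below. It assumes the new API's search() method, which takes a search expression and returns the set of matching book ids. The helper name rows_for_search and the argument-count guard at the bottom are illustrative choices, not part of the original script, and the sketch has not been run against a real library here.

```python
import sys


def rows_for_search(db, query):
    """Yield one tab-separated line per format for each book matching query.

    'db' is expected to be calibre's new API object (LibraryDatabase(...).new_api).
    The function name is hypothetical.
    """
    # search() returns a set of book ids; sort it for stable output
    for book_id in sorted(db.search(query)):
        for ext in db.formats(book_id):
            # same metadata lookup as the id-list script
            fmt_data = db.format_metadata(book_id, ext)
            yield '\t'.join((str(book_id), ext,
                             str(fmt_data['size']), fmt_data['path']))


def main(argv):
    # argument 1: path to the library
    # argument 2: a calibre search expression, e.g. "size:>100000"
    # Imported here so the helper above can be tried without calibre.
    from calibre.db.legacy import LibraryDatabase
    db = LibraryDatabase(argv[1]).new_api
    print('\t'.join(('book_id', 'extension', 'size', 'path')))
    for line in rows_for_search(db, argv[2]):
        print(line)


# Only run when both arguments were supplied on the command line
if __name__ == '__main__' and len(sys.argv) == 3:
    main(sys.argv)
```

It would be run the same way as the first script, with the quoted search expression in place of the id list, for example: calibre-debug -e script_file_path library_path "title:163" > output.csv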

Last edited by chaley; 02-14-2024 at 05:42 PM. Reason: Discussed the possibility of using a search in the script