This is what I came up with (not tested on Windows or Linux):
from logging import info, basicConfig, INFO
from pathlib import Path
from re import fullmatch
# exit is imported explicitly: the bare builtin is injected by the site
# module and may be missing in embedded interpreters.
from sys import argv, exit

CONTENT_EXTENSIONS = {'.kfx', '.azw', '.azw1', '.azw3', '.pdf', '.txt'}
TMP_REGEX = r'\.tmp\d*_(ASC|PHL)'

basicConfig(level=INFO)


def remove_dir(folder):
    # rmdir() only succeeds on empty folders; anything else is just logged
    try:
        Path(folder).rmdir()
        info(f'Removed {folder}')
    except OSError as ex:
        info(f'** {ex} **')


def rmtree(root):
    # recursively delete a folder and everything in it
    for p in root.iterdir():
        if p.is_dir():
            rmtree(p)
        else:
            p.unlink()
    root.rmdir()


scan_folder = Path('/Volumes/Kindle/documents' if len(argv) == 1 else argv[1])
if not scan_folder.is_dir():
    info(f'Path {scan_folder} does not exist, quitting.')
    exit(1)

content_file_stems = set()
tmp_files = []
sdr_folders = []
for path_ in scan_folder.glob('**/*'):
    if '.sdr' in str(path_.parent) or 'updates' in path_.parent.parts:
        # take no action on content in .sdr folders or /updates folder
        continue
    if path_.is_dir():
        if path_.suffix == '.sdr':
            sdr_folders.append(path_)
        else:
            remove_dir(path_)
    else:
        if path_.suffix in CONTENT_EXTENSIONS:
            content_file_stems.add((path_.parent, path_.stem))
        elif fullmatch(TMP_REGEX, path_.suffix):
            tmp_files.append(path_)

# a temp file or .sdr folder is orphaned if no content file in the same
# folder shares its stem
for tmp_file in tmp_files:
    if (tmp_file.parent, tmp_file.stem) not in content_file_stems:
        tmp_file.unlink()
        info(f'Removed orphaned temp file {tmp_file}')

for sdr_folder in sdr_folders:
    if (sdr_folder.parent, sdr_folder.stem) not in content_file_stems:
        rmtree(sdr_folder)
        info(f'Removed orphaned .sdr folder {sdr_folder}')
All standard library, no external dependencies. It needs Python 3.6 or later (it uses f-strings). On Mac you can just type 'python clean.py' (or 'python3 clean.py') once the Kindle is plugged in. On Windows, you can use calibre-debug as the interpreter, supplying the path as an argument (e.g. E:/documents).
It runs almost instantaneously, so I don't see any particular penalty to running it every time you plug the Kindle in.
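Since the script deletes things, a dry run may be worth doing before the first real run. Here is a minimal sketch that applies the same matching rule as the script above but only reports what it finds. `find_orphans` is a hypothetical helper, not part of the script, and it leaves out the empty-folder removal step.

```python
from pathlib import Path
from re import fullmatch

CONTENT_EXTENSIONS = {'.kfx', '.azw', '.azw1', '.azw3', '.pdf', '.txt'}
TMP_REGEX = r'\.tmp\d*_(ASC|PHL)'


def find_orphans(scan_folder):
    """Return (tmp_files, sdr_folders) that have no matching content file.

    Nothing is deleted; this only walks the tree and collects paths.
    """
    scan_folder = Path(scan_folder)
    stems = set()
    tmp_files = []
    sdr_folders = []
    for p in scan_folder.glob('**/*'):
        # same skip rule as the script: ignore anything inside .sdr or updates
        if '.sdr' in str(p.parent) or 'updates' in p.parent.parts:
            continue
        if p.is_dir():
            if p.suffix == '.sdr':
                sdr_folders.append(p)
        elif p.suffix in CONTENT_EXTENSIONS:
            stems.add((p.parent, p.stem))
        elif fullmatch(TMP_REGEX, p.suffix):
            tmp_files.append(p)

    def orphaned(p):
        # orphaned = no content file with the same stem in the same folder
        return (p.parent, p.stem) not in stems

    return (sorted(t for t in tmp_files if orphaned(t)),
            sorted(s for s in sdr_folders if orphaned(s)))
```

Calling `find_orphans('/Volumes/Kindle/documents')` and printing both lists should show what a real run would remove.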
I haven't seen the tmp files except on one of my Kindles, so it's possible there are other patterns to look for.
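To make it easier to check a new pattern against the regex, here is a small sketch of how the match works. The regex is applied with `fullmatch` to the file's last suffix only, so the whole suffix has to fit. The file names below are made up for illustration, and `is_tmp_file` is a hypothetical helper.

```python
from pathlib import Path
from re import fullmatch

TMP_REGEX = r'\.tmp\d*_(ASC|PHL)'


def is_tmp_file(name):
    """True if the file's last suffix matches the temp-file pattern in full."""
    return fullmatch(TMP_REGEX, Path(name).suffix) is not None


# Hypothetical file names, for illustration only.
samples = [
    'book.tmp_ASC',         # True: \d* also allows zero digits
    'book.tmp123_PHL',      # True
    'book.tmp123_XYZ',      # False: only ASC and PHL are listed
    'book.azw3',            # False: ordinary content file
    'book.tmp123_ASC.bak',  # False: the last suffix is .bak
]
for name in samples:
    print(f'{name}: {is_tmp_file(name)}')
```

If another ending turns up, adding it to the `(ASC|PHL)` group should be enough.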
I saw an updates/ folder on a couple of them, with an .sdr folder inside. Not sure what it is for, but I thought it best to leave it alone.
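For reference, this is roughly how the skip test in the script behaves. It is a sketch using `PurePosixPath` so it runs anywhere; the paths are made up and `is_skipped` is a hypothetical name. Note that the test looks at the parent of each path, so the updates folder itself is not skipped, only what is inside it. The script then tries `rmdir()` on it, which fails while the folder has content.

```python
from pathlib import PurePosixPath


def is_skipped(path):
    """Mirror of the script's skip rule, applied to a path string."""
    parent = PurePosixPath(path).parent
    # '.sdr' is a substring test on the whole parent path;
    # 'updates' must equal one whole path component.
    return '.sdr' in str(parent) or 'updates' in parent.parts


# Hypothetical paths, for illustration only.
docs = '/Volumes/Kindle/documents'
print(is_skipped(f'{docs}/book.sdr/notes'))         # True: inside an .sdr folder
print(is_skipped(f'{docs}/updates/book.sdr'))       # True: inside updates
print(is_skipped(f'{docs}/updates'))                # False: parent is documents
print(is_skipped(f'{docs}/my_updates/book.azw3'))   # False: component is not exactly 'updates'
print(is_skipped(f'{docs}/book.sdr'))               # False: the .sdr folder itself is examined
```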
I think only /documents needs to be a target for cleanup. I would leave it to the user to clean up screen captures in the root, for example.
And I haven't seen any 'turds' in the .documents folder so far (Scribe).
[Update] Refactored to use pathlib (preferred in Python 3).