07-14-2025, 04:27 PM #33
frustratedhacker
Here it is. I'll keep it short and just add some notes I wrote while working on it. The second and third points matter for getting it to work, since I added an external library (sorry).

  • The changed files are ui.py, classify_web_service_webscraping.py and __init__.py (the last one to add the lib folder to the path). Most changes are marked with a "### Bookmark" comment.
  • Added record filtering to the title and author search.
  • The record filtering code uses the library "thefuzz" (https://github.com/seatgeek/thefuzz), so you have to either install it or, if you can't or don't want to, add it to the lib folder. I'm attaching a second zip with the library already included in the plugin (I had to do that for myself because I installed Calibre as a Flatpak). If you don't trust my copy, you can download the library yourself with pip.
  • Changed the way the script processes the title and author to build the URL, since it was failing with some titles and authors. I also made the query a bit looser so it's less likely to come back with no matches.
  • "dc.identifier" isn't working right now (it might be a temporary problem), so I changed it to "bath.isbn".
  • Shortly after my last post, I added an option to edit the title and author name used in the query. This can be useful for removing extra text from the title (e.g. Third Edition, Book 2, Volume III) so that it gets matches. Tip: if the title contains a misspelled or incomplete word, you probably won't get a match.
  • Added DDC edition extraction. It doesn't seem to interfere with the genre mapping function; the problem there seems to be the forward slashes.
  • Added some code to handle books with multiple authors. It's hacky, but it seems to work; it probably needs more testing.
  • Known problem: the LCSH extractor collects the headings from all filtered records and removes duplicates, but sometimes two records have slightly different versions of the same heading, and both end up in the final list.
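For the note about __init__.py adding the lib folder to the path, that usually comes down to a few lines like the sketch below. It is a hypothetical helper (`add_lib_to_path` is not a name from the plugin), and it assumes `__file__` points to where the plugin's files actually live; Calibre normally loads plugins from their zip file, so check what `__file__` resolves to in your setup.

```python
import os
import sys


def add_lib_to_path(module_file, lib_name="lib"):
    """Append <folder containing module_file>/<lib_name> to sys.path once.

    Hypothetical helper, not the plugin's actual code. The folder is appended
    rather than prepended, so an already-installed copy of a library is found
    first and the bundled copy only acts as a fallback.
    """
    plugin_dir = os.path.dirname(os.path.abspath(module_file))
    lib_dir = os.path.join(plugin_dir, lib_name)
    if lib_dir not in sys.path:  # avoid duplicate entries on reload
        sys.path.append(lib_dir)
    return lib_dir


# In __init__.py this would be called as: add_lib_to_path(__file__)
```

Appending matches the "install it or drop it in the lib folder" choice in the notes: an installed thefuzz wins, and the bundled one is used only when nothing else is available.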
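The record filtering in the second and third notes relies on thefuzz. To show the idea without the extra dependency, this sketch swaps in the standard library's `difflib.SequenceMatcher`, so its scores will not match thefuzz's exactly. The record shape (dicts with "title" and "author" keys), the function names and the thresholds are all assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def similarity(a, b):
    """0-100 score after lowercasing, dropping punctuation and sorting words.

    Sorting the words first means "Tolkien, J. R. R." and "J. R. R. Tolkien"
    compare as equal, similar in spirit to thefuzz's token_sort_ratio.
    """
    a_norm = " ".join(sorted(re.findall(r"\w+", a.lower())))
    b_norm = " ".join(sorted(re.findall(r"\w+", b.lower())))
    return round(100 * SequenceMatcher(None, a_norm, b_norm).ratio())


def filter_records(records, title, author, min_title=80, min_author=70):
    """Keep only records whose title AND author are close to the query.

    `records` is assumed to be a list of dicts with "title" and "author" keys;
    the thresholds are arbitrary starting points, not tuned values.
    """
    return [
        rec for rec in records
        if similarity(rec.get("title", ""), title) >= min_title
        and similarity(rec.get("author", ""), author) >= min_author
    ]
```

With thefuzz installed, `similarity` could call `fuzz.token_sort_ratio(a, b)` instead, which also returns a 0-100 score.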
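For the notes about building the URL and switching from "dc.identifier" to "bath.isbn", here is a rough sketch of a looser query builder. The endpoint, the `dc.creator` index and the `marcxml` record schema are assumptions (substitute whatever the plugin already uses); `version`, `operation`, `query` and `maximumRecords` are standard SRU 1.1 request parameters, and `all` is the CQL relation that matches every word in any order.

```python
import re
from urllib.parse import urlencode

# Placeholder: substitute the SRU endpoint the plugin actually queries.
SRU_BASE = "https://example.org/sru"


def clean_term(text):
    """Lowercase and keep only word characters, so quotes, colons and other
    punctuation cannot break the CQL string."""
    return " ".join(re.findall(r"\w+", text.lower()))


def build_query(title=None, author=None, isbn=None):
    """Build a CQL query string; an ISBN, if given, takes precedence."""
    if isbn:
        # Keep digits and a possible ISBN-10 check character 'X'.
        return f'bath.isbn="{re.sub(r"[^0-9X]", "", isbn.upper())}"'
    clauses = []
    # "all" matches every word in any order: looser than an exact phrase.
    if title and clean_term(title):
        clauses.append(f'dc.title all "{clean_term(title)}"')
    if author and clean_term(author):
        clauses.append(f'dc.creator all "{clean_term(author)}"')
    return " and ".join(clauses)


def build_url(query, max_records=10):
    """SRU 1.1 searchRetrieve request; the record schema is an assumption."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,
        "maximumRecords": str(max_records),
        "recordSchema": "marcxml",
    }
    return SRU_BASE + "?" + urlencode(params)
```

For example, `build_query(isbn="978-0-261-10295-4")` gives `bath.isbn="9780261102954"`, and `urlencode` takes care of escaping spaces and quotes in the final URL.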
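The note about editing the title for the query describes a manual option. As a hypothetical extension (not something the plugin does), the kind of extra text it mentions could also be stripped automatically with a regular expression. `strip_title_extras` is an invented name, and the pattern is a rough sketch that will miss many cases and could occasionally remove words that belong to the real title.

```python
import re

# Illustrative, incomplete word lists; extend them for your own library.
_ORDINAL = r"(?:first|second|third|fourth|fifth|sixth|seventh|eighth|ninth|tenth|\d+(?:st|nd|rd|th))"
_NUMBER = r"(?:\d+|[ivxlcdm]+|one|two|three|four|five|six|seven|eight|nine|ten)"

# Optional leading separator, then "<ordinal> edition" or
# "book/volume/vol./part <number>", then an optional closing bracket.
_EXTRA = re.compile(
    r"\s*[(\[:,-]?\s*\b(?:"
    + _ORDINAL + r"\s+edition"
    + r"|(?:book|volume|vol\.|part)\s+" + _NUMBER
    + r")\b\s*[)\]]?",
    re.IGNORECASE,
)


def strip_title_extras(title):
    """Remove edition/volume noise from a title before it goes into the query."""
    cleaned = _EXTRA.sub("", title)
    return " ".join(cleaned.split()).strip(" :,-")
```

The trailing `\b` after the number is what keeps a title like "The Book Thief" intact, since "Thief" is not read as a volume number.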
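For the DDC edition note: in MARC 21 bibliographic records, the Dewey number sits in field 082, subfield $a, and the DDC edition in subfield $2. Assuming the plugin receives MARCXML (an assumption; it may parse a different format), a minimal standard-library sketch could look like this. Since the post suspects the forward slashes are what bother the genre mapping, the sketch removes them from the number; `extract_ddc` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def extract_ddc(record_xml):
    """Return (ddc_number, edition) pairs from the MARC 082 fields of one record.

    082 $a is the Dewey number and 082 $2 the DDC edition. Forward slashes
    (segmentation marks) are removed from the number before it is returned.
    """
    root = ET.fromstring(record_xml)
    pairs = []
    for field in root.iterfind(".//marc:datafield[@tag='082']", MARC_NS):
        number = field.findtext("marc:subfield[@code='a']", default="", namespaces=MARC_NS)
        edition = field.findtext("marc:subfield[@code='2']", default="", namespaces=MARC_NS)
        number = number.replace("/", "").strip()
        if number:
            # Edition is None when the record has no $2.
            pairs.append((number, edition.strip() or None))
    return pairs
```

Keeping the edition next to the number, rather than folding it into one string, means the genre mapping never sees it.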
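The post does not say how multi-author books are handled, so the following is only one hypothetical approach: split the author list, reduce each name to a crude surname guess, and require each surname in its own clause. It assumes authors arrive either as a list or as one string joined with " & " (the separator Calibre uses for multiple authors), and it reuses the assumed `dc.creator` index; all function names are invented.

```python
def split_authors(authors):
    """Accept a list of names or one ' & '-joined string; drop empty entries."""
    if isinstance(authors, str):
        authors = authors.split(" & ")
    return [name.strip() for name in authors if name.strip()]


def surname(name):
    """Crude guess: text before a comma ("Tolkien, J. R. R."), else the last word."""
    if "," in name:
        return name.split(",", 1)[0].strip()
    return name.split()[-1]


def authors_clause(authors, max_authors=3):
    """One CQL clause per author, capped so long author lists don't over-restrict."""
    names = split_authors(authors)[:max_authors]
    return " and ".join(f'dc.creator all "{surname(n).lower()}"' for n in names)
```

Surnames alone are deliberately loose: initials and given names are formatted inconsistently across catalogues, while surnames usually are not. Suffixes like "Jr." would still trip up `surname`.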
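For the LCSH problem in the last note, one possible mitigation (not something the post implements) is to deduplicate on a normalized key rather than on the raw string. The post doesn't say how the headings differ, so this sketch only handles differences in case, trailing punctuation, and spacing around the `--` subdivision separator; headings that differ in wording will still both get through. The function names are hypothetical.

```python
import re
import unicodedata


def heading_key(heading):
    """Comparison key: case-folded, with consistent '--' spacing and no
    trailing periods, commas or semicolons on any subdivision."""
    text = unicodedata.normalize("NFKC", heading).casefold()
    parts = [part.strip(" .,;") for part in re.split(r"\s*--\s*", text)]
    return " -- ".join(part for part in parts if part)


def dedupe_headings(headings):
    """Drop near-duplicates, keeping the first version seen of each heading."""
    seen = set()
    unique = []
    for heading in headings:
        key = heading_key(heading)
        if key and key not in seen:
            seen.add(key)
            unique.append(heading.strip())
    return unique
```

The key is used only for comparison; the heading that reaches the final list is the first original version seen, so the displayed punctuation comes from whichever record was processed first.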
Attached Files
File Type: zip library_codes_sru_lib.zip (3.22 MB)
File Type: zip library_codes_sru_nonlib.zip (325.8 KB)