07-09-2025, 01:51 PM #31
davidjoseph1
Connoisseur
davidjoseph1 can tell if an avocado is ripe without touching it.
 
Posts: 56
Karma: 130472
Join Date: May 2011
Device: BooxM90,M92*3,M96,N96, I86ml,C67ml,Kepler, Poke2,Nova3,MaxLumi2,TabUPC
Quote:
Originally Posted by frustratedhacker
Well, you'd have some extra work, but it'd make it easier to contribute and to see what's being changed and so on. The copyright status of the code we may never know, since its original author hasn't logged in in a while. Still, the code was given open-source, for free and without mention of any license, not to mention that you've made a substantial enough change in the code to consider it partly yours. Anyways, it was just an idea. If it doesn't make sense to you or you think it'll be too much work, that's fine.
I've been away from this project for a few months, and I'll take the weekend to put it up on GitHub if I can. I use LC in conjunction with the SRU metadata plugin because of some of the issues you touch on below.


Quote:
Originally Posted by frustratedhacker
1. Got subject heading (LCSH) extraction working.
I would love to see your code. I haven't been able to wrap my head around ElementTree (etree) and make that extraction work, and there are a bunch of different LC codes that contain alternate LCSH and subject headings.
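For anyone else stuck on the etree part, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It is not the plugin's code. It assumes the SRU server returns MARCXML (namespace `http://www.loc.gov/MARC21/slim`) and that LCSH sit in the 6XX subject fields with second indicator 0; `extract_lcsh` and the sample record are made up for illustration. Because it uses `iter()`, you can pass it a whole SRU response, not just one record.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace (assumed to be what the SRU server returns).
MARC = "{http://www.loc.gov/MARC21/slim}"

# 6XX subject added entries; second indicator "0" marks LCSH.
SUBJECT_TAGS = {"600", "610", "611", "630", "650", "651"}

# Subdivision subfields: $v form, $x general, $y chronological, $z geographic.
SUBDIVISION_CODES = {"v", "x", "y", "z"}


def extract_lcsh(xml_text):
    """Return LCSH strings like 'Main--Sub--Sub', without exact duplicates."""
    root = ET.fromstring(xml_text)
    headings = []
    for field in root.iter(MARC + "datafield"):
        if field.get("tag") not in SUBJECT_TAGS or field.get("ind2") != "0":
            continue  # not a subject field, or another thesaurus (e.g. ind2="7" with $2 fast)
        main_parts, subdivisions = [], []
        for sub in field.iter(MARC + "subfield"):
            code = sub.get("code", "")
            text = (sub.text or "").strip().rstrip(".")
            if not text or code.isdigit():
                continue  # skip empty subfields and control subfields such as $0 and $2
            if code in SUBDIVISION_CODES:
                subdivisions.append(text)
            else:
                main_parts.append(text)
        if not main_parts:
            continue
        heading = "--".join([" ".join(main_parts)] + subdivisions)
        if heading not in headings:
            headings.append(heading)
    return headings


# Hand-written sample record, not real LC output.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Women</subfield>
    <subfield code="z">England</subfield>
    <subfield code="v">Fiction.</subfield>
  </datafield>
  <datafield tag="600" ind1="1" ind2="0">
    <subfield code="a">Austen, Jane,</subfield>
    <subfield code="d">1775-1817</subfield>
    <subfield code="x">Criticism and interpretation.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="7">
    <subfield code="a">Women</subfield>
    <subfield code="2">fast</subfield>
  </datafield>
</record>"""

print(extract_lcsh(SAMPLE))
# ['Women--England--Fiction', 'Austen, Jane, 1775-1817--Criticism and interpretation']
```

Stripping trailing periods is a simplification: it also trims abbreviations like "U.S." at the end of a subfield.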

Quote:
Originally Posted by frustratedhacker
2. I found recently that there seems to be a character limit on the author + title search. Here's an example (try removing "Joel" or "Century" and you'll get a result):

http://lx2.loc.gov:210/LCDB?query=dc...imumRecords=10
That's interesting. I haven't seen that, and the underlying SRU query language (CQL) doesn't mention any character limits.
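One way to pin down a suspected length limit is to build the URLs programmatically and compare query lengths between a search that fails and one that works. This sketch uses `urllib.parse.urlencode` with the standard SRU 1.1 searchRetrieve parameters; the base URL comes from the links above, while `recordSchema=marcxml`, the `dc.title`/`dc.creator` index names and the `build_sru_url` helper are assumptions for illustration.

```python
from urllib.parse import urlencode

# Base URL from the links in this thread.
SRU_BASE = "http://lx2.loc.gov:210/LCDB"


def build_sru_url(cql_query, max_records=10):
    """Build an SRU 1.1 searchRetrieve URL (hypothetical helper, not the plugin's code)."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,
        "maximumRecords": str(max_records),
        "recordSchema": "marcxml",  # assumption: schema name accepted by the server
    }
    # urlencode percent-escapes quotes and '=' and turns spaces into '+'.
    return SRU_BASE + "?" + urlencode(params)


# Example CQL query; the dc.title / dc.creator index names are assumptions.
query = 'dc.title="pride and prejudice" and dc.creator="austen"'
url = build_sru_url(query)
print(len(query), len(url))
print(url)
```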

Quote:
Originally Posted by frustratedhacker
3. Sometimes, in author + title searches, less exact search terms help in finding additional records of the same book, which may have complementary data. Adding a submenu to the author + title search that opens a popup with the search terms, so the user can change them would be a nice addition. Here's an example of what I'm talking about:

http://lx2.loc.gov:210/LCDB?query=dc...imumRecords=10

Notice how I used only the author's last name, since one of the records abbreviates the other names while the other doesn't. That gets me a subject heading. It's not much, but in another book, tweaking the search terms got me a DDC code.
Interesting. I have had issues with multiple items returned for a single search, and the LC plugin doesn't know how to deal with them.
Quote:
Originally Posted by frustratedhacker
What other functionalities are you interested in reviving? If it's the "DDC/LCC to genre mapping", I'm interested in that one also (mostly for LCC codes). It needs a different approach, in my opinion, but it's difficult...
Basically restoring the VIAF and LC Subject and Name Authorities lookup. There's a lot of extended value in those cross-references for further research and curiosity.

I would like to see the code for LCSH extraction!
07-11-2025, 06:10 PM #32
frustratedhacker
Junior Member
frustratedhacker began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Oct 2019
Device: KT3
Quote:
Originally Posted by davidjoseph1
I would love to see your code. I haven't been able to wrap my head around ETREE and make that extraction work, and there are a bunch of different LC codes that contain alternate LCSH and Subject headings.
I spoke too soon about the LCSH extractor. It wasn't as complete as I thought. Now, after lots of testing, I think it's working pretty well, although I've made other changes that you may not be comfortable with. I still have to make a few changes and run some tests, and I'll most likely post it this weekend.

Quote:
Originally Posted by davidjoseph1
That's interesting. I haven't seen that, and the underlying SRU query language doesn't mention character limits
That's right, but when I posted that, the query wasn't returning anything, not even an error. When I tested it again a few days later, it worked fine. Maybe it was a server problem.
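To tell "the server sent back nothing" apart from a genuine zero-hit search or a query error, the response can be checked for `numberOfRecords` and SRU diagnostics. A sketch assuming SRU 1.1 response namespaces (`http://www.loc.gov/zing/srw/` and its `diagnostic/` sub-namespace); the helper name and the sample responses are hand-written for illustration, not captured from LC's server.

```python
import xml.etree.ElementTree as ET

# SRU 1.1 namespaces (assumed; SRU 2.0 servers use different ones).
SRW = "{http://www.loc.gov/zing/srw/}"
DIAG = "{http://www.loc.gov/zing/srw/diagnostic/}"


def summarize_sru_response(xml_text):
    """Return (number_of_records, [(uri, message), ...]).

    number_of_records is None when the server sent back nothing usable.
    """
    if not xml_text or not xml_text.strip():
        return None, []  # empty body: neither a result nor an error
    root = ET.fromstring(xml_text)
    count_text = root.findtext(SRW + "numberOfRecords")
    count = int(count_text) if count_text and count_text.strip().isdigit() else None
    diagnostics = [
        (diag.findtext(DIAG + "uri", ""), diag.findtext(DIAG + "message", ""))
        for diag in root.iter(DIAG + "diagnostic")
    ]
    return count, diagnostics


# Hand-written sample responses.
NO_HITS = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <numberOfRecords>0</numberOfRecords>
</searchRetrieveResponse>"""

SYNTAX_ERROR = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/"
    xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>0</zs:numberOfRecords>
  <zs:diagnostics>
    <diag:diagnostic>
      <diag:uri>info:srw/diagnostic/1/10</diag:uri>
      <diag:message>Query syntax error</diag:message>
    </diag:diagnostic>
  </zs:diagnostics>
</zs:searchRetrieveResponse>"""

for body in ("", NO_HITS, SYNTAX_ERROR):
    print(summarize_sru_response(body))
```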

Quote:
Originally Posted by davidjoseph1
Basically restoring the VIAF and LC Subject and Name Authorities lookup. There's a lot of extended value in those cross-references for further research and curiosity.
I've seen the VIAF part in the code. It just gets the author's VIAF ID, right? I'm a bit lost about the LC Subject and Name Authorities lookup you mention. Can you point out where it is in the code?

For VIAF, I discovered the viapy Python library (https://github.com/Princeton-CDH/viapy), and I saw in its code that you can get a JSON response by changing the Accept header in the request (https://github.com/Princeton-CDH/via...apy/api.py#L45). You can try this in the REPL:

Code:
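import requests

def get_viaf_id_auto(name):
    """Query VIAF AutoSuggest; return (first_result, viafid) or (None, None)."""
    resp = requests.get(
        "http://viaf.org/viaf/AutoSuggest",
        params={"query": name},
        headers={"accept": "application/json"},  # ask for JSON instead of XML
        timeout=30,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors
    data = resp.json()
    results = data.get("result") or []  # "result" may be missing or null
    if not results:
        return None, None  # keeps the tuple unpacking below from crashing
    first_result = results[0]
    return first_result, first_result.get("viafid")

first_result, first_result_viafid = get_viaf_id_auto("Jane Austen")

if first_result is None:
    print("No VIAF match found")
else:
    print("First result:", first_result, "\n")
    print("First result's VIAF id:", first_result_viafid)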
07-14-2025, 04:27 PM #33
frustratedhacker
Junior Member
frustratedhacker began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Oct 2019
Device: KT3
Here it is. I'll say no more, but here are some notes I wrote while working on it. The second and third points are important for it to work, since I added an external library (sorry).

  • Changed files are ui.py, classify_web_service_webscraping.py and __init__.py (to add the lib folder to the path). Most changes are marked with a "### Bookmark" comment.
  • Added record filtering to the title and author search.
  • The record filtering code uses the library "thefuzz" (https://github.com/seatgeek/thefuzz), so you have to install it or, if you can't or don't want to, add it to the lib folder. I'm attaching a second zip with the library bundled into the plugin (I had to do that for myself because I installed Calibre as a Flatpak). If you don't trust my copy, you can download it with pip.
  • Changed the way the script processes the title and author to form the URL; it was failing with some titles and authors. I also made it a bit looser so that fewer searches come back without matches.
  • "dc.identifier" isn't working at the moment (it might be a temporary problem), so I changed it to "bath.isbn".
  • Shortly after my last post, I added an option to change the title and the author's name used in the query. This can be useful for removing extra text from the title (e.g., Third Edition, Book 2, Volume III) to get matches (tip: if a word in the title is misspelled or incomplete, you probably won't get a match).
  • Added DDC edition extraction. It doesn't seem to interfere with the genre mapping function; the problem there seems to be the forward slashes.
  • Added some code to handle multi-author books. It's hacky but it seems to work. Probably needs more testing.
  • A known problem: the LCSH extractor grabs all headings from all filtered records and removes duplicates, but sometimes two records have slightly different versions of the same heading, and both end up in the final list.
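Both the record filtering and the near-duplicate headings come down to a similarity score. The sketch below swaps in the standard library's `difflib.SequenceMatcher` for thefuzz so it runs without extra installs (it scores similarly to thefuzz, but not identically). The record dicts, the helper names and the thresholds of 80 and 90 are assumptions for illustration, not what the plugin uses.

```python
import re
from difflib import SequenceMatcher


def _ratio(a, b):
    """0-100 similarity score from difflib (stand-in for thefuzz, not its exact algorithm)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_score(a, b):
    """Compare letter-only words in sorted order, so 'Austen, Jane, 1775-1817' ~ 'Jane Austen'."""
    def norm(s):
        return " ".join(sorted(re.findall(r"[a-z]+", s.lower())))
    return _ratio(norm(a), norm(b))


def heading_score(a, b):
    """Compare headings word by word in order, ignoring case and punctuation."""
    def norm(s):
        return " ".join(re.findall(r"[a-z0-9]+", s.lower()))
    return _ratio(norm(a), norm(b))


def filter_records(records, title, author, threshold=80):
    """Keep records (hypothetical dicts with 'title' and 'author') matching both fields."""
    return [
        rec for rec in records
        if token_sort_score(rec["title"], title) >= threshold
        and token_sort_score(rec["author"], author) >= threshold
    ]


def merge_headings(headings, threshold=90):
    """Drop headings that are near-duplicates of one already kept (first spelling wins)."""
    merged = []
    for heading in headings:
        if all(heading_score(heading, kept) < threshold for kept in merged):
            merged.append(heading)
    return merged


records = [
    {"title": "Pride and prejudice /", "author": "Austen, Jane, 1775-1817."},
    {"title": "Pride and prejudice and zombies", "author": "Grahame-Smith, Seth."},
]
print(filter_records(records, "Pride and Prejudice", "Jane Austen"))
print(merge_headings(["Women--England--Fiction", "Women--England--Fiction.", "Sisters--Fiction"]))
```

Keeping the first spelling seen is an arbitrary choice; preferring the form that appears in the most records would be another reasonable rule.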
Attached Files
library_codes_sru_lib.zip (3.22 MB)
library_codes_sru_nonlib.zip (325.8 KB)



Similar Threads
Thread Thread Starter Forum Replies Last Post
[Metadata Source Plugin] SRU - Library of Congress & GBV (ger) vform Plugins 14 08-01-2024 05:50 PM
[GUI Plugin] Library Codes DaltonST Plugins 373 07-12-2024 11:04 AM
[GUI Plugin] Library Splitter DaltonST Plugins 31 07-11-2022 03:09 AM
How about using LC (Library Codes) plugin, w/ FAST/LCC/DDC derived tags? anoukaimee Library Management 0 02-09-2022 05:25 AM
[GUI Plugin]Problems retrieving LCC, Dewey etc codes birkmaggs Plugins 2 05-11-2018 10:43 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.