Quote:
Originally Posted by davidjoseph1
Would it make it easier? After all, the code is in plaintext and isn't compiled, I would do so if the only question I have, about the copyright status of the code written by DaltonST, which is the vast and overwhelming majority of the code in LC, is answered to my satisfaction?
Anyway, the metadata from the Library of Congress is getting much much better, and I wonder if I can revive some of the other functions of LC.
|
Well, you'd have some extra work, but it'd make it easier to contribute and to see what's being changed and so on. We may never know the copyright status of the code, since its original author hasn't logged in for a while. Still, the code was released openly, for free and without any mention of a license, not to mention that you've changed enough of it to consider it partly yours. Anyways, it was just an idea. If it doesn't make sense to you or you think it'll be too much work, that's fine.
I've been trying things with the code to improve metadata retrieval. Here are some observations and things I've done:
1. Got subject heading (LCSH) extraction working.
2. I recently found that there seems to be a character limit on the author + title search. Here's an example (try removing "Joel" or "Century" and you'll get a result):
http://lx2.loc.gov:210/LCDB?query=dc...imumRecords=10
3. Sometimes, in author + title searches, less exact search terms help find additional records of the same book, which may have complementary data. A nice addition would be a submenu on the author + title search that opens a popup with the search terms, so the user can change them. Here's an example of what I'm talking about:
http://lx2.loc.gov:210/LCDB?query=dc...imumRecords=10
Notice how I used only the author's last name, since one of the records shortens the other names while the other doesn't. That gets me a subject heading. It's not much, but for another book, tweaking the search terms got me a DDC code.
4. The code in "classify_web_service_webscraping.py" needs a good cleanup: removing commented-out code, deduplicating code, and removing references to OCLC stuff (since the plugin is not using the classify service). I can help with that, though I'd do it at a slow pace, since it's not a priority.
5. As a side note, I've been looking into Open Library's API for this too, with the main concern of getting book data that's as complete as possible (LCC/DDC codes and subject headings). From what I've seen, the main data they show on a book's page seems to be taken from LoC records, but they also offer MARC records from other collections. I've seen some cases where these MARC records had data (mainly DDC codes) that the LoC record didn't have. I'm not saying you should switch to Open Library's API, I'm just documenting my findings.
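To make points 2 and 3 a bit more concrete, here's a rough sketch of how looser search terms could be pre-filled in such a popup: keep only the author's last name and drop trailing title words until everything fits a length budget. This is not code from the plugin. The function names are made up, and `max_chars` is a placeholder, since I don't know what the real limit is.

```python
import re


def last_name_only(author):
    """Return just the surname from 'First Last' or 'Last, First'."""
    author = author.strip()
    if "," in author:
        # 'Last, First' form: the surname comes before the comma.
        return author.split(",", 1)[0].strip()
    parts = author.split()
    return parts[-1] if parts else ""


def shorten_title(title, max_chars):
    """Drop trailing words until the title fits in max_chars.

    At least one word is always kept, even if it is longer than max_chars.
    """
    # Replace punctuation (except apostrophes) with spaces, then split.
    words = re.sub(r"[^\w\s']", " ", title).split()
    while len(words) > 1 and len(" ".join(words)) > max_chars:
        words.pop()
    return " ".join(words)


def loosen_terms(author, title, max_chars=40):
    """Build looser author/title terms within a combined length budget.

    max_chars is a hypothetical budget, not a documented limit.
    """
    surname = last_name_only(author)
    budget = max(max_chars - len(surname), 1)
    return surname, shorten_title(title, budget)


# Made-up author and title, only to show the behaviour.
print(loosen_terms("Jane Q. Public",
                   "An Example Title About the Twentieth Century",
                   max_chars=30))
# -> ('Public', 'An Example Title About')
```

The user would still be able to edit the result by hand; the point is only to start from terms that are more likely to return something.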
What other functionalities are you interested in reviving? If it's the "DDC/LCC to genre mapping", I'm interested in that one also (mostly for LCC codes). It needs a different approach, in my opinion, but it's difficult...
Anyways, I've written too much (sorry!) and it's getting late now. Let me know if you want the code for the LCSH extraction. It's AI-generated, though; I don't know if that's a problem for you.
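In the meantime, the general idea behind LCSH extraction can be sketched like this. It isn't the plugin code, just a minimal illustration that assumes the records come back as MARCXML, where LCSH topical headings sit in 650 fields with second indicator 0. `extract_lcsh` and the sample record are made up for the example, and other subject fields (600, 610, 651 and so on) are ignored here.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
# Subfields that make up a heading: main term plus subdivisions.
HEADING_CODES = {"a", "x", "y", "z", "v"}


def extract_lcsh(marcxml):
    """Return LCSH topical headings found in a MARCXML string."""
    root = ET.fromstring(marcxml)
    headings = []
    for field in root.iter("{%s}datafield" % MARC_NS):
        # 650 = topical subject; ind2 "0" = Library of Congress Subject Headings.
        if field.get("tag") != "650" or field.get("ind2") != "0":
            continue
        parts = [
            sf.text.strip()
            for sf in field.findall("{%s}subfield" % MARC_NS)
            if sf.get("code") in HEADING_CODES and sf.text
        ]
        # Simplification: strip the trailing period that usually ends a heading.
        heading = "--".join(parts).rstrip(".")
        if heading and heading not in headings:
            headings.append(heading)
    return headings


# Made-up record, only to exercise the function.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Economic history</subfield>
    <subfield code="y">20th century.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="7">
    <subfield code="a">Economic history.</subfield>
    <subfield code="2">fast</subfield>
  </datafield>
  <datafield tag="082" ind1="0" ind2="0">
    <subfield code="a">330.9</subfield>
  </datafield>
</record>"""

print(extract_lcsh(SAMPLE))
# -> ['Economic history--20th century']
```

Because it walks the whole tree, the same function should also work when the record is wrapped in a larger response envelope.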