I was glad to see a request for a BISAC scraper. I'd like to see that and something similar for Library of Congress Subject Headings.
I have a hierarchical column called "LoC Subject Headings" (#locsh) of type "Comma separated text, like tags, shown in the tag browser". I'd like a button that populates it automatically.
The data can be scraped from the WorldCat and LoC websites. For example, searching for "The Andy Warhol Diaries", WorldCat returns:
"Warhol, Andy, -- 1928-1987 -- Diaries.
Artists -- United States -- Diaries.
Artists -- United States -- Biography.
Warhol, Andy, -- 1928-1987.
Artists.
United States."
The Library of Congress returns:
"Warhol, Andy, 1928-1987 --Diaries.
Artists --United States --Diaries."
Some regex could massage these into:
"Warhol/ Andy 1928-1987.Diaries,Artists.United States.Diaries,Artists.United States.Biography,Warhol/ Andy 1928-1987,Artists,United States"
and
"Warhol/ Andy 1928-1987.Diaries,Artists.United States.Diaries"
These could then be sent to the #locsh column. (Note how a ',' within a tag must be handled, since the column is comma-separated, and note the format of tags for a person.)
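As a rough illustration of the regex massaging described above, here is a minimal Python sketch. The function names `heading_to_tag` and `headings_to_column` are made up for this example, and the rules (merge a bare date range into the preceding name, turn ',' into '/', join subdivisions with '.') are inferred only from the two samples quoted, so real headings will probably need more cases. It does no scraping and does not touch calibre.

```python
import re

# A bare date range such as "1928-1987" or an open one such as "1928-".
DATE_ONLY = re.compile(r"^\d{4}-(\d{4})?$")


def heading_to_tag(heading):
    """Turn one scraped subject heading into one hierarchical tag."""
    # Drop the trailing full stop that both sites append.
    heading = heading.strip().rstrip(".")
    parts = []
    for raw in heading.split("--"):
        part = raw.strip().rstrip(",").strip()
        if not part:
            continue
        if parts and DATE_ONLY.match(part):
            # WorldCat splits a person's dates into their own subdivision;
            # fold them back into the name.
            parts[-1] = parts[-1] + " " + part
        else:
            parts.append(part)
    cleaned = []
    for part in parts:
        # LoC style "Warhol, Andy, 1928-1987": drop the comma before the dates.
        part = re.sub(r",\s*(?=\d{4}-)", " ", part)
        # Any remaining comma would split the tag, so replace it with '/'.
        cleaned.append(part.replace(",", "/"))
    # '.' separates the levels of a hierarchical column.
    return ".".join(cleaned)


def headings_to_column(text):
    """Convert a block of headings (one per line) to a comma separated value."""
    return ",".join(
        heading_to_tag(line) for line in text.splitlines() if line.strip()
    )
```

Fed the WorldCat lines above, `headings_to_column` gives the first string shown, and the LoC lines give the second. Headings that contain a '.' of their own (abbreviations, say) would be split into extra levels, so those would need separate handling.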
Similarly, for BISAC, an Amazon search (there may be other sources than Amazon) returns:
"#52 in Kindle Store > Kindle eBooks > Biographies & Memoirs > Arts & Literature > Artists, Architects & Photographers
#246 in Books > Biographies & Memoirs > Arts & Literature > Artists, Architects & Photographers
#924 in Kindle Store > Kindle eBooks > Biographies & Memoirs > Professionals & Academics"
Which could be processed into:
"Biographies & Memoirs.Arts & Literature.Artists/ Architects & Photographers,Biographies & Memoirs.Arts & Literature.Artists/ Architects & Photographers,Biographies & Memoirs.Professionals & Academics"
and added to a BISAC (#bisac) custom column.
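The processing of the Amazon lines could look something like the following Python sketch. The names `rank_line_to_tag`, `rank_lines_to_column` and `STORE_PREFIXES` are invented for illustration, and the list of store prefixes to drop is an assumption based only on the three lines quoted above. Fetching the page and writing to #bisac are left out.

```python
import re

# Leading levels that are store names rather than subject categories.
# Assumption: only these three appear; extend as needed.
STORE_PREFIXES = {"Kindle Store", "Kindle eBooks", "Books"}


def rank_line_to_tag(line):
    """Turn one '#52 in A > B > C' line into a hierarchical tag 'B.C'."""
    # Remove the leading rank, e.g. '#52 in ' or '#1,234 in '.
    line = re.sub(r"^#[\d,]+\s+in\s+", "", line.strip())
    levels = [level.strip() for level in line.split(">")]
    # Drop the store prefixes from the front of the path.
    while levels and levels[0] in STORE_PREFIXES:
        levels.pop(0)
    # ',' would split the tag, so use '/' instead; '.' joins the levels.
    return ".".join(level.replace(",", "/") for level in levels)


def rank_lines_to_column(text, unique=False):
    """Convert a block of rank lines to a comma separated column value."""
    tags = [rank_line_to_tag(line) for line in text.splitlines() if line.strip()]
    if unique:
        # Keep the first occurrence of each tag, preserving order.
        tags = list(dict.fromkeys(tags))
    return ",".join(tags)
```

With the default `unique=False` this reproduces the string above, including the repeated category from the Kindle and Books rankings; passing `unique=True` drops the repeat.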
Incidentally, the LoC also has a similar field called "Genre/Form Terms", but these terms haven't been widely worked out, and the field is usually empty. News on them is here.
I think there are similar plugins and this shouldn't be too hard for a good Python programmer. How about it?