Quote:
Originally Posted by jadhvaryu
For my project, I need to pull Gutenberg ebooks (HTML and EPUB formats) by genre, language and author.
However, I checked more than 100 books at random and found that most have missing or incomplete genres and authors.
Is this generally true, or am I making some mistake?
I'm surprised you are finding problems with authors' names; Gutenberg normally shows these correctly. Regarding "genres": if you look at the biblio record for a Gutenberg book, you will see that they don't use "genres" but the LOC (Library of Congress) class. This Wikipedia page -
https://en.wikipedia.org/wiki/Librar...Classification shows the various "head" classifications, which you might be able to use to map to your "genres".
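As a rough sketch of that mapping idea, something like the Python below might do as a starting point. The table holds the standard top-level LOC class letters; the function name `head_class` is just made up for illustration, and which of these names you treat as a "genre" is entirely your call. It assumes you already have the class code from the biblio record as a string such as "PR" or "QA".

```python
# Top-level Library of Congress Classification letters and their names.
LOCC_HEAD_CLASSES = {
    "A": "General Works",
    "B": "Philosophy, Psychology, Religion",
    "C": "Auxiliary Sciences of History",
    "D": "World History",
    "E": "History of America",
    "F": "History of the Americas (local)",
    "G": "Geography, Anthropology, Recreation",
    "H": "Social Sciences",
    "J": "Political Science",
    "K": "Law",
    "L": "Education",
    "M": "Music",
    "N": "Fine Arts",
    "P": "Language and Literature",
    "Q": "Science",
    "R": "Medicine",
    "S": "Agriculture",
    "T": "Technology",
    "U": "Military Science",
    "V": "Naval Science",
    "Z": "Bibliography, Library Science",
}


def head_class(locc_code):
    """Return the head class name for a code such as "PR", or None if unknown.

    Hypothetical helper: only the first letter of the code is used.
    """
    code = locc_code.strip().upper()
    if not code:
        return None
    return LOCC_HEAD_CLASSES.get(code[0])
```

Note that "P" lumps all language and literature together, so for fiction versus poetry versus drama you would probably need to look at the subclass letters or the Subject entries as well.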
However, when you download a Gutenberg EPUB, the "tags" you get appear to be those listed under "Subject" in the biblio record, not the LOC class.
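If you want to check what tags a downloaded EPUB actually carries, a minimal sketch using only the Python standard library could look like this. It assumes a normal EPUB layout, where META-INF/container.xml points to the OPF package file and the tags are stored there as Dublin Core `dc:subject` elements. The function name `epub_subjects` is my own invention, and there is no error handling for malformed files.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def epub_subjects(epub_path):
    """Return the dc:subject entries found in an EPUB's OPF metadata.

    Hypothetical helper; epub_path may be a file path or a file-like object.
    """
    with zipfile.ZipFile(epub_path) as zf:
        # container.xml tells us where the OPF package file lives.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    # Collect every non-empty dc:subject element in document order.
    return [
        el.text.strip()
        for el in opf.iterfind(".//dc:subject", DC_NS)
        if el.text and el.text.strip()
    ]
```

Running this over a handful of downloads and comparing the result with the "Subject" lines in the biblio record should show whether the two really match.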
BobC