03-26-2021, 04:13 PM   #8
BobC
Guru
Posts: 691
Karma: 3026110
Join Date: Dec 2008
Location: Lancashire, U.K.
Device: BeBook 1, BeBook Pure, Kobo Glo, (and HD),Energy Sistem EReader Pro +
Quote:
Originally Posted by jadhvaryu

For my project, I need to pull Gutenberg ebooks (HTML and EPUB formats) based on genres, languages and authors.

However, I checked more than 100 books at random and found that most have missing or incomplete genres and authors.

Is this generally true, or am I making some mistake?
I'm surprised you are finding problems with authors' names; Gutenberg normally shows these correctly. As for "genres": if you look at the bibliographic record for a Gutenberg book, you will see that they don't use genres but the LoC (Library of Congress) class. This Wikipedia page, https://en.wikipedia.org/wiki/Librar...Classification, shows the top-level ("head") classes, which you might be able to map to your genres.
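
As a rough illustration of that mapping idea, here is a minimal Python sketch. It assumes you already have the LoC class string from the bibliographic record (for example "PR" or "QA"). The function name head_class and the "Unknown" fallback are my own inventions for the example, the class names are abbreviated from the standard LoC outline, and how you turn them into your own genres is up to you.

```python
# Top-level Library of Congress classes, keyed by their first letter.
# Names are shortened; the letters I, O, W, X and Y are not used by LoC.
LOC_HEAD_CLASSES = {
    "A": "General Works",
    "B": "Philosophy, Psychology, Religion",
    "C": "Auxiliary Sciences of History",
    "D": "World History",
    "E": "History of the Americas",
    "F": "History of the Americas",
    "G": "Geography, Anthropology, Recreation",
    "H": "Social Sciences",
    "J": "Political Science",
    "K": "Law",
    "L": "Education",
    "M": "Music",
    "N": "Fine Arts",
    "P": "Language and Literature",
    "Q": "Science",
    "R": "Medicine",
    "S": "Agriculture",
    "T": "Technology",
    "U": "Military Science",
    "V": "Naval Science",
    "Z": "Bibliography, Library Science",
}


def head_class(loc_class: str) -> str:
    """Return the top-level LoC class name for a class code such as 'PR'.

    Hypothetical helper: only the first letter is inspected, so subclasses
    ('PR', 'PS', 'PZ', ...) all collapse into the same head class.
    """
    code = loc_class.strip().upper()
    if not code:
        return "Unknown"
    return LOC_HEAD_CLASSES.get(code[0], "Unknown")
```

For instance, head_class("PR") gives "Language and Literature", which you could then map to whatever genre label your project uses. Only looking at the first letter is quite coarse, so you may want to special-case a few subclasses.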

However, when you download a Gutenberg EPUB, the "tags" you get appear to be those listed under "Subject" in the bibliographic record, not the LoC class.
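
If you want to check what is actually inside a downloaded file, something like the following should work. It is only a sketch using the Python standard library: it assumes the EPUB follows the usual layout (a META-INF/container.xml pointing at the OPF package file) and that the tags are stored as Dublin Core dc:subject elements, which is the normal EPUB convention. The function name epub_subjects is made up for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file and by Dublin Core metadata.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def epub_subjects(path):
    """Return the dc:subject entries found in an EPUB's OPF metadata.

    `path` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(path) as zf:
        # container.xml tells us where the OPF package file lives.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(".//c:rootfile", CONTAINER_NS)
        if rootfile is None:
            return []
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))

    # Collect every non-empty subject, trimmed of surrounding whitespace.
    return [
        el.text.strip()
        for el in opf.iterfind(".//dc:subject", DC_NS)
        if el.text and el.text.strip()
    ]


# Example call (the file name is hypothetical):
# print(epub_subjects("some-gutenberg-book.epub"))
```

Comparing that list with the "Subject" lines on the book's Gutenberg page should show whether the two match for the books you sampled.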

BobC