03-24-2021, 10:38 PM | #1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2021
Device: iPad
|
Gutenberg ebooks
** This is my first time here, so pardon me if this is a repeat question **
Hi All, For my project, I need to pull Gutenberg ebooks (HTML and EPUB formats) by genre, language and author. However, I checked more than 100 books at random and found that most have missing or incomplete genres and authors. Is this generally true, or am I making a mistake? Also, I am using ebooklib to read EPUB files but find it has a lot of limitations. I have been struggling with this topic for several weeks now and would much appreciate any guidance in the right direction. Thanks in advance |
03-25-2021, 02:13 AM | #2 |
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
|
I wouldn't bother with Project Gutenberg. They still use some 1990s toolset to produce their books and the result sucks.
|
03-25-2021, 08:17 AM | #3 |
the rook, bossing Never.
Posts: 11,158
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
It's the best source of public domain books.
Any formatting or metadata issues are easily fixed with Calibre. I've no idea what "ebooklib" is. E-ink readers such as Kobo and Kindle are best. Maybe the Kobo Libra is the best value 7" model without adverts. Then there are very many poor apps on iOS and Android. KOReader (available from its website) allows some changes to formatting on an ereader or Android (install the APK). I use it on a Boyue Likebook Mars. For an old Android phone or tablet, use Aldiko Classic (Play Store), and for a newer one use Lithium. The Wiki here has listings for iOS (Apple). |
03-25-2021, 02:10 PM | #4 |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2021
Device: iPad
|
Thank you Sarmat89 and Quoth.
Quoth, ebooklib is a Python package for reading EPUB books in a Python program. How does Calibre help? Does it have a repository of ebooks with enriched metadata that I can download/mirror for free, like Project Gutenberg? Sarmat89, I am curious about the reasons for your comments on Gutenberg. Is there an alternative you find better? Thanks! |
03-26-2021, 09:53 AM | #5 |
Addict
Posts: 206
Karma: 547516
Join Date: Mar 2008
Location: Berlin, Germany
Device: KObo Clara, Kobo Aura, PRS-T1, PB602, CyBook Gen3
|
Calibre is a library tool to handle your ebooks. It can download metadata for your books.
It also contains an ebook viewer to read your books and an editor to correct bad formatting. Calibre can also convert many ebook formats into one another. You can find many well-groomed public domain ebooks right here in the library of Mobileread: Patricia Clark Memorial Library |
03-26-2021, 10:39 AM | #6 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2021
Device: iPad
|
Quote:
|
|
03-26-2021, 12:42 PM | #7 |
the rook, bossing Never.
Posts: 11,158
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Sometimes the metadata found is for a DIFFERENT book, or has errors. So everything needs human review.
Better advice is possible if you explain not just a particular issue but what end result you want. |
03-26-2021, 04:13 PM | #8 | |
Guru
Posts: 691
Karma: 3026110
Join Date: Dec 2008
Location: Lancashire, U.K.
Device: BeBook 1, BeBook Pure, Kobo Glo, (and HD),Energy Sistem EReader Pro +
|
Quote:
However, when you download a Gutenberg EPUB, the "tags" you get appear to be those from "Subject" in the bibliographic record and not the LoC class. BobC |
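To check which "tags" a given Gutenberg EPUB actually carries, the Dublin Core fields can be read straight from the package (OPF) file inside the archive. The following is a minimal sketch using only the Python standard library rather than ebooklib. The function name `read_epub_metadata` is made up for illustration, and the code assumes a standard EPUB layout in which `META-INF/container.xml` points at the OPF file.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container and package files.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_metadata(epub_file):
    """Return title, creator, subject and language lists from an EPUB.

    `epub_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as zf:
        # container.xml tells us where the package (OPF) file lives.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        package = ET.fromstring(zf.read(rootfile.attrib["full-path"]))

    def texts(tag):
        # Collect the non-empty text of every <dc:tag> element.
        found = package.findall(f"opf:metadata/dc:{tag}", NS)
        return [el.text.strip() for el in found if el.text and el.text.strip()]

    return {
        "title": texts("title"),
        "creator": texts("creator"),
        "subject": texts("subject"),   # what shows up as "tags"
        "language": texts("language"),
    }
```

Running this over a directory of downloaded files and counting how many have an empty `subject` or `creator` list would give a rough measure of how complete the embedded metadata is.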
|
03-26-2021, 04:15 PM | #9 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2021
Device: iPad
|
Quote:
I am part of a team working on a purpose-built reader around Gutenberg's free books, and we need enriched metadata to help with book search and selection. Essentially, we need title, abstract/summary, author(s), publisher(s), genre(s), keyword(s)/tag(s) and ISBN. For now, we only care about English books, and the format we prefer is HTML.
As I mentioned earlier, I randomly checked more than 100 books and found that the metadata is consistently incomplete, so we need a way to enrich it.
I played with Calibre a bit, but it seems that:
- search results are limited to 25 books
- downloads are interactive only, one format at a time
- the metadata gathered still seems limited (I tried the popular "Complete Works" by William Shakespeare and the metadata was still not enough)
Also, I have already downloaded a big set of Gutenberg ebooks (HTML version zip file). Can I "import" these books into Calibre?
Lastly, the purpose-built reader will be priced for profit. We will not charge for the Gutenberg books, just the reader. If we end up using Calibre to maintain our books and update metadata, who can we talk to about usage and licensing terms? Many thanks! |
|
03-26-2021, 04:20 PM | #10 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2021
Device: iPad
|
Quote:
I will look up the LoC classes. You mention Gutenberg bibliographic records. Are they embedded in the ebook itself, or are they separate data? Could you point me to them? Thank you for your comments. They are helpful and possibly point me in the right direction to meet my need. Best regards |
|
03-26-2021, 04:55 PM | #11 | |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
Abstract/summary, genre(s), keyword(s)/tag(s) are all subjective. Someone has to fill them in, and the values person A chooses will be different from those of person B. I don't know if any Gutenberg book has any of those, but I wouldn't trust them any more than the values you could get from any random bookstore...
Publisher(s) and ISBN make no sense for Gutenberg books. There is no publisher, or they're all "Project Gutenberg". Even if the transcription is initially based on a printed book with an actual publisher, that doesn't mean the Gutenberg book has that publisher.
ISBNs are specific to particular editions. Every paper edition of a particular work has a different number (assigned, at least partially, by some external authority). If a book is officially published in both paper and electronic format, it will most likely have separate ISBNs for each. The vast majority of Gutenberg books were published long before ISBNs existed, and even if a book is based on a printed version with an ISBN, that is definitely not the ISBN of the Gutenberg book. Gutenberg books are not facsimiles of printed editions, they're just another version (without an ISBN). |
|
03-26-2021, 05:02 PM | #12 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2021
Device: iPad
|
Quote:
Do you have any comments on ebook formats? We are planning to stick to HTML and not EPUB (which is also zipped HTML). What extra advantage does EPUB have (except maybe rights management) over plain HTML? Best regards |
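The "zipped HTML" description of EPUB can be checked directly, since an EPUB file opens with any zip tool. As a rough sketch, assuming only the Python standard library, the helper below (the name `summarize_epub` is hypothetical) counts the files inside an EPUB by extension, which shows how much of the package is HTML and how much is stylesheets, images, fonts and package data.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_epub(epub_file):
    """Count the files inside an EPUB archive by file extension.

    `epub_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as zf:
        # Skip directory entries, which end with a slash.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    # Files without an extension (such as "mimetype") are grouped as "(none)".
    return Counter(PurePosixPath(n).suffix.lower() or "(none)" for n in names)
```

Calling `summarize_epub` on a downloaded file returns a `Counter` keyed by extension, for example `.xhtml`, `.css` and `.opf`.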
|
03-26-2021, 05:03 PM | #13 |
the rook, bossing Never.
Posts: 11,158
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
You download the books OUTSIDE of Calibre and import them!
HTML is a less good format to download from Gutenberg. Better to download EPUB or Mobi (called Kindle now) and then convert it if you want HTML. But HTML on its own is poor for ebooks. That's why Mobi and EPUB were invented. An EPUB is really a zipped directory with HTML files for the body (typically one per chapter), CSS (typically two files), a system index, a resource file listing what is in it, and image and font files, if used. Mobi packs similar content into a single file of its own rather than a zip. HTML is not a sensible format for any sort of proper ereader. The HTML is only for people who want to use a web browser, which is historic and madness today.
Don't implement a reader that simply uses Gutenberg on demand. That's "cloud madness philosophy". Have a browser that can download, or a directory to import to, and have the reader only use local files.
Calibre is primarily a program to manage ebooks already copied to the computer. It imports a copy. Then you can manage metadata, searches, conversions and transfers to an ereader, or to storage on a phone/tablet that has an ereader app. Oddly, there is a full contents search for EPUBs as an option in the "Quality Check" tool.
Normally EPUB is the best format to use. But Mobi is older. Some Mobi files may have old Mobi and Kindle KF8/azw3 in the same file! However, I find downloading "Kindle Format" from Gutenberg works best. Then I import that to Calibre and convert to ePub2 (using various options to fix quotes, remove paragraph spacing, have a 1.4em first line indent and embed the Georgia font; set line height and minimum line height both to zero to allow the user to change them; subset fonts etc). Then I check the cover and other metadata in Edit Metadata. There are plug-ins to search very many websites. I make sure the author name is consistent and correct before a metadata search.
I was using Gutenberg when Kindle format was called Mobi, EPUB didn't exist, Kindle didn't exist and the first e-ink reader (by Sony) came later that year.
Palm PDAs and Symbian Phones were probably the first gadgets other than laptops you could read ebooks on before dedicated ereaders existed. Gutenberg is THAT old, which is why text and HTML are offered. Last edited by Quoth; 03-26-2021 at 05:13 PM. |
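The conversion settings described in the post above can also be applied without the GUI, through Calibre's `ebook-convert` command-line tool. The sketch below only builds the command line and does not run it. The mapping from the post's wording to flags is an assumption (for instance, "fix quotes" is taken to mean `--smarten-punctuation`), and the function name and file names are hypothetical.

```python
import shlex


def build_convert_command(src, dst, font="Georgia", indent_em=1.4):
    """Build an ebook-convert argument list mirroring the settings above."""
    return [
        "ebook-convert", src, dst,
        "--epub-version", "2",                  # produce ePub2
        "--smarten-punctuation",                # assumed meaning of "fix quotes"
        "--remove-paragraph-spacing",           # no blank space between paragraphs
        "--remove-paragraph-spacing-indent-size", str(indent_em),  # indent in em
        "--embed-font-family", font,            # font must be installed locally
        "--subset-embedded-fonts",              # keep only the glyphs used
        "--line-height", "0",                   # 0 leaves line height to the reader
        "--minimum-line-height", "0",
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(build_convert_command("book.mobi", "book.epub")))
```

The returned list can be passed to `subprocess.run` on a machine where Calibre is installed, which makes it possible to convert a whole directory of downloads in a loop instead of one book at a time.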
03-26-2021, 05:08 PM | #14 | |
the rook, bossing Never.
Posts: 11,158
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Quote:
You need to decide what the genre(s) are. Some books fit more than one genre and some don't fit any easily, especially older ones. Any ISBN you find will be for a particular modern reprint, which might not even be exactly the same text. It might be revised. Different SIZE editions of a paper book with the same text have different ISBNs! |
|
03-26-2021, 05:14 PM | #15 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2021
Device: iPad
|
Quote:
But will look again. Thanks! |
|
Tags |
ebook, ebooklib, epub, gutenberg, html |
|