Old 11-06-2023, 05:44 PM   #1
robert
Junior Member
robert began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Sep 2023
Device: Kobo Libra 2
How to get Gutenberg-DE HTML into calibre (EPUB)?

Hi, I just bought the Gutenberg-DE edition 16.

I want to convert my favorite books to Epub and transfer them to my Kobo Libra 2.

Sounds too easy?

Well, http://www.epub2go.eu/ does the job from the online Gutenberg-DE repository.
However, it does not work with the locally installed version.

I just dragged the index.html of a book of interest into calibre; it reports 'importing metadata' and stalls (after 20 minutes at 0%, I gave up).

Any experience or tricks on how I can do it?

Note: I have already complained to the Gutenberg-DE service, asking why they cannot make this a bit easier, e.g., with a calibre plugin.
Old 11-06-2023, 09:09 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I don't know what Gutenberg-DE is, but assuming it's a collection of HTML files, just let calibre run. It has to parse all the HTML files looking for images, stylesheets, links, etc., so for a large collection it will take time.
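
To illustrate why a heavily linked collection is slow to import, here is a simplified model of link-following. It is a sketch only and not calibre's actual code: starting from one HTML file, it follows local `href` links breadth-first up to a depth limit and returns every HTML file it would have to read. The names `LinkCollector`, `local_links`, `reachable_files`, and the `max_levels` parameter are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def local_links(html_path):
    """Return resolved paths of links that point to existing local files."""
    html_path = Path(html_path)
    collector = LinkCollector()
    collector.feed(html_path.read_text(encoding="utf-8", errors="replace"))
    found = []
    for href in collector.hrefs:
        parts = urlsplit(href)
        if parts.scheme or parts.netloc or not parts.path:
            continue  # skip web links and pure #fragment links
        target = (html_path.parent / unquote(parts.path)).resolve()
        if target.is_file():
            found.append(target)
    return found


def reachable_files(root, max_levels=5):
    """Breadth-first walk over local links, at most max_levels hops from root."""
    root = Path(root).resolve()
    seen = {root}
    frontier = [root]
    for _ in range(max_levels):
        next_frontier = []
        for page in frontier:
            for target in local_links(page):
                is_html = target.suffix.lower() in {".html", ".htm"}
                if is_html and target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
        if not frontier:
            break
    return seen
```

Calling `len(reachable_files("index.html"))` on one book's index page gives a rough idea of how many files a link-following importer would touch; if every page carries navigation links into the rest of the collection, that number grows quickly with each level.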
Old 11-07-2023, 01:30 AM   #3
Capricorn
Belgian Pommes Frites
Capricorn will become famous soon enough
 
Posts: 101
Karma: 532
Join Date: Jan 2012
Device: Pocketbook Touch HD
It can be found here: https://www.projekt-gutenberg.org/
11,789 books from 2,476 authors in HTML format will take a LONG time to convert to EPUB.
Old 11-07-2023, 02:35 PM   #4
msel
Connoisseur
msel can grip it by the husk

Posts: 61
Karma: 141502
Join Date: Sep 2010
Device: Kindle Keyboard 3G
Quote:
Originally Posted by robert View Post
I just dragged the index.html of a book of interest into calibre; it reports 'importing metadata' and stalls (after 20 minutes at 0%, I gave up).
Hello robert,

a) Have you tried opening the index.html with the calibre e-book editor (right click > Open With > E-book editor)?

b) The books are on CD, right? Perhaps you must first copy the book from the CD to a local drive.

Greetings, Maria
Old 11-07-2023, 03:24 PM   #5
DNSB
Bibliophagist
DNSB ought to be getting tired of karma fortunes by now.

Posts: 35,464
Karma: 145525534
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
If Gutenberg.de is supplying the HTML books in .zip files, which Gutenberg.org is fond of doing, I've found that I needed to unpack the .zip file into a temporary directory and then import it into calibre. The one time I tried to import ~200 books, I ended up using 7-Zip and calibre from the command line, with a loop in a batch file to step through the directories.
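
A rough Python equivalent of such a batch loop might look like the sketch below. It is not the original batch file: it unpacks each .zip into its own subdirectory of a working directory and then builds one `calibredb add` command per unpacked book. The function names, the `index.html` entry-point name, and the one-archive-per-book layout are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def unpack_archives(zip_dir, work_dir):
    """Extract every .zip in zip_dir into its own subdirectory of work_dir."""
    unpacked = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        target = Path(work_dir) / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        unpacked.append(target)
    return unpacked


def build_add_commands(book_dirs, entry_name="index.html"):
    """Build one `calibredb add` command per directory that has an entry file."""
    commands = []
    for book_dir in book_dirs:
        entry = Path(book_dir) / entry_name
        if entry.is_file():  # assumption: each book has a top-level entry page
            commands.append(["calibredb", "add", str(entry)])
    return commands
```

Each command could then be executed with `subprocess.run(cmd, check=True)`; that step is left out here so the sketch does not modify a calibre library on its own.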
Old 11-07-2023, 03:44 PM   #6
Quoth
the rook, bossing Never.
Quoth ought to be getting tired of karma fortunes by now.

Posts: 11,164
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
Quote:
Originally Posted by DNSB View Post
If Gutenberg.de is supplying the HTML books in .zip files, which Gutenberg.org is fond of doing, I've found that I needed to unpack the .zip file into a temporary directory and then import it into calibre. The one time I tried to import ~200 books, I ended up using 7-Zip and calibre from the command line, with a loop in a batch file to step through the directories.
I had something similar with a free Baen Books CD ISO download, which had some seriously weird stuff in it. I think I downloaded it nearly 15 years ago and had forgotten about it until recently.

I suppose we are lucky Gutenberg aren't solely offering plain text, which they did before adding HTML and mobi (not called old Kindle till later).
Old 11-07-2023, 07:02 PM   #7
robert
Junior Member
robert began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Sep 2023
Device: Kobo Libra 2
Quote:
Originally Posted by Capricorn View Post
It can be found here: https://www.projekt-gutenberg.org/
11,789 books from 2,476 authors in HTML format will take a LONG time to convert to EPUB.
Hi, I only want to import a single book. I'm now trying it on my gaming laptop. Let's hope...
Old 11-07-2023, 07:05 PM   #8
robert
Junior Member
robert began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Sep 2023
Device: Kobo Libra 2
Quote:
Originally Posted by msel View Post
Hello robert,

a) Have you tried opening the index.html with the calibre e-book editor (right click > Open With > E-book editor)?

b) The books are on CD, right? Perhaps you must first copy the book from the CD to a local drive.

Greetings, Maria
Hi Maria,
Yes, I first copied the .zip download (7.6 GB) to my SSD and unzipped it.
I'm now trying to open it with the calibre EPUB editor. Quite annoying. Why don't they offer EPUBs, or a decent converter or calibre plugin à la epub2go?
Best, Robert
Old 11-09-2023, 02:34 PM   #9
robert
Junior Member
robert began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Sep 2023
Device: Kobo Libra 2
Quote:
Originally Posted by DNSB View Post
If Gutenberg.de is supplying the HTML books in .zip files, which Gutenberg.org is fond of doing, I've found that I needed to unpack the .zip file into a temporary directory and then import it into calibre. The one time I tried to import ~200 books, I ended up using 7-Zip and calibre from the command line, with a loop in a batch file to step through the directories.
Hi, I have now tried that as well, and I may have found a hint as to why this takes forever:
~~~
rob@robert-winkler-Lenovo ~/d/g/a/judas> ebook-convert index.html andrejew_judas.epub
1% Eingabe wird zu HTML konvertiert …
InputFormatPlugin: HTML Input running
on /home/rob/dataspace/gutenberg-edition16/andrejew/judas/index.html
Language not specified
Building file list...
IgnoreFile('/home/rob/dataspace/gutenberg-edition16/info/texte/Vom-Antiquariat-zum-E-Text.pdf is a binary file')
IgnoreFile('/home/rob/dataspace/gutenberg-edition16/plautus/asinaria/asinaria.pdf is a binary file')
IgnoreFile('/home/rob/dataspace/gutenberg-edition16/plautus/epidicus/epidicus.pdf is a binary file')
IgnoreFile('/home/rob/dataspace/gutenberg-edition16/plautus/mercator/mercator.pdf is a binary file')
IgnoreFile('/home/rob/dataspace/gutenberg-edition16/plautus/mostell1/mostell1.pdf is a binary file')
~~~
etc.

Does this indicate that calibre is scanning through all the directories?

By chance, do you have a working script to process a single directory?
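
One possible approach, offered as a sketch rather than a tested recipe: calibre's HTML input has a `--max-levels` option that limits how deep links are followed (the calibre documentation gives 5 as the default, and 0 means no links from the root file are followed). The helper below only builds the command line; the function name `build_convert_command` and the `language="de"` default are assumptions for illustration. With a limit of 1, the pages linked directly from index.html are included, which covers the chapters but also any other pages that index.html links to.

```python
from pathlib import Path


def build_convert_command(index_html, output_epub, max_levels=1, language="de"):
    """Build an ebook-convert call that follows links at most max_levels deep."""
    if max_levels < 0:
        raise ValueError("max_levels must be non-negative")
    return [
        "ebook-convert",
        str(Path(index_html)),
        str(Path(output_epub)),
        "--max-levels", str(max_levels),  # limit link recursion in the HTML input
        "--language", language,           # assumption: the book is German
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the book's directory, for example with `build_convert_command("index.html", "andrejew_judas.epub")`.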
Old 11-09-2023, 03:19 PM   #10
DNSB
Bibliophagist
DNSB ought to be getting tired of karma fortunes by now.

Posts: 35,464
Karma: 145525534
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
What does index.html contain? Can you post an image of the structure of the .zip file? One publisher used to include an index.html on their CDs that listed all the e-books and formats on the CD, so you needed to import from the subdirectories to get the e-books.
Old 11-12-2023, 12:38 PM   #11
robert
Junior Member
robert began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Sep 2023
Device: Kobo Libra 2
Quote:
Originally Posted by DNSB View Post
What does index.html contain? Can you post an image of the structure of the .zip file? One publisher used to include an index.html on their CDs that listed all the e-books and formats on the CD, so you needed to import from the subdirectories to get the e-books.
Hi DNSB,
An example index.html (in the subdirectory of the book) looks as follows. Yes, I would import the books one by one from their subdirectories.
Spoiler:

~~~
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "hmpro6.dtd">
<HTML lang="de">
<head>

<title>Judas Ischariot und die andern</title>
<link rel="stylesheet" type="text/css" href="../../css/prosa.css" />
<meta name="author" content="Leonid Andrejew" />
<meta name="title" content="Judas Ischariot und die andern" />
<meta name="publisher" content="Bühnen- und Buchverlag russischer Autoren J. Ladyschnikow" />
<meta name="year" content="o.J." />
<meta name="translator" content="Otto Buck" />
<meta name="corrector" content="reuters@abc.de" />
<meta name="sender" content="www.gaga.net" />
<meta name="created" content="20210916" />
<meta name="projectid" content="cb8ba936" />
<link href="../../css/dropdown.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" type="text/css" href="../../css/form.css" />
<meta name="description" content="Projekt Gutenberg | Die weltweit größte kostenlose deutschsprachige Volltext-Literatursammlung | Klassische Werke von A bis Z | Bücher gratis online lesen">
<script type="text/javascript" src="../../js/showmeta.js"></script>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="content-language" content="de">
</head>

<body> <div class="navi-gb">
<p><a name="top" id="top">*</a></p>
<table cellspacing="0" cellpadding="5" align="center" class="center">
<tr>
<td colspan="15" align="center" class="center">
<a href="../../info/texte/index.html">
<img border="0" alt="Projekt Gutenberg-DE" src="../../info/pic/banner-ed.jpg" />
</a>
</td>
</tr>
<tr>
<td class="mainnav">
<a href="../../autoren/info/autor-az.html">Autoren</a>
</td>
<td class="mainnav" align="center">∞</td>
<td align="center" class="mainnav">
<a href="../../info/texte/allworka.html">Werke</a>
</td>
<td class="mainnav" align="center">∞</td>
<td align="center" class="mainnav">
<a href="../../info/texte/neu.html">Neu</a>
</td>
<td class="mainnav" align="center">∞</td>
<td align="center" class="mainnav">
<a href="../../info/texte/info.html">Information</a>
</td>
<td class="mainnav" align="center">∞</td>
<td align="center" class="mainnav">
<a href="https://gutenberg.abc.de" target="_blank">Shop</a>
</td>
<td class="mainnav" align="center">∞</td>
<td align="center" class="mainnav">
<a href="../../info/texte/lesetips.html">Lesetips</a>
</td>
<td align="center" class="mainnav">∞</td>
<td align="center" class="mainnav">
<a onclick="ShowMeta()" onmouseout="HideMeta()">Textquelle</a>
</td>
<td align="center" class="mainnav">∞</td>
<td align="center" class="mainnav">
</td>
</tr>
</table>
<p><h5><a href="../../autoren/namen/andrejew.html"></a></h5>
<h5></h5>
<h3>Inhaltsverzeichnis</h3><br/>
<ul>
<li><a href="titlepage.html">Leonid Andrejew</a></li>
<li><a href="chap001.html">I.</a></li>
<li><a href="chap002.html">Lazarus</a></li>
</ul></p> <div class="bottomnavi-gb">
<table cellpadding="4" cellspacing="0" align="center" class="center">
<tr>
<td class="mainnav"><a href="../../info/texte/impress.html">Impressum</a></td>
<td align="center" class="mainnav">∞</td>
<td class="mainnav"><a href="#top">Nach oben</a></td>
<td align="center" class="mainnav">∞</td>
</tr>
</table>
</div>
</body>
</html>
~~~

The structure of the book subdirectory is:
~~~
...andrejew> tree judas
judas
├── bilder
│   ├── 0001.gif
│   ├── a-ini.gif
│   ├── cover.jpg
│   ├── end1.gif
│   ├── end2.gif
│   └── o-ini.gif
├── chap001.html
├── chap002.html
├── index.html
├── judas.html
└── titlepage.html

2 directories, 11 files
~~~
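
Given an index.html like the one above, one way to keep a converter inside the book directory is to generate a stripped-down table of contents that links only to files in the same folder. The following is a minimal sketch under that assumption; `IndexParser`, `is_same_directory`, `write_clean_toc`, and the output name `clean_index.html` are hypothetical names, and the sketch has not been run against the full Gutenberg-DE edition.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit


class IndexParser(HTMLParser):
    """Collect <meta name/content> pairs and <a href> links with their text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.links = []  # list of (href, link text)
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"]] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def is_same_directory(href):
    """True for plain relative links such as 'chap001.html' (no scheme, no folders)."""
    parts = urlsplit(href)
    return bool(parts.path) and not parts.scheme and not parts.netloc and "/" not in parts.path


def write_clean_toc(index_path, out_name="clean_index.html"):
    """Write a minimal contents page next to index_path; return (path, chapter hrefs)."""
    index_path = Path(index_path)
    parser = IndexParser()
    parser.feed(index_path.read_text(encoding="utf-8", errors="replace"))
    chapters = [(h, t) for h, t in parser.links if is_same_directory(h)]
    title = parser.meta.get("title", index_path.parent.name)
    author = parser.meta.get("author", "")
    items = "\n".join(
        f'<li><a href="{escape(h)}">{escape(t or h)}</a></li>' for h, t in chapters
    )
    out_path = index_path.with_name(out_name)
    out_path.write_text(
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"
        f"<body><h1>{escape(title)}</h1><p>{escape(author)}</p>\n"
        f"<ul>\n{items}\n</ul></body></html>\n",
        encoding="utf-8",
    )
    return out_path, [h for h, _ in chapters]
```

For the example book this would keep titlepage.html, chap001.html, and chap002.html and drop the navigation links that start with `../../`. The chapter files themselves may still contain the same navigation bar, so the generated page is best combined with a low link-recursion limit in the converter.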

Last edited by theducks; 11-12-2023 at 06:24 PM. Reason: Spoiler logs.
Old 11-13-2023, 01:28 AM   #12
Capricorn
Belgian Pommes Frites
Capricorn will become famous soon enough
 
Posts: 101
Karma: 532
Join Date: Jan 2012
Device: Pocketbook Touch HD
When I look at the "Impressum" webpage of the website where you bought it, it is clear that it is not gutenberg.org that manages it, but a private publisher. On this page they state they can deliver the stuff in epub - see https://www.abc.de/
So, ask them to send it all to you in epub format instead of html.
Old 11-14-2023, 02:42 PM   #13
robert
Junior Member
robert began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Sep 2023
Device: Kobo Libra 2
SOLVED: conversion of Gutenberg-DE HTML to EPUB

With https://github.com/JohnButzel/Gutenberg2Epub you can extract a book from Gutenberg-DE, either online or from a local copy. The resulting output works with calibre and on an e-book reader (tested on a Kobo Libra 2).

Thanks a lot to the author!!!