Is it possible to convert a book using an *unzipped* HTML file as the input?

jdunning · 08-22-2019, 10:25 PM

I've got a single HTML file that contains the book and its CSS. I can zip that up with some images that the styles use, import that .zip into Calibre, and then convert it to any other formats just fine. But if I make an edit and want to convert the book again, I have to zip up the edited file and replace the zip in the Calibre library. Doing that over and over is a pain.

Is there any way to simply add a folder containing an HTML file to Calibre so that you can edit the HTML easily? I've tried dropping an HTML file into the app, but it just gets converted to a .zip. Seems like it should be a simple feature to enable.
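Until there's a better way, the manual "zip it up again" step can at least be scripted. This is a minimal sketch that assumes the book lives in its own folder (HTML plus the images the CSS references). It zips every file with paths relative to that folder so the HTML's relative image links still resolve inside the archive. The function name `zip_book_folder` is hypothetical and is not part of calibre.

```python
import os
import zipfile


def zip_book_folder(src_dir: str, dest_zip: str) -> list[str]:
    """Zip every file under src_dir into dest_zip and return the member names written.

    Paths are stored relative to src_dir with forward slashes, so relative links
    from the HTML to its images keep working inside the archive.
    """
    dest_abs = os.path.abspath(dest_zip)
    names = []
    with zipfile.ZipFile(dest_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # walk subfolders in a stable order
            for fname in sorted(files):
                full = os.path.join(root, fname)
                if os.path.abspath(full) == dest_abs:
                    continue  # don't zip the archive into itself
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)
                names.append(arcname)
    return names
```

Usage: `zip_book_folder("my-book", "my-book.zip")`. Swapping the new ZIP into the library could then be done with calibre's `calibredb add_format <book id> my-book.zip`, which as far as I know replaces an existing format of the same type. Check `calibredb add_format --help` for your version before relying on that.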

kovidgoyal · 08-23-2019, 12:49 AM

No, books have to be files, not folders. And most decent desktop environments allow you to open zip files as though they are folders, so you don't need to perform any complicated maneuvers to edit files inside a zip file.

jdunning · 08-23-2019, 01:57 PM

Sure, you can open the archive and view the files, but you can't edit them. I use Webstorm as an IDE, and I can drag a file from an archive into it, edit it, and save it, but all that's doing is modifying a temp file. It doesn't affect the archive at all.
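That matches how the format works. Python's `zipfile` module, for example, has no way to overwrite a single entry in place, so "saving back into the archive" really means rewriting the whole archive. The sketch below does exactly that: it copies every entry into a new temporary ZIP next to the original, substitutes the edited file for one member, then swaps the new archive in with `os.replace`. The helper names are hypothetical, and the sketch assumes unencrypted archives.

```python
import os
import tempfile
import zipfile


def _clone_info(info: zipfile.ZipInfo) -> zipfile.ZipInfo:
    """Fresh ZipInfo with the same name, timestamp, compression and attributes."""
    new = zipfile.ZipInfo(info.filename, date_time=info.date_time)
    new.compress_type = info.compress_type
    new.external_attr = info.external_attr
    return new


def replace_in_zip(zip_path: str, member: str, new_file: str) -> None:
    """Overwrite one member of zip_path with the bytes of new_file.

    zipfile cannot modify an entry in place, so every entry is copied into a new
    archive next to the original, which then replaces it with os.replace().
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=os.path.dirname(os.path.abspath(zip_path)))
    os.close(fd)
    try:
        found = False
        with zipfile.ZipFile(zip_path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
            for info in src.infolist():
                if info.filename == member:
                    found = True
                    with open(new_file, "rb") as f:
                        data = f.read()
                else:
                    data = src.read(info.filename)
                dst.writestr(_clone_info(info), data)
        if not found:
            raise KeyError(f"{member!r} is not in {zip_path}")
        os.replace(tmp_path, zip_path)
    except BaseException:
        # Leave the original archive untouched and clean up the partial copy.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```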

lumpynose · 08-23-2019, 04:14 PM

In Calibre's preferences, in Toolbars & Menus, in the left column Available actions, at the bottom is Unpack book. You could add that to your main toolbar, then do your editing in Calibre and use Unpack book to get the HTML version.

jdunning · 08-23-2019, 08:36 PM

Quote:

Originally Posted by lumpynose

then do your editing in Calibre and use Unpack book to get the HTML version.

I'm going the other way: I've already got the unzipped HTML and want to convert it to .azw3 and other formats after editing the HTML. I can open the zip in 7zip and then click Edit, which opens the file in an editor, but changes aren't detected until you close the editing app. Launching Webstorm every time I want to make a small change takes as long as manually zipping the folder and copying it over.
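One way around the "changes aren't detected until you close the editor" problem is to stop editing inside the archive at all: keep working on the plain folder and have a small script rebuild the book whenever a file in it is saved. The sketch below polls modification times with the standard library only, so no file-system-event package is needed. `rebuild` is a placeholder you would supply (for example, something that re-zips or re-converts the book). The output should be written outside the watched folder, or the rebuild itself will retrigger the watcher. All names here are hypothetical.

```python
import os
import time
from typing import Callable, Optional


def latest_mtime(src_dir: str) -> float:
    """Newest modification time of any file under src_dir (0.0 if there are none)."""
    newest = 0.0
    for root, _dirs, files in os.walk(src_dir):
        for fname in files:
            newest = max(newest, os.path.getmtime(os.path.join(root, fname)))
    return newest


def watch(src_dir: str, rebuild: Callable[[], None], interval: float = 1.0,
          max_polls: Optional[int] = None) -> int:
    """Call rebuild() each time a file under src_dir is saved; return how many rebuilds ran.

    max_polls=None keeps polling until interrupted (Ctrl+C).
    """
    last = latest_mtime(src_dir)
    rebuilds = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        polls += 1
        current = latest_mtime(src_dir)
        if current > last:  # something was saved since the last check
            last = current
            rebuild()
            rebuilds += 1
    return rebuilds
```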

lumpynose · 08-23-2019, 09:53 PM

Quote:

Originally Posted by jdunning

I'm going the other way: I've already got the unzipped HTML and want to convert it to .azw3 and other formats after editing the HTML. I can open the zip in 7zip and then click Edit, which opens the file in an editor, but changes aren't detected until you close the editing app. Launching Webstorm every time I want to make a small change takes as long as manually zipping the folder and copying it over.

Yes, I understand. What I'm suggesting is thinking about it from the opposite direction; instead of editing the files in the zip file, only edit them in Calibre. After you import the zip file treat the files in Calibre as the masters, not the ones in the zip file. Calibre has a very competent editor and a live updating preview window. Keep the zip file as a backup in case things go wrong with Calibre and you need to start over.

BetterRed · 08-23-2019, 09:55 PM

@jdunning - If you're willing to work from an epub, then you could make use of Sigil's Open With feature to open xhtml files in Webstorm - assuming you couldn't do what you want within Sigil itself.

BR

kovidgoyal · 08-23-2019, 10:52 PM

If I were you, what I would do is open the calibre editor and choose File->Import an HTML or DOCX file as a new book. This will create an EPUB or AZW3 for you. Then either use the editor to make your small edits, or right-click the HTML file, select Export, edit it with whatever you need, then right-click it, select Replace file, and replace it with the edited file.

jdunning · 08-26-2019, 05:21 PM

Coming from the web world, it feels simpler to have a single HTML file with clean markup and CSS as the source of truth, and which can then be converted to other formats. It also makes it easier to do global changes, search and replace, etc. I appreciate Calibre's ability to convert the CSS cascade to multiple classes as needed, but editing the processed files means it gets out of sync with the HTML.

I get that Calibre currently assumes books are single files, but it already handles multiple files within zips, so it seems like it would be a small step to read those same files from a folder, rather than an archive. Or maybe a plugin could do it.
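Outside the library, calibre's `ebook-convert` command-line tool will take an HTML file directly as input, and as I understand it, its HTML input follows the file's links to pull in the stylesheets and images it references. That lets the folder stay the source of truth, with the library out of the loop entirely. The sketch below only builds and runs one `ebook-convert input output` command per target format. The function names, the output directory, and the naming scheme are my own assumptions, and it requires calibre's command-line tools to be on `PATH`.

```python
import shutil
import subprocess
from pathlib import Path


def convert_commands(html_file: str, formats: list[str], out_dir: str) -> list[list[str]]:
    """Build one ebook-convert command per output format (e.g. "azw3", "epub")."""
    src = Path(html_file)
    return [
        ["ebook-convert", str(src), str(Path(out_dir) / f"{src.stem}.{fmt}")]
        for fmt in formats
    ]


def convert_all(html_file: str, formats: list[str], out_dir: str) -> None:
    """Run every conversion; stops with an error if any ebook-convert call fails."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found; add calibre's CLI tools to PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in convert_commands(html_file, formats, out_dir):
        subprocess.run(cmd, check=True)
```

Usage: `convert_all("my-book/index.html", ["azw3", "epub"], "out")`. Conversion options (covers, metadata, output profiles) would be appended to each command as extra arguments.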

lumpynose · 08-26-2019, 06:32 PM

Quote:

Originally Posted by jdunning

Coming from the web world, it feels simpler to have a single HTML file with clean markup and CSS as the source of truth, and which can then be converted to other formats. It also makes it easier to do global changes, search and replace, etc. I appreciate Calibre's ability to convert the CSS cascade to multiple classes as needed, but editing the processed files means it gets out of sync with the HTML.

I get that Calibre currently assumes books are single files, but it already handles multiple files within zips, so it seems like it would be a small step to read those same files from a folder, rather than an archive. Or maybe a plugin could do it.

Look at how search and replace works in Calibre's editor; there are 6 modes: current file, all text files, all style files, selected files, open files, and marked text. You can hardly ask for more for doing global changes.
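To show what a scoped search and replace amounts to, here is a rough sketch that applies one regular expression to every file whose extension puts it "in scope", much like choosing "all text files" rather than "all style files". This is an illustration only, not calibre's implementation. The files are modelled as a plain name-to-text dict, and the function name and default extensions are hypothetical.

```python
import re


def replace_in_scope(files: dict[str, str], pattern: str, repl: str,
                     extensions: tuple[str, ...] = (".html", ".xhtml")) -> dict[str, int]:
    """Apply one regex replacement to every file whose name ends with an extension in scope.

    files maps file name -> text and is updated in place; returns replacements per file.
    """
    regex = re.compile(pattern)
    counts = {}
    for name, text in list(files.items()):
        if name.lower().endswith(extensions):
            new_text, n = regex.subn(repl, text)
            files[name] = new_text
            counts[name] = n
    return counts
```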

Calibre doesn't assume that books are a single file. The final packaging is a single file; that's mandated by the format, epub, azw, etc.

Whenever I convert a web page(s) to an ebook, I import the files into the editor and do all my work there. Sometimes it's multiple web pages, sometimes one. Regardless, I've never felt the need to do anything with the originals unless I screw up and need to start over.

theducks · 08-26-2019, 06:48 PM

I've never seen a good reason to work with a single, huge HTML file.
As Lumpy pointed out, Calibre (and Sigil) both have search (& replace) settings that control the scope.
Spell check is one of the few tools that does NOT have that control. But since you can double-click the word in the spell checker and it will jump to the next occurrence, there is no 'blind faith' correcting there.

lumpynose · 08-26-2019, 07:28 PM

Quote:

Originally Posted by theducks

I've never seen a good reason to work with a single, huge HTML file.

Not really germane to this topic but one place where it is useful is when I'm converting a book from Gutenberg's "peculiar" format to something more palatable. I use Sigil and what I do is unzip the epub, then pull in all of the html files into Sigil, then use Sigil's merge feature to combine them into one html file and change the suffix to xhtml. For whatever reason the Gutenberg files don't have each chapter in a separate html file. So then my next step is to prepend above the h<whatever> tags Sigil's split marker. Then tell Sigil to split at the markers.
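The mark-then-split step can be sketched with two small functions. The first inserts a marker before every chapter heading in the merged body, and the second cuts the text at those markers. I'm assuming chapters start at `<h2>` (Gutenberg books vary) and that Sigil's marker is `<hr class="sigil_split_marker" />`; verify the marker against what your Sigil version inserts. Doing the split in Python is only an illustration of what Sigil's split command does with the markers, not its actual code.

```python
import re

# Assumed to match the marker Sigil inserts; check your Sigil version.
SPLIT_MARKER = '<hr class="sigil_split_marker" />'


def add_split_markers(body: str, level: int = 2) -> str:
    """Insert SPLIT_MARKER immediately before every <hN> heading, where N = level."""
    # Zero-width lookahead: the heading itself is left untouched.
    return re.sub(rf"(?=<h{level}[\s>])", SPLIT_MARKER, body, flags=re.IGNORECASE)


def split_at_markers(body: str) -> list[str]:
    """Cut the merged body into chapters at each marker, dropping empty pieces."""
    return [part for part in body.split(SPLIT_MARKER) if part.strip()]
```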

Edit: I forgot to say that I replace everything above the start body tag with the corresponding stuff from a new sigil .xhtml file, so that it has the correct epub3 stuff.
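Swapping in a fresh header can also be done programmatically: keep everything from the opening `<body>` tag onward and prepend a clean EPUB 3 XHTML header. The header below is a generic EPUB 3 skeleton I wrote for illustration. It is not copied from Sigil's template, and it deliberately drops the old `<head>` contents, so any stylesheet `<link>` would have to be added back. The function name is hypothetical.

```python
import re
from html import escape

# Generic EPUB 3 XHTML header (an assumption, not Sigil's exact template).
EPUB3_HEAD = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <title>{title}</title>
</head>
"""


def replace_head(xhtml: str, title: str = "") -> str:
    """Replace everything before the opening <body> tag with a fresh EPUB 3 header."""
    match = re.search(r"<body[\s>]", xhtml, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no <body> tag found")
    return EPUB3_HEAD.format(title=escape(title)) + xhtml[match.start():]
```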

DNSB · 08-27-2019, 02:16 AM

Quote:

Originally Posted by lumpynose

Not really germane to this topic but one place where it is useful is when I'm converting a book from Gutenberg's "peculiar" format to something more palatable. I use Sigil and what I do is unzip the epub, then pull in all of the html files into Sigil, then use Sigil's merge feature to combine them into one html file and change the suffix to xhtml. For whatever reason the Gutenberg files don't have each chapter in a separate html file. So then my next step is to prepend above the h<whatever> tags Sigil's split marker. Then tell Sigil to split at the markers.

Edit: I forgot to say that I replace everything above the start body tag with the corresponding stuff from a new sigil .xhtml file, so that it has the correct epub3 stuff.

Why not simply open the Gutenberg epub, merge the html files and then split them at chapter boundaries? I do this in Sigil but calibre's editor has similar functionality.

lumpynose · 08-27-2019, 02:41 AM

Quote:

Originally Posted by DNSB

Why not simply open the Gutenberg epub, merge the html files and then split them at chapter boundaries? I do this in Sigil but calibre's editor has similar functionality.

Yeah, I could do that. I hadn't thought of that. It seems to be my lot in life to overcomplicate things.

Although I have screwed things up and then I have the original PG epub files that I could pull in again. (Just did so today.)
