extracting html file

beccaa · 06-22-2016, 08:16 AM

I have a bit of a problem that I'm really hoping someone can help with! I lost my original HTML file for my ebook. I have the .mobi and .epub files within Calibre and would love to know whether there's any way that I can extract the full original HTML file from either of those, or perhaps elsewhere in Calibre.

I have had a look in the library and see the files there but when I go to click and open the .zip it just opens another zip, which opens another zip...

Would REALLY appreciate any help. Thank you!

BetterRed · 06-22-2016, 09:42 AM

This is for EPUB.

Select the book in the book list and press 'U'. That takes you to the Unpack facility, which unpacks the book to a temporary folder and opens it in your file manager. From there you can copy the HTML, CSS, etc. files.

An EPUB is a zip, so you could copy the EPUB somewhere, rename it to .zip and open it with your everyday unzip utility.
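If you would rather script it than rename the file, here is a minimal sketch using Python's standard `zipfile` module. The function names and the list of file suffixes are made up for illustration, and it assumes the EPUB is a plain, DRM-free zip. Run it on a copy of the EPUB, not on the file inside the calibre library folder.

```python
import zipfile
from pathlib import Path

# Assumption: content documents use one of these extensions.
HTML_SUFFIXES = (".html", ".xhtml", ".htm")


def list_html_members(epub_path):
    """Return the names of the HTML/XHTML files inside an EPUB, in archive order."""
    with zipfile.ZipFile(epub_path) as archive:
        return [
            name for name in archive.namelist()
            if name.lower().endswith(HTML_SUFFIXES)
        ]


def extract_html_members(epub_path, out_dir):
    """Extract only the HTML/XHTML files into out_dir and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(epub_path) as archive:
        names = [
            name for name in archive.namelist()
            if name.lower().endswith(HTML_SUFFIXES)
        ]
        # ZipFile.extract returns the path of the file it created.
        return [Path(archive.extract(name, out_dir)) for name in names]
```

Calling `extract_html_members("copy_of_book.epub", "unpacked")` would write the content files under `unpacked/`, keeping whatever folder layout the EPUB uses internally. Note that archive order is not necessarily reading order.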

Some MOBIs can be unpacked, but I've never needed or wanted to do that, so I'm not sure how to go about it. There's a KindleUnpack plugin.

BR

faltradl · 06-22-2016, 09:45 AM

Use Calibre's built-in ebook editor to open the EPUB. Then you can see all the files it contains.

Poking around in the library folder is a very dangerous way to do it. Looking is allowed, but never, ever change anything there.

beccaa · 06-22-2016, 10:08 AM

Quote:

Originally Posted by BetterRed

This is for EPUB.

Select the book in the book list and press 'U'. That takes you to the Unpack facility, which unpacks the book to a temporary folder and opens it in your file manager. From there you can copy the HTML, CSS, etc. files.

An EPUB is a zip, so you could copy the EPUB somewhere, rename it to .zip and open it with your everyday unzip utility.

Some MOBIs can be unpacked, but I've never needed or wanted to do that, so I'm not sure how to go about it. There's a KindleUnpack plugin.

BR

Thanks so much, that's really helpful!

This splits the book into multiple HTML files though, and I had it all within one file originally. I'm not really confident enough to rebuild it from these.

Is there any way of extracting the original do you know, or has that gone for good?

Thanks again!!

beccaa · 06-22-2016, 10:09 AM

Quote:

Originally Posted by faltradl

Use Calibre's built-in ebook editor to open the EPUB. Then you can see all the files it contains.

Poking around in the library folder is a very dangerous way to do it. Looking is allowed, but never, ever change anything there.

Thank you for your help! I had looked there but would like a bit more control within my ebook editor external to Calibre...

theducks · 06-22-2016, 12:16 PM

Quote:

Originally Posted by beccaa

Thanks so much, that's really helpful!

This splits the files into multiple html files though, and I had them within one file originally. I'm not really confident enough to rebuild it from these.

Is there any way of extracting the original do you know, or has that gone for good?

Thanks again!!

Just a technical quibble:
The split must have been done sometime earlier. Unpack and the Editor both just work on what is there.

1) If you use the Editor, you can work on any piece, and saving the edit puts the pieces back into the book in the order of the file list.

2) As long as you leave the Unpack session active, you can click the 'Rebuild' button after editing or replacing pieces. Keep the same file names unless you also manually correct the OPF (and NCX).
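For anyone who wants one combined HTML file back out of the split pieces, the reading order lives in the OPF spine. What follows is a rough sketch only, not what calibre does internally: it assumes a standard EPUB layout (`META-INF/container.xml` pointing at the OPF), UTF-8 content files, and it uses a crude regular expression to pull out each `<body>`. The function names are hypothetical. Stylesheet links and cross-file links are not rewritten, so the result will still need hand cleanup.

```python
import posixpath
import re
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
# Crude body matcher; good enough for a sketch, not a real HTML parser.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def spine_files(epub_path):
    """Return the content documents of an EPUB in reading (spine) order."""
    with zipfile.ZipFile(epub_path) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(archive.read(opf_path))
    base = posixpath.dirname(opf_path)
    # The manifest maps item ids to hrefs relative to the OPF file.
    manifest = {
        item.get("id"): unquote(item.get("href"))
        for item in opf.findall(".//opf:manifest/opf:item", OPF_NS)
    }
    return [
        posixpath.normpath(posixpath.join(base, manifest[ref.get("idref")]))
        for ref in opf.findall(".//opf:spine/opf:itemref", OPF_NS)
    ]


def merge_to_single_html(epub_path):
    """Concatenate the body of every spine document into one HTML string."""
    parts = []
    with zipfile.ZipFile(epub_path) as archive:
        for name in spine_files(epub_path):
            text = archive.read(name).decode("utf-8")
            match = BODY_RE.search(text)
            parts.append(match.group(1) if match else text)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head>\n'
        "<body>\n" + "\n".join(parts) + "\n</body></html>\n"
    )
```

The merged markup is still the converted HTML, not the original file, so class names and structure may differ from what was first added to calibre.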

beccaa · 06-22-2016, 12:48 PM

Quote:

Originally Posted by theducks

Just a technical quibble:
The split must have been done sometime earlier. Unpack and the Editor both just work on what is there.

1) If you use the Editor, you can work on any piece, and saving the edit puts the pieces back into the book in the order of the file list.

2) As long as you leave the Unpack session active, you can click the 'Rebuild' button after editing or replacing pieces. Keep the same file names unless you also manually correct the OPF (and NCX).

Thanks for your very charming response! I do realise I could edit them all individually, but as I said I am actually hoping to extract the original html file, which was uploaded as a single file into calibre. Shall I take it that isn't possible?

theducks · 06-22-2016, 12:59 PM

Quote:

Originally Posted by beccaa

Thanks for your very charming response! I do realise I could edit them all individually, but as I said I am actually hoping to extract the original html file, which was uploaded as a single file into calibre. Shall I take it that isn't possible?

If the HTML was a single file, it would have been imported in ZIP or RAR format.

Just export that file and use WinRAR or similar to extract the archive.

jackie_w · 06-22-2016, 01:36 PM

@beccaa,

I don't know whether this would fit your requirements, but have you tried converting the epub (or mobi or whatever) to calibre's HTMLZ format? HTMLZ is a standard zip file containing one big HTML file (plus images etc).

However, be aware that the conversion may introduce changes to original HTML/CSS class names.
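As a sketch of that route from a script: calibre ships a command-line tool, `ebook-convert`, that picks the output format from the file extension. The helper names below are hypothetical, the code assumes `ebook-convert` is on your PATH, and it assumes the resulting HTMLZ holds exactly one HTML file encoded as UTF-8 (it checks rather than hard-coding a file name).

```python
import subprocess
import zipfile


def convert_to_htmlz(source_path, htmlz_path):
    """Run calibre's ebook-convert; the output format comes from the .htmlz extension."""
    subprocess.run(
        ["ebook-convert", str(source_path), str(htmlz_path)],
        check=True,  # raise if the conversion fails
    )


def read_single_html(htmlz_path):
    """Return (member_name, text) for the one HTML file inside an HTMLZ archive."""
    with zipfile.ZipFile(htmlz_path) as archive:
        html_names = [
            name for name in archive.namelist()
            if name.lower().endswith((".html", ".htm"))
        ]
        if len(html_names) != 1:
            raise ValueError(f"expected exactly one HTML file, found {html_names}")
        name = html_names[0]
        return name, archive.read(name).decode("utf-8")
```

Typical use would be `convert_to_htmlz("copy_of_book.epub", "book.htmlz")` followed by `read_single_html("book.htmlz")`, again working on a copy outside the library folder.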

dwig · 06-22-2016, 01:45 PM

Quote:

Originally Posted by beccaa

... I lost my original HTML file for my ebook. I have the .mobi and .epub files within Calibre and would love to know whether there's any way that I can extract the full original HTML file from either of those, or perhaps elsewhere in Calibre.
...

Not likely.

The original HTML does not exist in the .MOBI. It would have been altered by the conversion process. Anything that you unpack from the .MOBI, if unpacking is possible at all, would be this altered HTML or XHTML file.

The ePUB might contain the original HTML, but it is quite unlikely that it does. If the ePUB was created by a conversion, then the HTML or XHTML in the ePUB would be somewhat different from the original.

As stated in another post, if (and that's a big IF) the original HTML was added to calibre before conversion to the other formats it would normally be wrapped up as a ZIP archive when added to the library. That ZIP would contain the original HTML, but no product of a conversion to another format (e.g. ePUB, MOBI, AZW3, ...) would contain an exact copy of the original HTML. Even when the target format for a conversion contains HTML file(s) the HTML code will have been altered by the conversion process so that it complies with the target format's limitations.

beccaa · 06-22-2016, 08:03 PM

Thanks everyone. I tried some of your suggestions but as you say, the files were altered. Went back to an old version and made some changes manually. I say some changes - like, all night long!

Really appreciate your help and advice though - thank you!
