[Old Thread] merging split html files with Calibre?


NASCARaddicted
02-07-2011, 08:07 PM
I think someone wrote how this works, and I even managed to do it myself, but I can't remember how, so ...

I downloaded an ebook in epub format. However, I noticed that there are many errors in the ebook. Also, there is no TOC. So I want to correct all this stuff. I know, an epub is just a zip file with a html file inside. But here is the problem: Like in most epubs, the html file is split into multiple small html files.
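
To see how a given epub is split, a minimal Python sketch like the one below lists the HTML files inside the zip container. The function name `list_html_members` and the file name `book.epub` in the usage note are only illustrative, and the sketch assumes the parts use the usual .html, .htm or .xhtml extensions.

```python
import zipfile

def list_html_members(epub_path):
    """Return the names of the HTML/XHTML files stored inside an EPUB."""
    html_suffixes = (".html", ".htm", ".xhtml")
    with zipfile.ZipFile(epub_path) as archive:
        # namelist() returns entries in the order they are stored in the zip,
        # which is not necessarily the reading order of the book.
        return [name for name in archive.namelist()
                if name.lower().endswith(html_suffixes)]
```

Calling `list_html_members("book.epub")` on a split book should return one entry per part.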

I once read something like: Calibre can join these small html files into one big html file. When you convert, there is an option to split files if they are larger than 260 KB. Change this to something like 99999 KB. Then you would end up with an epub that has only 1 big html file, which could be easily edited.
I tried to convert from epub to epub using this setting, but I must be doing something wrong, because there are still multiple html files after I export the new ebook.

Can anyone help me (if anyone can understand what I am trying to do)? Thanks in advance.

ldolse
02-07-2011, 08:31 PM
Calibre can't merge HTML that's already split. Your best bet is to convert to a format that is natively a single file - try text (using Markdown or Textile output), or RTF. If you're using non-ASCII characters, text is probably a better fit.

Edit your doc from there, then convert back.

DoctorOhh
02-07-2011, 10:51 PM
I downloaded an ebook in epub format. However, I noticed that there are many errors in the ebook. Also, there is no TOC. So I want to correct all this stuff. I know, an epub is just a zip file with a html file inside. But here is the problem: Like in most epubs, the html file is split into multiple small html files.

This isn't a problem if you use Sigil to edit the book and if the headings you wish to include in the TOC are marked with <h1> tags then it will create a TOC on save.
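
To check in advance which headings would end up in such a TOC, a rough Python sketch can collect the text of every <h1> element in a file. This is not how Sigil itself builds the TOC, only an illustration. The names `HeadingCollector` and `collect_h1_headings` are made up for this example.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect the text of every <h1> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h1 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start a fresh buffer whenever an <h1> opens.
        if tag == "h1":
            self._in_h1 = True
            self._buffer = []

    def handle_endtag(self, tag):
        # Store the accumulated text when the <h1> closes.
        if tag == "h1" and self._in_h1:
            self.headings.append("".join(self._buffer).strip())
            self._in_h1 = False

    def handle_data(self, data):
        # Text inside nested tags such as <i> is kept as well.
        if self._in_h1:
            self._buffer.append(data)

def collect_h1_headings(html_text):
    parser = HeadingCollector()
    parser.feed(html_text)
    parser.close()
    return parser.headings
```

If the list comes back empty, the chapter titles are probably marked up as plain paragraphs and need to be changed to <h1> first.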

Alternatively, you can do as ldolse said: convert to text (using Markdown or Textile output) or RTF, then save it as filtered HTML if HTML is your favorite editing medium.

Good Luck.

tjung
02-08-2011, 12:12 AM
The latest version of Sigil absolutely will merge .html or .xhtml files inside an EPUB container into 1 file. I have used it to edit poorly edited EPUBs where chapters are split across files. So I merge the two files then chapter break the big file at the end of the chapter I was working on. I don't know why you would want to, but you could merge every single .xhtml file into 1 giant file or just combine then resplit the chapters like I am doing.

Sigil comes in Linux, Mac and Windows flavors and it is GPL Open Source.
http://code.google.com/p/sigil/

Diana495
02-08-2011, 12:55 AM
I'd recommend Sigil too, but sometimes when I need to have a single html file, and I'm too lazy to use Sigil, I cheat and use the debug feature in calibre. Convert the epub to mobi, then reconvert the mobi to epub, and use the processed html file.

Archon
02-08-2011, 04:25 PM
Mac OS X has a command line utility called 'textutil'.

It can concatenate (combine in sequence) several files into one single file.

I don't know if Windows has a similar program.

I used textutil several times to combine several html files into one file. As long as they are numbered sequentially you can use a wildcard in the concatenate command and it will grab all the files called 'foo1' thru 'foo9' for instance and stitch them together.

You still need to clean up the output but it is fast and just takes one single command.
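
Most of that cleanup comes from every part carrying its own <html>, <head> and <body> wrapper. As a rough sketch (not what textutil does), the Python below keeps only the <body> contents of each part and wraps them in a single document. The function name `merge_html_bodies` is made up, and the sketch assumes UTF-8 files and simple markup. Anything in the original <head> sections, such as stylesheet links, is dropped.

```python
import re
from pathlib import Path

# Matches everything between the opening and closing body tags.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)

def merge_html_bodies(paths, title="Merged book"):
    """Join the <body> contents of several HTML files into one document."""
    parts = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        match = BODY_RE.search(text)
        # Fall back to the whole file if no <body> element is found.
        parts.append(match.group(1).strip() if match else text.strip())
    body = "\n".join(parts)
    return (
        "<html>\n<head><title>" + title + "</title></head>\n"
        "<body>\n" + body + "\n</body>\n</html>\n"
    )
```

The paths have to be passed in reading order. The returned string can then be written to a new file and edited as one piece.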

http://www.unix.com/man-page/All/1/TEXTUTIL/

Archon

Starson17
02-09-2011, 11:47 AM
Calibre can't merge HTML that's already split.
Calibre has a limited ability to merge html files:
http://calibre-ebook.com/user_manual/faq.html#how-do-i-convert-a-collection-of-html-files-in-a-specific-order

It's the fourth item in the FAQ, which has always struck me as kind of an odd importance for what seems to be a relatively uncommon thing to do with Calibre.

ldolse
02-09-2011, 12:10 PM
Calibre has a limited ability to merge html files:
http://calibre-ebook.com/user_manual/faq.html#how-do-i-convert-a-collection-of-html-files-in-a-specific-order

It's the fourth item in the FAQ, which has always struck me as kind of an odd importance for what seems to be a relatively uncommon thing to do with Calibre.

I think that will only merge them in terms of combining them with one OPF/NCX. The individual html/xhtml flows don't get merged together into a single file/flow, which I believe is what this user wants.

I saw someone say in another thread that converting to mobipocket and back will merge the files to a single flow, but keep the majority of the html... Haven't tried it to confirm.

rsiegler
11-30-2011, 04:30 PM
As Archon said, a command line will concatenate all those files pretty fast. In Windows, open a cmd window, change to the directory that holds the files, and enter something like this:
copy <myebookname part0??.htm> NewFileName
The question marks in the file name stand for the sequential numbers in your split files, assuming they are sequentially numbered. If not, you'll have to renumber them manually.
NewFileName now contains all of the "parts" that matched the wildcard '??'.
You should see a list of the files scroll up the screen.
Good Luck
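
A note on the numbering: a wildcard only gives the right order when the numbers are zero-padded. If the parts are named like part1, part2 ... part10, plain alphabetical order puts part10 before part2. A small Python sketch of a "natural" sort that orders by the numeric value instead is shown below. The helper names `natural_key` and `order_parts` are made up for this example.

```python
import re

def natural_key(filename):
    """Split a name into text and integer chunks so 'part2' sorts before 'part10'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", filename)]

def order_parts(filenames):
    """Return the file names sorted by their embedded numbers."""
    return sorted(filenames, key=natural_key)
```

With the list ordered this way, the files can be concatenated or renamed with padded numbers without renumbering them by hand.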

ElMiko
12-01-2011, 11:15 AM
Check me if I'm wrong, but can't you just zip the contents of the folder with split html files, and then change the extension to epub?

EDIT: Ignore the above. I don't know what I'm talking about. Don't judge me too harshly; I can't read or write...

theducks
12-01-2011, 11:46 AM
Check me if I'm wrong, but can't you just zip the contents of the folder with split html files, and then change the extension to epub?

NO
There are EPUB specific rules on what goes where and how it is stored.

Tweak EPUB and Sigil take care of complying with the rules when they reassemble the package
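
One of those rules concerns the zip itself: the 'mimetype' file has to be the first entry in the archive and must be stored uncompressed, which a generic "zip this folder" tool usually does not guarantee. The Python sketch below covers only that packaging step. It does not check that the folder contains a valid META-INF/container.xml, OPF and so on. The function name `pack_epub` is made up, and the output file is assumed to live outside the source folder.

```python
import os
import zipfile

def pack_epub(source_dir, epub_path):
    """Zip an unpacked EPUB folder, writing 'mimetype' first and uncompressed."""
    mimetype_path = os.path.join(source_dir, "mimetype")
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The container format expects 'mimetype' as the first entry, stored as-is.
        archive.write(mimetype_path, "mimetype", compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # walk subfolders in a stable order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if rel_path == "mimetype":
                    continue  # already written above
                archive.write(full_path, rel_path, compress_type=zipfile.ZIP_DEFLATED)
```

Tweak EPUB and Sigil do this for you, so the sketch is mainly useful for understanding why a plain zip-and-rename often fails.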

ldolse
12-01-2011, 12:00 PM
You both got tricked into responding to an old thread - this thread was created before htmlz was implemented - which based on the OP's description (source was an ePub likely created by an old version of Calibre) is now the best solution.
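
For readers who have not used it: converting with something like `ebook-convert book.epub book.htmlz` (the file names are placeholders) produces a zip archive that holds the book as a single HTML file. A minimal Python sketch for pulling that HTML out for editing is below. The function name `read_htmlz_html` is made up, and the sketch assumes the HTML is UTF-8 encoded and simply takes the first HTML entry it finds.

```python
import zipfile

def read_htmlz_html(htmlz_path):
    """Return the text of the first HTML file found inside an HTMLZ archive."""
    with zipfile.ZipFile(htmlz_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".htm")):
                # Assumption: the file is UTF-8 encoded.
                return archive.read(name).decode("utf-8")
    raise ValueError("no HTML file found in archive")
```

Unzipping the htmlz by hand with any zip tool works just as well.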

DoctorOhh
12-01-2011, 06:57 PM
Welcome to Mobileread. :)

As Archon said, a command line will concatenate all those files pretty fast.
...
You should see a list of the files scroll up the screen.
Good Luck


Do not open old threads to answer questions.

Calibre is a fast-moving development project and there have been 40 revisions since the original question in this thread was asked. Your added information, while applicable, has been superseded by the htmlz feature that was added to calibre 32 revisions ago. Glad to have you on board, and I hope you continue to assist the community with your willingness to help answer questions where you have the requisite knowledge.

NASCARaddicted
12-03-2011, 01:19 PM
You both got tricked into responding to an old thread - this thread was created before htmlz was implemented - which based on the OP's description (source was an ePub likely created by an old version of Calibre) is now the best solution.

As the original poster, I can say: your answer is absolutely correct. I posted this way before htmlz was implemented.

I am surprised that none of the latest few postings mentions htmlz. Back when it was implemented, there was a big "thanks" posting for this feature.

So thanks again for all the replies, but to anyone who reads this in the future: there is no more need to answer. Maybe a mod should close the thread.