Old 09-27-2010, 04:47 PM   #1
Mixx
Zealot
Posts: 140
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
Complex HTML archive (ZIP), how to convert

Hi,

I got a couple of HTML-ZIP archives (a folder tree in a ZIP file). It seems that while I can directly convert a ZIP-archive if there is only one HTML file in it, it does not work with a more complex zipped HTML directory that contains a bunch of HTML files, subfolders with further HTML files in them, css files and such. When I click "view" a window pops up showing the contents of the ZIP archive.

The browser of course can display it all. Is there any chance I could convert this complex structure into a MOBI file for the Kindle?

With Calibre or some other tool? Doing it manually is not an option: the book contains a couple of hundred poems, each in a separate file. I would not mind if the index (indices, actually) were lost entirely and I ended up with a flat file.

Thanks for any help/hints,

Mixx

Last edited by Mixx; 09-27-2010 at 04:50 PM.
Old 09-27-2010, 04:57 PM   #2
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Mixx View Post
I got a couple of HTML-ZIP archives (a folder tree in a ZIP file). It seems that while I can directly convert a ZIP-archive if there is only one HTML file in it, it does not work with a more complex zipped HTML directory that contains a bunch of HTML files, subfolders with further HTML files in them, css files and such.
I store "complex zipped HTML" files in Calibre all the time, and have no trouble viewing or converting them. Perhaps you have zipped the directory it's in, not the index.html file and related files in that directory. Calibre needs to see a format it can read inside the zip.

The best way to store such a book is to drag only the index.html file into Calibre. It will follow all the links and grab all the needed files. If it works in your browser by clicking on index.html, it will work in Calibre the same way.

Quote:
When I click "view" a window pops up showing the contents of the ZIP archive.
It sounds like you have Calibre set to pass zip files to the OS, which just opens them with your zip program. If you tell Calibre to handle them, it will open and view the book inside with its internal reader.
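
To illustrate what "following all the paths" from index.html means, here is a minimal sketch in Python. It is not Calibre's actual implementation; the names `LinkCollector` and `reachable_files` are hypothetical, and it assumes that only `href` and `src` attributes matter and that only local, relative links should be followed.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def reachable_files(index_path):
    """Return local files reachable from index_path, in discovery order."""
    start = os.path.abspath(index_path)
    seen, queue = [start], [start]
    while queue:
        current = queue.pop(0)
        if not current.lower().endswith((".html", ".htm")):
            continue  # only HTML files are parsed for further links
        parser = LinkCollector()
        with open(current, encoding="utf-8", errors="replace") as fh:
            parser.feed(fh.read())
        for link in parser.links:
            url, _fragment = urldefrag(link)
            if not url or urlparse(url).scheme or url.startswith("//"):
                continue  # skip pure anchors and external URLs
            target = os.path.normpath(
                os.path.join(os.path.dirname(current), unquote(url)))
            if os.path.isfile(target) and target not in seen:
                seen.append(target)
                queue.append(target)
    return seen
```

Running `reachable_files("index.html")` on an unpacked archive gives a quick check of which files a link-following importer could find at all. Files that no page links to will not show up in the list.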

Last edited by Starson17; 09-27-2010 at 05:03 PM.
Old 09-27-2010, 04:59 PM   #3
Manichean
Wizard
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
I have two books that are essentially zipped-up HTML files. The structure is one file in the root, with everything else in subfolders. They convert OK, at least up to ePub generation, where the conversion chokes horribly. I suspect that's because the HTML is so full of crap that I lost all hope of correcting it, together with the fact that each of the two books includes some 20,000 files. Did you try just hitting convert to see what happens?
Old 09-27-2010, 05:24 PM   #4
kovidgoyal
creator of calibre
Posts: 26,433
Karma: 5383257
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
http://calibre-ebook.com/user_manual...specific-order
Old 09-28-2010, 04:12 AM   #5
Mixx
Zealot
Posts: 140
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
Thanks for trying to help me, guys. It's just that I am totally new to this and somewhat confused.

In order to eliminate the ZIP issue, I unpacked the whole structure into a folder. When I now add the index.htm file to calibre (is this what is meant by "dragging"?), the book has no metadata (filename?), has a size of zero MB, and is not really there: it can't be viewed, and upon clicking "view" the folder listing pops up again.

Under "settings/behavior" I activated the internal viewer for ZIP files now.

If I add the ZIP file to calibre, all is well as far as the database is concerned, but I cannot really view the book: I can see only the landing page, without the embedded image files, and the navigation links are not active.

A conversion to MOBI produces an empty book, except for the cover image.

Can you please advise what I am doing wrong?

Thank you, Mixx

Last edited by Mixx; 09-28-2010 at 04:27 AM. Reason: Discovered the switch for the internal viewer for ZIP files
Old 09-28-2010, 04:25 AM   #6
speakingtohe
Wizard
Posts: 4,631
Karma: 25186576
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
I have never been able to view html(zip) files from within calibre but I have never had the zip internal viewer enabled before in preferences. Amazingly it seems to work.
Live and learn as they say.

For the other issues, did you try adding books from directory, one book per directory?
Or opening the main file with Word and saving it as RTF?
Both of these have worked for me for problem files, but yours may be more problematic.
Old 09-28-2010, 05:04 AM   #7
Manichean
Wizard
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by Mixx View Post
A conversion to MOBI produces an empty book, except for the cover image.

Can you please advise what I am doing wrong?
Well, you should try Kovid's advice. The manual section he linked to has instructions on how to set file order within a zipped HTML book, which, when set, should result in a book that's not empty.
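
For illustration, here is a minimal Python sketch of one way to fix the file order. It assumes the linked manual section means an extra HTML file whose links list the other files in the desired order; that reading is an assumption. The function name `build_toc` and the example paths are hypothetical.

```python
from html import escape
from urllib.parse import quote


def build_toc(entries, title="Table of Contents"):
    """Return an HTML page linking to each (path, label) pair in order."""
    lines = [
        "<html>",
        "<head><title>%s</title></head>" % escape(title),
        "<body>",
        "<h1>%s</h1>" % escape(title),
    ]
    for path, label in entries:
        # Use forward slashes and percent-encode spaces etc. in the link.
        href = quote(path.replace("\\", "/"))
        lines.append('<p><a href="%s">%s</a></p>'
                     % (escape(href, quote=True), escape(label)))
    lines += ["</body>", "</html>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_toc([("1999/first poem.html", "First poem"),
                     ("2000/second.html", "Second poem")]))
```

The returned text would be saved next to the other HTML files (for example as toc.html) and that file added to calibre instead of the original index.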
Old 09-28-2010, 07:47 AM   #8
Mixx
Zealot
Posts: 140
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
Quote:
Originally Posted by speakingtohe View Post
I have never been able to view html(zip) files from within calibre but I have never had the zip internal viewer enabled before in preferences. Amazingly it seems to work.
Live and learn as they say.

For the other issues, did you try adding books from directory, one book per directory?
Or opening the main file with Word and saving it as RTF?
Both of these have worked for me for problem files, but yours may be more problematic.
Thanks, I did, it does not help.

Mixx
Old 09-28-2010, 07:50 AM   #9
Mixx
Zealot
Posts: 140
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
Quote:
Originally Posted by Manichean View Post
Well, you should try Kovid's advice. The manual section he linked to has instructions on how to set file order within a zipped HTML book, which, when set, should result in a book that's not empty.
That is not practical, I am afraid: this is a poetry book with almost 1,000 poems, each in its own HTML file, in subdirectories (such as the year they were written), etc.



Mixx
Old 09-28-2010, 07:53 AM   #10
Manichean
Wizard
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by Mixx View Post
That is not practical, I am afraid: this is a poetry book with almost 1,000 poems, each in its own HTML file, in subdirectories (such as the year they were written), etc.



Mixx
That's nothing a few lines of $preferred_programming_language shouldn't be able to handle. I'll have to write something for my case anyhow; when I'm done with it, I'll share the code.
Old 09-28-2010, 01:29 PM   #11
Manichean
Wizard
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Sooo... I just set $preferred_programming_language = "Python" and hacked away. This is the result. Sadly, I couldn't properly test it, because even with just about 280 of the original ~20,000 files, Calibre chokes and dies horribly while creating the ePub output. But at least it should create an HTML file as specified in the link Kovid provided.
The script expects the four files "header", "footer", "prefix" and "postfix" to be in the same directory as the HTML files. (Thus, to use it, you need to extract your ZIP to a directory and put the files I attached inside that same directory.) These four files contain the beginning and the end of the HTML file, plus the text that's prepended and appended to each individual TOC entry. The script expects two parameters: the first is the filename of the output file, the second is the filename of the index file, which will go on top in the TOC. If you don't want to use an index file, just misspell the file name; the script shouldn't care.

Please be aware that this is somewhat kludged together (For you Python savvy folks out there, please don't hit me!) and may or may not work. There's no graceful error handling, scratch that: there's no error handling, so horrible, horrible things may or may not happen.
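
For readers who cannot open the attachment, here is a rough Python sketch of the behaviour described above. It is not the attached create_toc script, whose contents are not shown in this thread. It assumes that each TOC entry is the "prefix" text, then the file's relative path, then the "postfix" text, and that files are listed in sorted order. The names `read_text`, `collect_html` and `create_toc` are hypothetical.

```python
import os


def read_text(path):
    """Read a whole text file as UTF-8."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def collect_html(root, skip):
    """Return sorted relative paths of HTML files under root, minus skip."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            if name.lower().endswith((".html", ".htm")):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                rel = rel.replace(os.sep, "/")
                if rel not in skip:
                    found.append(rel)
    return found


def create_toc(root, output_name, index_name):
    """Write root/output_name linking every HTML file; return the entry order."""
    parts = {name: read_text(os.path.join(root, name))
             for name in ("header", "footer", "prefix", "postfix")}
    entries = collect_html(root, skip={output_name, index_name})
    if os.path.isfile(os.path.join(root, index_name)):
        entries.insert(0, index_name)  # the index file goes on top
    body = "".join(parts["prefix"] + entry + parts["postfix"]
                   for entry in entries)
    with open(os.path.join(root, output_name), "w", encoding="utf-8") as fh:
        fh.write(parts["header"] + body + parts["footer"])
    return entries
```

With a "prefix" file containing `<a href="` and a "postfix" file containing `">link</a><br>`, calling `create_toc(".", "toc.html", "index.htm")` in the extracted directory would write toc.html with one link per HTML file. A misspelled index name simply means nothing is placed on top.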
Attached Files
File Type: zip create_toc.zip (1.4 KB, 81 views)