09-27-2010, 03:47 PM | #1 |
Zealot
Posts: 143
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
|
Complex HTML archive (ZIP), how to convert
Hi,
I have a couple of HTML-ZIP archives (a folder tree in a ZIP file). I can convert a ZIP archive directly if it contains only one HTML file, but that does not work with a more complex zipped HTML directory that contains many HTML files, subfolders with further HTML files, CSS files and so on. When I click "view", a window pops up showing the contents of the ZIP archive. A browser can of course display it all. Is there any chance I could convert this complex structure into a MOBI file for the Kindle, with Calibre or some other tool? Doing it manually is not an option: the book contains a couple of hundred poems, each in a separate file. I would not mind if the index (indices, actually) were lost and I ended up with a flat file. Thanks for any help/hints, Mixx Last edited by Mixx; 09-27-2010 at 03:50 PM. |
09-27-2010, 03:57 PM | #2 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
The best way to store is to drag only the index.html file into Calibre. It will follow all the paths and grab all needed files. If it works in your browser by clicking on index.html, it will work in Calibre the same way. Quote:
Last edited by Starson17; 09-27-2010 at 04:03 PM. |
||
09-27-2010, 03:59 PM | #3 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
I have two books that are essentially zipped-up HTML files. The structure is that there's one file in the root and everything else is in subfolders. They convert OK, at least up to ePub generation, where the conversion chokes horribly. I suspect that's because the HTML is so full of crap that I lost all hope of correcting it, and because each of the two books includes some 20,000 files. Did you try just hitting convert to see what happens?
|
09-27-2010, 04:24 PM | #4 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
09-28-2010, 03:12 AM | #5 |
Zealot
Posts: 143
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
|
Thanks for trying to help me, guys. It's just that I am totally new to this and somewhat confused.
In order to eliminate the ZIP issue, I unpacked the whole structure into a folder. When I now add the index.htm file to calibre (is this what is meant by "dragging"?), the book has no metadata (filename?), a size of zero MB, and is not really there: it can't be viewed, and clicking "view" again pops up the folder listing. Under "settings/behavior" I have now activated the internal viewer for ZIP files. If I add the ZIP file to calibre, all is well as far as the database is concerned, but I cannot view the book properly: I see only the landing page, without the embedded image files, and the navigation links are not active. A conversion to MOBI produces an empty book, except for the cover image. Can you please advise what I am doing wrong? Thank you, Mixx Last edited by Mixx; 09-28-2010 at 03:27 AM. Reason: Discovered the switch for the internal viewer for ZIP files |
09-28-2010, 03:25 AM | #6 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
I have never been able to view html(zip) files from within calibre but I have never had the zip internal viewer enabled before in preferences. Amazingly it seems to work.
Live and learn, as they say. For the other issues, did you try adding books from a directory, one book per directory? Or opening the main file with Word and saving it as RTF? Both of these have worked for me for problem files, but yours may be more problematic. |
09-28-2010, 04:04 AM | #7 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Well, you should try Kovid's advice. The manual section he linked to has instructions on how to set file order within a zipped HTML book, which, when set, should result in a book that's not empty.
|
09-28-2010, 06:47 AM | #8 | |
Zealot
Posts: 143
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
|
Quote:
Mixx |
|
09-28-2010, 06:50 AM | #9 | |
Zealot
Posts: 143
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
|
Quote:
Mixx |
|
09-28-2010, 06:53 AM | #10 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
That's nothing a few lines of $preferred_programming_language shouldn't be able to handle. I'll have to write something for my case anyhow; when I'm done with it, I'll share the code.
|
09-28-2010, 12:29 PM | #11 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Sooo... I just set $preferred_programming_language = "Python" and hacked away. This is the result. Sadly, I couldn't properly test it, because even with just about 280 out of the original ~20,000 files, Calibre chokes and dies horribly while creating the ePub output. But at least it should create an HTML file as specified in the link Kovid provided.
The script expects the four files "header", "footer", "prefix" and "postfix" to be in the same directory as the HTML files. (Thus, to use it, you need to extract your ZIP to a directory and put the files I attached inside the same directory.) These four files contain the beginning and the end of the HTML file, together with the text that's prepended and appended to the individual TOC entries. The script expects two parameters: the first is the filename of the output file, the second is the filename of the index file, which will go on top in the TOC. If you don't want to use an index file, just misspell the file name; the script shouldn't care. Please be aware that this is somewhat kludged together (for you Python-savvy folks out there, please don't hit me!) and may or may not work. There's no graceful error handling; scratch that: there's no error handling at all, so horrible, horrible things may or may not happen. |
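For readers who don't have the attachment, here is a rough sketch of what such a script might look like. It is not the attached script, only a reconstruction from the description above. The function name `build_toc`, the link markup, the alphabetical ordering and the UTF-8 encoding are all assumptions. It reads the four template files, collects every .htm/.html file below the directory, puts the index file first if it exists, and writes a single page of links.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_toc(directory, output_name, index_name):
    """Write a page of links to every HTML file below `directory`.

    Hypothetical helper: a sketch of the approach described in the post,
    not the script that was attached to it.
    """
    root = Path(directory)
    # The four template files named in the post, assumed to be UTF-8 text.
    header, footer, prefix, postfix = (
        (root / name).read_text(encoding="utf-8")
        for name in ("header", "footer", "prefix", "postfix")
    )

    # Collect all HTML files, including those in subfolders, as relative paths.
    pages = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in (".htm", ".html")
    )
    # Never link the generated page to itself (e.g. on a second run).
    pages = [p for p in pages if p != output_name]
    # The index file goes on top; a misspelled name is silently ignored.
    if index_name in pages:
        pages.remove(index_name)
        pages.insert(0, index_name)

    # Assumed entry layout: prefix + link + postfix, one entry per line.
    entries = [
        f'{prefix}<a href="{quote(p)}">{html.escape(p)}</a>{postfix}'
        for p in pages
    ]
    (root / output_name).write_text(
        header + "\n".join(entries) + "\n" + footer, encoding="utf-8"
    )
    return pages
```

Called as `build_toc(".", "toc.html", "index.htm")` from the extracted folder, this would write toc.html next to the other files. Like the original, it has no error handling: a missing template file simply raises an exception.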
|