11-05-2011, 05:07 PM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2011
Device: Kindle 2nd Gen
|
HTML file doesn't import to ZIP
I'm trying to convert a collection of downloaded HTML pages to an ebook. The pages are downloaded to a directory with corresponding subdirectories for the "complete" portion of each page.
page80.html
page81.html
page80_files <--- Subdirectory
page81_files <--- Subdirectory

I have the following "TOC" HTML file in the same main directory:

<html>
<body>
<h1>Table of Contents</h1>
<p style="text-indent:0pt">
<a href="./page80.html">80</a><br/>
<a href="./page81.html">81</a><br/>
</p>
</body>
</html>

When I drag the TOC file into Calibre, it comes in as just an HTML file. No corresponding ZIP file containing the TOC file, the sub pages, and the sub page contents is created.

When I open the TOC file in the Calibre viewer and examine the href links, they resolve to a temp directory in C:\users\...appdata ... yada yada yada\page80.html, which isn't where the files are actually located. So, for some reason, Calibre isn't getting the member pages or the contents of the subdirectories into the ZIP file.

Could an error in the HTML of the underlying page files cause this? Any ideas on how to find out what the issue is? I've looked through the messages and threads here but haven't seen this type of issue.

Thanks. |
11-05-2011, 08:17 PM | #2 |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Go to Preferences - Plugins - File Type Plugins and make sure the HTML to ZIP plugin is enabled.
|
11-06-2011, 10:34 AM | #3 |
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2011
Device: Kindle 2nd Gen
|
The plug-in is green. I toggled it off to gray / back on to green and deleted my original doc.
When I added it back in, I got the same result: just an HTML file.

This is Calibre portable running from a disk where the docs are stored (which isn't my C drive). The Calibre library for this portable install is in the default location (on the same disk as the docs and Calibre portable). I also see that 0.8.25 just came out, so I upgraded everything, and the behavior didn't change.

Lastly, as a test, I saved two pages of this forum using the same "File | Save page as | complete" function in Firefox, and created a corresponding TOC file. That TOC file looks like this:

<html>
<body>
<h3>Table of Contents</h3>
<p style="text-indent:0pt">
<a href="./MobiThread1Complete.html">Part One</a><br/>
<a href="./MObiThread2Complete.html">Part Two</a><br/>
</p>
</body>
</html>

I dragged and dropped this MobiThread TOC file into Calibre and that DID create a ZIP file. So, I'm wondering what the differences are, since the two TOC files are virtually identical in format.

Thanks for the help. |
11-06-2011, 11:25 AM | #4 |
Well trained by Cats
Posts: 29,782
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
@Webdad
By any chance, are you running on an OS that is NOT case insensitive (that is, something other than Windows)? The file names in those two example HTML files have really mixed-up case. A case-sensitive OS needs the file names to match EXACTLY, including the extension. |
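The exact-match check suggested above can be automated. The following is a minimal sketch, not part of calibre: it assumes the TOC and the pages it links to sit in one flat directory, and the names `LinkCollector`, `check_links`, and `check_toc_file` are hypothetical helpers introduced only for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def check_links(toc_html, names_on_disk):
    """Return (href, problem) pairs for local links without an exact match."""
    collector = LinkCollector()
    collector.feed(toc_html)
    exact = set(names_on_disk)
    by_lower = {name.lower(): name for name in names_on_disk}
    problems = []
    for href in collector.hrefs:
        parsed = urlparse(href)
        if parsed.scheme or parsed.netloc:
            continue  # external link, nothing to check on disk
        name = unquote(parsed.path)
        if name.startswith("./"):
            name = name[2:]
        if not name or name in exact:
            continue  # pure "#anchor" link, or an exact match
        if name.lower() in by_lower:
            problems.append((href, "case differs from " + by_lower[name.lower()]))
        else:
            problems.append((href, "file not found"))
    return problems


def check_toc_file(toc_path):
    """Check a TOC file against the files in its own directory (flat layout assumed)."""
    toc = Path(toc_path)
    names = [entry.name for entry in toc.parent.iterdir()]
    html_text = toc.read_text(encoding="utf-8", errors="replace")
    return check_links(html_text, names)
```

Calling `check_toc_file` with the path to the TOC file returns an empty list when every local link matches a file name exactly. On Windows a case mismatch still opens the file, so this only matters if the files are later moved to a case-sensitive system.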
11-06-2011, 09:16 PM | #5 |
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2011
Device: Kindle 2nd Gen
|
Yeah, that is an artifact of a quick and dirty test. The OS is Windows 7, but I've gone back and checked the case anyway. The names all match, as strange as they are.
The failing file is at least consistent. |
11-06-2011, 09:19 PM | #6 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Run calibre in debug mode (right click the preferences button) and you will get more info about what is going wrong.
|
11-13-2011, 12:03 PM | #7 |
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2011
Device: Kindle 2nd Gen
|
Thanks for all the assistance and comments.
I ran in debug mode and found that the parser was throwing an error while processing the header information. The files have a large embedded CSS block along with lots of other code that isn't needed for this conversion (approximately 1200 lines of code/CSS). So, I found a nice basic data parsing tool and created a simple script to extract just the text of each page. Everything looks good now. Thanks again. |
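The poster does not say which tool or script was used. As a stand-in, here is a minimal sketch of the same idea using only Python's standard library: drop `<script>`, `<style>`, and `<title>` content, keep the visible text, and write it back as bare paragraphs. The names `TextExtractor` and `simplify` are hypothetical, and the choice of block-level tags is an assumption.

```python
from html import escape
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keep visible text only, split into paragraphs at block-level tags."""

    SKIP = {"script", "style", "title"}  # content of these tags is discarded
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.paragraphs = []
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in self.BLOCK:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)

    def _flush(self):
        # Collapse runs of whitespace and store non-empty text as one paragraph.
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []

    def close(self):
        super().close()
        self._flush()


def simplify(html_text):
    """Return a minimal HTML document containing only the page text."""
    extractor = TextExtractor()
    extractor.feed(html_text)
    extractor.close()
    body = "\n".join("<p>" + escape(p) + "</p>" for p in extractor.paragraphs)
    return "<html>\n<body>\n" + body + "\n</body>\n</html>\n"
```

Running `simplify` over each saved page and writing the result under the original file name keeps the TOC links valid. Images and links inside the pages are lost, which matches the "just the text" approach described above.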