Old 11-05-2011, 05:07 PM   #1
webdad
Junior Member
webdad began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Nov 2011
Device: Kindle 2nd Gen
HTML file doesn't import to ZIP

I'm trying to convert a collection of downloaded HTML pages to an ebook. The pages are downloaded to a directory with corresponding subdirectories for the "complete" portion of each page.

page80.html
page81.html
page80_files <--- Subdirectory
page81_files <--- Subdirectory

I have the following "TOC" HTML file in the same main directory:

<html>
<body>
<h1>Table of Contents</h1>
<p style="text-indent:0pt">
<a href="./page80.html">80</a><br/>
<a href="./page81.html">81</a><br/>
</p>
</body>
</html>

When I drag the TOC file into Calibre, it comes in as just an HTML file. No corresponding ZIP file containing the TOC file, the sub pages, and the sub page contents is created.

When I open the TOC file in the Calibre viewer and examine the href links, they resolve to a temp directory in C:\users\...appdata ... yada yada yada\page80.html, which isn't where the files are truly located.

So, for some reason, Calibre isn't getting either the member pages or the contents of the subdirectories into the ZIP file. Could an error in the HTML of the underlying page files cause this? Any ideas on how to find out what the issue is?

I've looked through the messages and threads here but haven't seen this type of issue.

Thanks.
Old 11-05-2011, 08:17 PM   #2
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by webdad
So, for some reason, Calibre isn't getting either the member pages or the contents of the subdirectories into the ZIP file. Could an error in the HTML of the underlying page files cause this? Any ideas on how to find out what the issue is?
Go to Preferences - Plugins - File Type Plugins and make sure the HTML to ZIP plugin is enabled.
Old 11-06-2011, 10:34 AM   #3
webdad
The plug-in is green (enabled). I toggled it off (gray) and back on (green), then deleted my original doc.

When I added it back in, same result. Just an HTML file.

This is Calibre portable running from a disk where the docs are stored (which isn't my C drive). The Calibre library for this portable install is in the default location (on the same disk as the docs and Calibre portable).

I also see that 0.8.25 just came out, so I upgraded everything and the behavior didn't change.

Lastly, as a test, I saved two pages of this forum using the same "File | Save page as | complete" function in Firefox, and created a corresponding TOC file.

That TOC file looks like this:
<html>
<body>
<h3>Table of Contents</h3>
<p style="text-indent:0pt">
<a href="./MobiThread1Complete.html">Part One</a><br/>
<a href="./MObiThread2Complete.html">Part Two</a><br/>
</p>
</body>
</html>

I dragged and dropped this MobiThread TOC file into Calibre and that DID create a ZIP file.

So, I'm wondering what the differences are, since the two TOC files are virtually identical in format.

Thanks for the help.
Old 11-06-2011, 11:25 AM   #4
theducks
Well trained by Cats
Posts: 29,782
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
@webdad

By any chance, are you running on a case-sensitive OS (that is, not Windows)?
The file names in those two example HTML links have really mixed-up case.
A case-sensitive OS needs the file names to match EXACTLY, including the extension.
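
For anyone who wants to check this mechanically, here is a minimal sketch in Python (standard library only). It collects the href targets from a TOC file and reports any that match a file name on disk only when case is ignored. The function name `find_case_mismatches` and the on-disk names in the example are hypothetical; this is not something Calibre provides, just one way to run the comparison.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_case_mismatches(toc_html, filenames):
    """Return (linked_name, actual_name) pairs that differ only by case.

    toc_html  -- the TOC markup as a string
    filenames -- file names actually present next to the TOC file
    """
    parser = LinkCollector()
    parser.feed(toc_html)
    by_lower = {name.lower(): name for name in filenames}
    mismatches = []
    for href in parser.hrefs:
        # Keep only the file name: drop "./" and any query or fragment.
        target = os.path.basename(unquote(urlparse(href).path))
        actual = by_lower.get(target.lower())
        if actual is not None and actual != target:
            mismatches.append((target, actual))
    return mismatches


if __name__ == "__main__":
    toc = (
        '<a href="./MobiThread1Complete.html">Part One</a><br/>'
        '<a href="./MObiThread2Complete.html">Part Two</a><br/>'
    )
    # Hypothetical names on disk, for illustration only.
    on_disk = ["MobiThread1Complete.html", "MobiThread2Complete.html"]
    print(find_case_mismatches(toc, on_disk))
```

To run it against a real folder, read the TOC file into a string and pass `os.listdir(folder)` as the second argument. An empty result means every link that has a case-insensitive match also matches exactly.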
Old 11-06-2011, 09:16 PM   #5
webdad
Yeah, that is an artifact of a quick and dirty test. The OS is Windows 7, but I've gone back and checked the case anyway. The names all match, strange as they are.

The file names in the failing TOC are at least consistent.
Old 11-06-2011, 09:19 PM   #6
kovidgoyal
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Run calibre in debug mode (right click the preferences button) and you will get more info about what is going wrong.
Old 11-13-2011, 12:03 PM   #7
webdad
Thanks for all the assistance and comments.

I ran in debug mode and found that the parser was throwing an error while processing header information. The files have a large embedded CSS block along with lots of other code that isn't needed for this conversion (approximately 1,200 lines of code/CSS).

So, I found a nice basic data parsing tool and created a simple script to extract just the text of the page.
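
The tool and script used here aren't named in the thread, so the following is only a stand-in: a minimal sketch using Python's standard `html.parser` module that keeps the visible text of a page and drops the contents of `<style>`, `<script>` and `<title>` elements. The names `TextExtractor` and `extract_text` are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring <style>, <script> and <title> content."""

    SKIPPED = {"style", "script", "title"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the page's text chunks joined by newlines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Each run of text between tags becomes its own line, so the result still needs to be wrapped in simple HTML (or saved as plain text) before it goes back into Calibre. Inline markup such as bold or links is lost, which may or may not matter for a given set of pages.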

Everything looks good now.

Thanks again