02-13-2011, 05:26 PM | #1 |
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2010
Device: none
|
Issues converting Web HTML docs to ebooks
Hi,
I'm trying to convert a couple of real page-turners to Kindle ebooks: 1) the Python Library Reference and 2) the PostgreSQL online documentation.
The Python docs start as reStructuredText and are converted to HTML by Sphinx. The HTML source is available at http://docs.python.org/archives/pyth...s-html.tar.bz2. I added library/index.html as an ebook and converted it for my Kindle.
The Postgres docs start as SGML and are converted to HTML by jade. The HTML source is available at ftp://ftp9.us.postgresql.org/pub/mir...-9.0.3.tar.bz2. I added doc/src/sgml/html/index.html as an ebook and converted that.
The conversions were suboptimal in two different ways. The Postgres book was out of order: the appendixes appeared first, and the HTML footers showed up on each page. The Python library book still had all (or at least most) of its HTML tags in place, so you were reading raw HTML on the Kindle.
Any thoughts on converting these to nice ebooks with a TOC, etc.? Obviously, reading the SGML or reST source (before HTML encoding) would be best. Any other thoughts?
Thanks.
Kent |
02-13-2011, 09:03 PM | #2 |
Wears funny hat (cloth)
Posts: 28
Karma: 26
Join Date: Dec 2010
Location: Limbo
Device: Kobo WiFi, Kobo Touch
|
"Add book" turning HTML into ZIP rather than EPUB
I've been pasting longer HTML docs into Word, creating a TOC, then saving as Filtered HTML, then using Calibre Add-Book to create EPUBs for reading on my Kobo WiFi. Suddenly, perhaps with Calibre 0.7.43 or .44, all the HTMLs get added as ZIPs. That requires an extra conversion, ZIP to EPUB, and litters directories with extra files. I found that even a one-paragraph HTML doc gets ZIPped when added by Calibre.
Any way to have Calibre Add-Book default to EPUB? I am not sophisticated with HTML and really only know how to use Word to create simple HTML. Calibre is otherwise terrific and I'm donating $50 right now! |
02-13-2011, 10:46 PM | #3 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
I am confused - adding a book to Calibre does not convert the format; it merely stores the format you add. In addition, when you add an HTML file, Calibre has stored it (and all linked files) as a ZIP file for as long as I can remember.
Once you have added the HTML (ZIP) file, it can be converted to the format of your choice. This can be done explicitly after the add, or implicitly as part of the transfer of files to the reader device. It is not (and never has been) done as part of the add process. |
02-13-2011, 11:54 PM | #4 |
Wears funny hat (cloth)
Posts: 28
Karma: 26
Join Date: Dec 2010
Location: Limbo
Device: Kobo WiFi, Kobo Touch
|
My bad; I was addled. I had used 2epub.com to convert some HTML docs into EPUBs which I then added to Calibre. A few days later I had completely forgotten that intermediate step.
BTW 2epub.com is convenient, quick, reliable. Maybe Calibre should incorporate that function. |
02-13-2011, 11:55 PM | #5 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Umm, you do know that 2epub.com uses calibre to do the actual conversions, don't you?
|
02-14-2011, 01:36 AM | #6 |
Wears funny hat (cloth)
Posts: 28
Karma: 26
Join Date: Dec 2010
Location: Limbo
Device: Kobo WiFi, Kobo Touch
|
I do now and I guess I'd better say no more!
|
02-14-2011, 01:39 AM | #7 |
Junior Member
Posts: 1
Karma: 10
Join Date: Feb 2011
Device: Droid, iTouch, Dell Latitude XP
|
So it reads HTML files, packs them into a ZIP file, tells you it just added a ZIP file, but then if you continue and try to convert it, it actually does produce an EPUB file. I just love the little things that don't get mentioned in the manual because after all EVERYBODY knows about them, don't they? Queep - thanks folks, I was going nuts, but you've solved the problem I just registered with this thing to ask about.
|
02-14-2011, 04:01 AM | #8 |
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2010
Device: none
|
Hi guys,
Thanks for the thoughts. Anything specific I should try to get a web HTML doc into a nice ebook format? I thought that using the Python Library Reference might be helpful as kovidgoyal is obviously quite familiar with Python/reST/Sphinx and knows what magic would need to happen, or if it's a road not yet ready to be taken.
Please keep the thoughts coming. This is one problem I've not been able to solve & I'd appreciate any help I could get.
Kent
Last edited by kenth; 02-14-2011 at 04:13 AM. Reason: typo |
02-16-2011, 05:10 AM | #9 | |
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
|
"It's nice to be nice to the nice" - Frank Burns M*A*S*H
Quote:
So, on ideas to make it a "nice" ebook: I think you need to flatten the site manually so that, instead of an exploded structure, it is linear like a normal book. If the HTML files are all numbered sequentially in the order you want them, like pages in a book, you can easily stitch them together into one file and then add chapter markers and a TOC on the first page. Or you can rename the files to match an index that you create or modify, and then possibly load them into calibre by adding that index instead.
I would use textutil to concatenate the smaller files into one large file for each chapter, or for the whole book, depending on how large it is. Then import it into calibre or add it to your Kindle. textutil is a command-line tool that ships with Mac OS X; as far as I know it is not available on Linux or Windows, but I am sure there is some utility for those systems that does the same thing.
http://www.unix.com/man-page/All/1/TEXTUTIL/
http://oreilly.com/pub/a/mac/2005/11/22/cli-tools.html
Happy Wednesday
Archon
Last edited by Archon; 02-16-2011 at 05:14 AM. |
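For anyone without textutil, the stitching idea described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the pages are already named so that sorting the filenames gives the reading order, each page's <title> is a usable chapter heading, and a regular expression is good enough to pull out the <body> (a real HTML parser would be more robust). The names stitch and stitch_directory are made up for this example.

```python
import html
import re
from pathlib import Path

# Simple patterns for pulling the title and body out of a page.
# Assumption: the pages are regular enough that a regex will do.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def stitch(pages, book_title="Stitched book"):
    """Combine (name, html_text) pairs, already in reading order, into one HTML string."""
    toc, chapters = [], []
    for number, (name, text) in enumerate(pages, start=1):
        title_match = TITLE_RE.search(text)
        # Fall back to the file name when a page has no <title>.
        title = title_match.group(1).strip() if title_match else html.escape(name)
        body_match = BODY_RE.search(text)
        # Fall back to the whole text when a page has no <body>.
        body = body_match.group(1) if body_match else text
        anchor = f"chapter-{number}"
        toc.append(f'<li><a href="#{anchor}">{title}</a></li>')
        chapters.append(f'<h1 id="{anchor}">{title}</h1>\n{body}')
    return (
        "<html><head><title>" + html.escape(book_title) + "</title></head><body>\n"
        "<ul>\n" + "\n".join(toc) + "\n</ul>\n"
        + "\n".join(chapters) + "\n</body></html>\n"
    )


def stitch_directory(source_dir, output_file):
    """Stitch every .html file in source_dir, sorted by file name, into output_file."""
    paths = sorted(Path(source_dir).glob("*.html"))
    pages = [(p.name, p.read_text(encoding="utf-8")) for p in paths]
    Path(output_file).write_text(stitch(pages), encoding="utf-8")
```

Calling stitch_directory("html", "book.html") (both paths are placeholders) would write one linear file with a linked TOC at the top and one <h1> per original page, which can then be added to calibre as a single HTML book.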
|
03-03-2011, 03:28 PM | #10 |
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2010
Device: none
|
Followup with Resolutions
I successfully (well, sort of) loaded these ebooks onto my Kindle by downloading the original documentation source code and going from there.
For the PostgreSQL documentation, I built the documentation as a single HTML file and converted that successfully. Since building the docs involved installing lots of tools, I just created a new FreeBSD virtual machine for the purpose, built my HTML file and called it a day. It looks nice on the Kindle.
For the Python standard library, it was a bit more difficult. The version of Sphinx used for the 2.x documentation is old and supports neither the singlehtml nor the epub output format. However, the 3.x docs use Sphinx 1.0.7, so I punted on the 2.7 docs and just did a "make epub" in the 3.2 doc subdirectory. I loaded the resulting EPUB into calibre with no problem.
All in all, not so bad. The new Sphinx output formats (as of v1.0, I believe) make conversion a piece of cake. Hopefully this is the direction the world is moving.
Kent |
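For projects whose docs do not ship a Makefile with an epub target, the same kind of build can probably be driven through sphinx-build directly. The sketch below is a guess at a minimal wrapper, not what the Python docs' "make epub" actually runs: the function names and the build/<builder> output layout are invented for illustration, and it assumes a Sphinx version with the epub and singlehtml builders (1.0 or later, going by the post above) is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def sphinx_command(doc_dir, builder="epub"):
    """Return the sphinx-build argument list for the given builder.

    The output directory (doc_dir/build/<builder>) is an arbitrary choice
    made for this example, not a Sphinx convention.
    """
    doc_dir = Path(doc_dir)
    out_dir = doc_dir / "build" / builder
    # -b selects the builder, e.g. "epub" or "singlehtml".
    return ["sphinx-build", "-b", builder, str(doc_dir), str(out_dir)]


def build_docs(doc_dir, builder="epub"):
    """Run sphinx-build and raise CalledProcessError if the build fails."""
    subprocess.run(sphinx_command(doc_dir, builder), check=True)
```

For example, build_docs("Doc", "singlehtml") would attempt a single-page HTML build of the sources in a directory named Doc (a placeholder); the result could then be added to calibre like any other HTML file.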
|