#1 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Get involved with the html2epub design process
Hi all,
I'm planning to start working on an html2epub converter for calibre in a couple of weeks. I've outlined some of my ideas for it at http://calibre.kovidgoyal.net/wiki/HTMLEPUBConverter
If you have any suggestions, ideas, feedback or wishes, chime in, either here or on the wiki, and I'll try to accommodate them as I go along.
As background, calibre currently uses the html2lrf converter as the backend for conversion of all formats to LRF. html2epub will play a similar role for EPUB. As such, it will be the cornerstone of EPUB support in calibre and thus requires careful design, so that maturing it won't take as long as html2lrf took.
The initial design goal of html2epub will be to serve as a backend for web2epub and feeds2epub. Next will be lit2epub, mobi2epub, txt2epub and so on. Finally, EPUB as an output format will be added to the GUI. |
#2 | |
reader
Posts: 6,977
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
|
Adobe has a "Best Practices Guide" at their Digital Publishing Technology Center. These are actually best practices for using Adobe Digital Editions (rather than ePub in general), but an ePub that performs poorly in DE isn't going to be viable because most handheld devices that run DE won't have an alternative ePub reader. The most significant recommended practice is:
Quote:
I recommend following Adobe's guidelines, and in particular splitting ebooks into chapters. Note that Feedbooks is already doing this. |
|
#3 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Well, breaking things up on chapter boundaries should be possible (that's essentially what html2lrf does now), though it would probably mess up CSS/JavaScript in the original HTML file, so I guess it will be optional.
Sounds like Adobe are making the same mistake with their epub reader that they made with their pdf reader. They're trying to implement too many features. |
#4 | ||
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
Quote:
|
||
#5 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
What about if a chapter break happens on a nested tag? As for Adobe, I think it should be possible to write a decent EPUB viewer that functions on HTML files larger than 300K in a device setting. It probably won't support all features, but I still suspect it would be just as usable for 99% of use cases.
|
#6 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Duplicating the nesting structure in both files seems sensible to me, maybe with some logic to eliminate empty elements from the document preceding the split boundary. I could see this causing the post-split document to begin with an incorrect amount of whitespace if the CSS is just wrong, but I can't think of anything else that could go wrong off the top of my head.
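To make the idea concrete, here is a minimal sketch of such a split in Python, using only the standard library's ElementTree. It is an illustration under stated assumptions, not how html2epub does or will do it: the function names (`split_before`, `_split`, `_shell`, `_path_to`) are made up for this example, the input is assumed to be well-formed XHTML already, and "empty" is taken to mean a duplicated ancestor with no children and no non-whitespace text.

```python
import copy
import xml.etree.ElementTree as ET


def _shell(elem):
    """Copy an element's tag and attributes, but none of its content."""
    return ET.Element(elem.tag, dict(elem.attrib))


def _path_to(root, target):
    """Return the child indices leading from root to target, or None."""
    if root is target:
        return []
    for index, child in enumerate(root):
        sub_path = _path_to(child, target)
        if sub_path is not None:
            return [index] + sub_path
    return None


def _split(elem, path):
    """Split elem just before the node addressed by path (hypothetical helper)."""
    before, after = _shell(elem), _shell(elem)
    index = path[0]
    children = list(elem)
    # Text ahead of the first child always precedes the split point.
    before.text = elem.text
    for child in children[:index]:
        before.append(copy.deepcopy(child))
    if len(path) == 1:
        # This child is the split target: it opens the second document.
        after.append(copy.deepcopy(children[index]))
    else:
        inner_before, inner_after = _split(children[index], path[1:])
        # Drop a duplicated ancestor that ended up empty in the first half.
        if len(inner_before) or (inner_before.text or "").strip():
            before.append(inner_before)
        # Text following the nested element belongs to the second half.
        inner_after.tail = children[index].tail
        after.append(inner_after)
    for child in children[index + 1:]:
        after.append(copy.deepcopy(child))
    return before, after


def split_before(root, target):
    """Return (before, after) trees; both repeat the ancestors of target."""
    path = _path_to(root, target)
    if not path:
        raise ValueError("target must be a descendant of root")
    return _split(root, path)
```

Called as `split_before(doc, doc.find(".//h1"))`, it returns two new trees and leaves the original untouched. Only the duplicated ancestors are pruned, so legitimately empty elements such as `<br/>` elsewhere in the first half are left alone.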
Quote:
|
|
#7 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Actually, if you don't care about pretty printing, it should even be possible to preserve whitespace.
|
#8 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
Code:
<div class="book">
  ...
  <div class="chapter">
    <h1>Chapter N</h1>
  </div>
  ...
</div>
But that seems like a fairly rare circumstance. |
|
#9 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Ah I see, yeah, I think we can tolerate that.
|
#10 |
Member
Posts: 7
Karma: 10
Join Date: Jun 2008
Location: Austin, TX
Device: Adobe DE, MS Reader, Mobipocket Reader, Sony 505
|
Hi, Kovid. Still working on getting some funding from my department for this project. The wheels here turn slowly, though I've been told it's forthcoming.
I've got a few links here that might be of use to you in development of html2epub.
The first has already been discussed here (re: filesize limits): http://blogs.adobe.com/digitaleditio...igital_ed.html
The second, regarding the ePubCheck utility, is from the MobileRead forum, so you're probably already familiar with it: https://www.mobileread.com/forums/showthread.php?t=17501
The third is on the Adobe devnet area. There are several useful links on this page, but again, you may have already come across it: http://www.adobe.com/devnet/digitalpublishing/
The fourth is a page with many useful links from the Idiotprogrammer (not to suggest you're an idiot!): http://www.imaginaryplanet.net/weblo...er/?p=83399167
Thanks so much for your great tool, and I can't tell you how excited I am that you're working on adding html2epub support. Let me know if you need assistance beta testing the product as you develop.
---------------------------
jp english
jpenglish@newsstand.com |
#11 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Thanks for the links (I was already aware of most of them, but it's good to have them consolidated in one place). html2epub will be developed like html2lrf was, i.e. it will rely heavily on input from calibre's user community with regard to bugs and features.
|
#12 |
Member
Posts: 7
Karma: 10
Join Date: Jun 2008
Location: Austin, TX
Device: Adobe DE, MS Reader, Mobipocket Reader, Sony 505
|
More reference on ePUB format
I've spent a few hours today looking at The Daisy Pipeline Project:
http://daisymfc.sourceforge.net/
There are numerous transformers used within the Pipeline tool that you might find of note. One thing mentioned in the reference material for the ePUB Creator script (http://daisymfc.sourceforge.net/doc/...PSCreator.html) that caught my eye:
The OPS Specification is quite picky about what kind of text documents can be used within an EPUB. You can only use XHTML 1.1 (not 1.0) documents, or DTBook 2005-2 (not 2005-1 or 2005-3) documents.
I thought you might find this of note if you're not already aware.
---------------------------------------------------------
jp english |
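As a rough illustration of the XHTML 1.1 versus 1.0 distinction mentioned above, the sketch below looks only at the DOCTYPE declaration of a content document. It is a quick sanity check, not validation against the OPS specification: a file can carry the XHTML 1.1 doctype and still be invalid. The function names are invented for this example; the public identifier string is the standard W3C one for XHTML 1.1.

```python
import re

# Public identifier of the W3C XHTML 1.1 DTD.
XHTML11_PUBLIC_ID = "-//W3C//DTD XHTML 1.1//EN"

# Matches: <!DOCTYPE html PUBLIC "public-id" ...
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def doctype_public_id(markup):
    """Return the public identifier from the DOCTYPE, or None if absent."""
    # The declaration has to come before the root element, so only
    # the start of the document is searched.
    match = DOCTYPE_RE.search(markup[:1024])
    return match.group(1) if match else None


def is_xhtml11(markup):
    """True if the document declares the XHTML 1.1 DTD (no validation done)."""
    return doctype_public_id(markup) == XHTML11_PUBLIC_ID
```

A converter could use a check like this to decide whether an input file needs its doctype rewritten before being packaged.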
#13 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The Daisy Pipeline EPUB creator script is too limited. html2epub will accept (almost arbitrarily bad) HTML and transform it into a valid EPUB document. In particular, it needs to be able to handle HTML downloaded from the internet. But thanks for the pointer.
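To give a feel for what "accepting bad HTML" involves, here is a small sketch that turns tag soup into well-formed markup using Python's standard `html.parser`. It is a toy under stated assumptions, not the html2epub implementation: the `TagBalancer` class and `repair_html` function are names made up for this example, comments and doctypes are simply dropped, and duplicate attributes, encodings and many real-world quirks are not handled.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content or a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class TagBalancer(HTMLParser):
    """Re-emit parsed HTML with every tag closed and all text escaped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # output fragments
        self.stack = []    # currently open (non-void) tags

    def handle_starttag(self, tag, attrs):
        # Valueless attributes such as `disabled` become disabled="disabled".
        attr_text = "".join(
            ' %s="%s"' % (name, escape(name if value is None else value, quote=True))
            for name, value in attrs
        )
        if tag in VOID_ELEMENTS:
            self.out.append("<%s%s/>" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: ignore it
        # Close anything left open inside this element, then the element.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        # Close whatever is still open at end of input.
        while self.stack:
            self.out.append("</%s>" % self.stack.pop())


def repair_html(markup):
    """Return a well-nested, escaped version of markup (hypothetical helper)."""
    balancer = TagBalancer()
    balancer.feed(markup)
    balancer.close()
    return "".join(balancer.out)
```

For example, `repair_html("<p>Tom & Jerry<br><b>bold<i>both</b> tail")` yields `<p>Tom &amp; Jerry<br/><b>bold<i>both</i></b> tail</p>`. A real converter would more likely lean on a dedicated tolerant parser than on a hand-rolled balancer like this one.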