Old 07-20-2008, 03:42 AM   #1
kovidgoyal
creator of calibre
Posts: 26,320
Karma: 5382313
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Get involved with the html2epub design process

Hi all,

I'm planning to start working on an html2epub converter for calibre in a couple of weeks. I've outlined some of my ideas for it at http://calibre.kovidgoyal.net/wiki/HTMLEPUBConverter

If you have any suggestions/ideas/feedback/wishes, chime in, either here or on the wiki, and I'll try to accommodate them as I go along.

As background, calibre currently uses the html2lrf converter as the backend for conversion of all formats to LRF. html2epub will play a similar role for EPUB. As such, it will be the cornerstone of EPUB support in calibre and thus requires careful design, so that maturing it won't take as long as html2lrf took.

The initial design goal of html2epub will be to serve as a backend for web2epub and feeds2epub. Next will be lit2epub, mobi2epub, txt2epub and so on. Finally EPUB as an output format will be added to the GUI.
Old 07-20-2008, 09:35 AM   #2
wallcraft
reader
Posts: 6,979
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3 and Fire
Adobe has a "Best Practices Guide" at their Digital Publishing Technology Center. These are actually best practices for using Adobe Digital Editions (rather than ePub in general), but an ePub that performs poorly in DE isn't going to be viable because most handheld devices that run DE won't have an alternative ePub reader. The most significant recommended practice is:

Quote:
Use Chapter size chunks (less than 300k in size.)

As noted above the EPUB format supports both XHTML and DTBook content within the EPUB package. The content of the document should be broken up into multiple files. Having a single XHTML document that’s the entire contents of a novel may be technically valid, but that would also mean that the entire document would need to be loaded into memory when the first page gets rendered or when the user opens the table of contents. It’s much better for reader performance, navigation and usability to split the document into chapter or even section size chunks. Typically you’ll want to treat chapters as separate chunks, in some cases, when the chapters are very long, you’ll want to break them up further. Of course the start of each new chunk will start at the top of a rendered page, so you’ll want to split the chunks with that in mind.

Note that Adobe Digital Editions has the following limitations when running on a mobile device;
Image Size: 10MB uncompressed.
XHTML/DTBook file size: 300k uncompressed/100k compressed.

The limits shown above are per asset within the document. Since your books will have many ‘chunks’ or chapter files, the full text can be much longer than the 300k limit. The limit is only a limit on the individual pieces.
It is my impression that a single XHTML file for the entire contents was not only "technically valid" but in fact the industry standard before ePub, but this won't work well with Adobe DE. Splitting documents on chapter boundaries is easy, but if a chapter is larger than 300k splitting it further without introducing a strange page jump may be more difficult.

I recommend following Adobe's guidelines, and in particular splitting ebooks into chapters. Note that Feedbooks is already doing this.
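
To make the per-file limits concrete, here is a minimal Python sketch of a check a converter might run on each chunk before packaging. It is only an illustration, not calibre code: the function names are hypothetical, "k" is assumed to mean 1024 bytes, and zlib's DEFLATE is used as a stand-in for whatever compression the EPUB container actually applies.

```python
import zlib

# Limits quoted from Adobe's guide above; treating "k" as 1024 bytes is an assumption.
MAX_UNCOMPRESSED = 300 * 1024
MAX_COMPRESSED = 100 * 1024


def within_de_limits(data: bytes) -> bool:
    """Return True if one XHTML chunk fits both quoted size limits."""
    # zlib's DEFLATE only approximates the size the chunk would have inside the zip.
    compressed_size = len(zlib.compress(data))
    return len(data) <= MAX_UNCOMPRESSED and compressed_size <= MAX_COMPRESSED


def oversized_chunks(chunks: dict[str, bytes]) -> list[str]:
    """Return the names of chunks that would need to be split further."""
    return [name for name, data in chunks.items() if not within_de_limits(data)]
```

Any chunk name returned by `oversized_chunks` is a candidate for the harder case mentioned above, where a chapter has to be split somewhere other than a chapter boundary.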
Old 07-20-2008, 03:16 PM   #3
kovidgoyal
creator of calibre
Well breaking things up on chapter boundaries should be possible (that's essentially what html2lrf does now) though it would probably mess up CSS/javascript in the original HTML file, so I guess it will be optional.

Sounds like Adobe are making the same mistake with their epub reader that they made with their pdf reader. They're trying to implement too many features.
Old 07-22-2008, 09:11 AM   #4
llasram
Reticulator of Tharn
Posts: 622
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
Originally Posted by kovidgoyal View Post
Well breaking things up on chapter boundaries should be possible (that's essentially what html2lrf does now) though it would probably mess up CSS/javascript in the original HTML file, so I guess it will be optional.
The OPS states that "Reading Systems must not, by default, render the textual content of the script element, and should not execute the script itself." So many epub viewers won't support JavaScript anyway. CSS should be assured of working just by carrying the contents of all <style/> tags from the source file into each split output file.
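
As a rough illustration of the CSS point, the sketch below splits a document at top-level headings and copies the whole <head> (and so every <style/> element) into each output file. It is a sketch under assumptions, not a description of html2epub: the input is assumed to be well-formed, namespaced XHTML with a <body>, the split points are assumed to be direct children of <body>, and `split_with_styles` and `q` are hypothetical names. Text sitting directly in <body> before its first child element is ignored.

```python
import copy
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without a prefix (xmlns="..." on the root).
ET.register_namespace("", XHTML_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the XHTML namespace."""
    return f"{{{XHTML_NS}}}{tag}"


def split_with_styles(source: str, boundary_tag: str = "h1") -> list[str]:
    """Split <body>'s children at each boundary_tag, copying <head> into every part."""
    root = ET.fromstring(source)
    head = root.find(q("head"))
    body = root.find(q("body"))

    groups: list[list[ET.Element]] = []
    for child in body:
        # Start a new group at each boundary element (or at the very first child).
        if child.tag == q(boundary_tag) or not groups:
            groups.append([])
        groups[-1].append(child)

    parts = []
    for group in groups:
        new_root = ET.Element(root.tag, dict(root.attrib))
        if head is not None:
            # The copied <head> carries all <style> elements into this output file.
            new_root.append(copy.deepcopy(head))
        new_body = ET.SubElement(new_root, q("body"), dict(body.attrib))
        new_body.extend([copy.deepcopy(el) for el in group])
        parts.append(ET.tostring(new_root, encoding="unicode"))
    return parts
```

Copying the whole <head> rather than only the <style/> elements is a design choice made here for brevity; it also carries over <title> and any <link> elements.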

Quote:
Sounds like Adobe are making the same mistake with their epub reader that they made with their pdf reader. They're trying to implement too many features.
How so? Not that they aren't -- I'm just not sure what you're thinking of in particular :-).
Old 07-22-2008, 01:20 PM   #5
kovidgoyal
creator of calibre
What about if a chapter break happens on a nested tag? As for Adobe, I think it should be possible to write a decent epub viewer that functions on HTML files larger than 300K in a device setting. It probably won't support all features, but still I suspect it would be just as usable for 99% of use cases.
Old 07-22-2008, 01:56 PM   #6
llasram
Reticulator of Tharn
Quote:
Originally Posted by kovidgoyal View Post
What about if a chapter break happens on a nested tag?
Duplicating the nesting structure in both files seems sensible to me, maybe with some logic to eliminate empty elements from the document preceding the split boundary. I could see this causing the post-split document to begin with an incorrect amount of whitespace if the CSS is just wrong, but I can't think of anything else that could go wrong off the top of my head.
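
A minimal Python sketch of that idea, assuming the source has already been parsed into a well-formed element tree and the split point is known. `split_at`, `prune_empty` and `VOID_TAGS` are hypothetical names for illustration; this is not how html2lrf or html2epub is implemented.

```python
import copy
import xml.etree.ElementTree as ET

# Elements that are legitimately empty and must survive pruning (illustrative list).
VOID_TAGS = {"br", "hr", "img"}


def split_at(elem, target):
    """Split elem's subtree just before target.

    Returns (before, after), or None when target is not inside elem.
    Both halves repeat elem's tag and attributes, so the nesting exists in each file.
    """
    for index, child in enumerate(elem):
        if child is target:
            sub_before, sub_after = None, copy.deepcopy(child)
        else:
            result = split_at(child, target)
            if result is None:
                continue  # target is not under this child
            sub_before, sub_after = result
        # Text after the split child's closing tag belongs to the second half.
        sub_after.tail = child.tail
        before = ET.Element(elem.tag, dict(elem.attrib))
        after = ET.Element(elem.tag, dict(elem.attrib))
        before.text = elem.text
        before.extend([copy.deepcopy(c) for c in elem[:index]])
        if sub_before is not None:
            before.append(sub_before)
        after.append(sub_after)
        after.extend([copy.deepcopy(c) for c in elem[index + 1:]])
        return before, after
    return None


def prune_empty(elem):
    """Drop descendants with no children and no text, keeping void elements."""
    for child in list(elem):
        prune_empty(child)
        local_name = child.tag.rsplit("}", 1)[-1]  # ignore any namespace prefix
        has_text = (child.text or "").strip() or (child.tail or "").strip()
        if len(child) == 0 and not has_text and local_name not in VOID_TAGS:
            elem.remove(child)
```

Calling `split_at(root, target)` and then `prune_empty(before)` gives a first document without the hollow wrapper elements left behind at the boundary, while the second document starts with the duplicated ancestors of the split point.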

Quote:
As for Adobe, I think it should be possible to write a decent epub viewer that functions on HTML files larger than 300K in a device setting. It probably won't support all features, but still I suspect it would be just as usable for 99% of use cases.
Having some easy way of not needing to keep the element tree for an entire book in memory at once seems like a reasonable enough requirement to me, especially on low-memory embedded devices. OTOH, Mobipocket manages with just one "file," and I'm fairly sure markup can cross record boundaries. Hmm.
Old 07-22-2008, 02:05 PM   #7
kovidgoyal
creator of calibre
Actually, if you don't care about pretty printing, it should even be possible to preserve whitespace.
Old 07-22-2008, 03:16 PM   #8
llasram
Reticulator of Tharn
Quote:
Originally Posted by kovidgoyal View Post
Actually, if you don't care about pretty printing, it should even be possible to preserve whitespace.
Oh, I meant in the rendered markup, if there was something like:

Code:
<div class="book">
  ...
  <div class="chapter">
    <h1>Chapter N</h1>
  </div>
  ...
</div>
And the markup-creator had chosen to style .book with a margin-top in order to create leading whitespace. Duplicating the /book/chapter structure in each file would cause each chapter to begin with the .book margin-top amount of margin.

But that seems like a fairly rare circumstance.
Old 07-22-2008, 03:33 PM   #9
kovidgoyal
creator of calibre
Ah I see, yeah, I think we can tolerate that.
Old 07-31-2008, 03:00 PM   #10
jpenglish
Member
Posts: 7
Karma: 10
Join Date: Jun 2008
Location: Austin, TX
Device: Adobe DE, MS Reader, Mobipocket Reader, Sony 505
Hi, Kovid. Still working on getting some funding from my department for this project. The wheels here turn slowly, though I've been told it's forthcoming.

I've got a few links here that might be of use to you in development of html2epub.
The first has already been discussed here (re: filesize limits):
http://blogs.adobe.com/digitaleditio...igital_ed.html

The second, regarding the ePubCheck utility is from the mobileread forum, so you're probably already familiar:
http://www.mobileread.com/forums/showthread.php?t=17501

This third is on the Adobe devnet area. Several useful links on this page, but again, you may have already come across this:
http://www.adobe.com/devnet/digitalpublishing/

This fourth one is a page with many useful links from the Idiotprogrammer (not to suggest you're an idiot!):
http://www.imaginaryplanet.net/weblo...er/?p=83399167


Thanks so much for your great tool, and I can't tell you how excited I am that you're working on adding html2epub support. Let me know if you need assistance beta testing the product as you develop.

---------------------------jp english
jpenglish@newsstand.com
Old 07-31-2008, 06:30 PM   #11
kovidgoyal
creator of calibre
Thanks for the links (I was already aware of most of them, but it's good to have them consolidated in one place). html2epub will be developed like html2lrf was (i.e. it will rely heavily on input from calibre's user community with regard to bugs and features).
Old 08-20-2008, 04:52 PM   #12
jpenglish
Member
More reference on ePUB format

I've spent a few hours today looking at The Daisy Pipeline Project:
http://daisymfc.sourceforge.net/

There are numerous transformers used within the Pipeline tool that you might find of note. One thing mentioned in the reference material for the ePUB Creator script (http://daisymfc.sourceforge.net/doc/...PSCreator.html) that caught my eye:

The OPS Specification is quite picky about what kind of text documents can be used within an EPUB. You can only use XHTML 1.1 (not 1.0) documents, or DTBook 2005-2 (not 2005-1 or 2005-3) documents.


I thought you might find this of note if you're not already aware.

---------------------------------------------------------jp english
Old 08-20-2008, 05:01 PM   #13
kovidgoyal
creator of calibre
The Daisy Pipeline epub creator script is too limited. html2epub will accept (almost arbitrarily bad) HTML and transform it into a valid EPUB document. In particular, it needs to be able to handle HTML downloaded from the internet. But thanks for the pointer.
All times are GMT -4.