Q. for Kovidgoyal or others about Markup Languages

11-07-2007, 06:44 PM
Kovidgoyal (and anyone else with input)

As you have experience with writing converters for different formats, what would be your preferred 'Master' format for ebooks? I know you are not necessarily the authority on this, but as you write a lot of the easy-to-use tools for converting to LRF, I thought your ideas would be good to hear.

My goal is to create good-looking reproductions of PDF books for viewing on screen in a browser, which is not too hard to do. My problem right now is that even though the result looks OK on screen, the source HTML is quite a mess.

I know you mentioned that your tools should be able to parse my sample document, but it might be easier if I build my documents in a format that I know will work well with your code.

I've checked out some of the custom markup languages out there but I think it would be best to use something that is considered a standard.

Thanks for any input you might offer.

11-07-2007, 07:16 PM
Hmm, that's a difficult question. I personally use HTML. When using HTML, keep in mind that although you are trying to reproduce the look of a paper book, ebooks are not pbooks, and sometimes it is necessary to compromise to accommodate the limitations of a reflowable format. A reflowable document will never be as beautiful as a fixed page size document.

Coming to specifics:

1) Use "semantic" HTML as far as possible. For example, use
<h2 class="chapter_title">Some chapter</h2> instead of a bare <h2> or, even worse, a <p> tag.

2) When specifying sizes and positions use % values whenever possible.

3) Use logical font sizes like large, x-large etc instead of actual numerical values. In general the less specific the font information the better, as I feel this is something that should be at least somewhat under the control of the user.

4) Use minimal markup. If some feature needs a ton of markup to accomplish, it may be better to find an alternative representation that, while not absolutely faithful to the original, still preserves the meaning.

5) There is the question of metadata. For this at the moment I would recommend just a simple .opf file.
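
As a rough sketch of points 1 to 3 taken together (the class names, file name, and values below are made up for illustration, and whether a given converter honours every property is something you would need to check):

```html
<html>
<head>
<title>Sample book</title>
<style type="text/css">
  /* Hypothetical class names; all values are illustrative only */

  /* Point 3: logical font sizes instead of numeric ones */
  h2.chapter_title { font-size: x-large; text-align: center; }
  p.epigraph       { font-size: small; }

  /* Point 2: sizes and positions given as percentages */
  p.epigraph { margin-left: 15%; margin-right: 15%; }
  img.plate  { width: 80%; }
</style>
</head>
<body>
  <!-- Point 1: the tag and class say what the text is, not how it looks -->
  <h2 class="chapter_title">Some chapter</h2>
  <p class="epigraph">A short quotation that opens the chapter.</p>
  <p>Body text is left at the default size so the reader stays in control.</p>
  <p><img class="plate" src="plate1.png" alt="Plate 1"></p>
</body>
</html>
```

Because the presentation lives in a handful of class rules, the body markup stays small, which also helps with point 4.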

These are what come to mind at the moment. Feel free to ask questions. I took a look at your sample, and it does look very nice on the screen. I've attached the resulting LRF from a "default" conversion without using the advanced features of html2lrf (you can view it using the LRF viewer that is part of libprs500, if you don't have a Sony Reader). As you can see, it already looks halfway decent. With a little bit of cleanup of the HTML you should be able to produce a pretty good LRF.

11-07-2007, 07:50 PM
I hope you don't mind my adding my 2 cents. First, kovidgoyal has made some very good suggestions. I would add that it wouldn't be a bad idea to use XHTML instead of regular HTML. This would be more standardized and structured and would allow you to use the markup in other ways (like epub), while still rendering in a web browser.
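
To give an idea of what the switch to XHTML involves, here is a minimal skeleton, assuming XHTML 1.1 and reusing the chapter_title class from above; the title and image file name are placeholders. The main differences from loose HTML are that every element is closed, empty elements end with " />", tag names are lowercase, and attribute values are always quoted.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
  <title>Sample book</title>
</head>
<body>
  <h2 class="chapter_title">Some chapter</h2>
  <!-- Empty elements such as br and img must be self-closed -->
  <p>First line<br />second line.</p>
  <p><img src="plate1.png" alt="Plate 1" /></p>
</body>
</html>
```

Since the result is well-formed XML, it can be checked with any XML parser before it goes anywhere near a converter.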

As to the suggestion to use percentages, that is also a very good idea. You can also use ems for the same reasons. Ems are sometimes easier and more intuitive when dealing with text size and positioning. In either case, definitely avoid using absolute units like pixels.

Using logical font sizes is also a good suggestion. You don't want to limit the human reader to some font size that maybe you can read, but he/she can't. This is another case where using ems will work. For example, a header size specified in ems will scale up along with the base font when the reader increases the font size in their browser or viewer.
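
A small sketch of what I mean by ems, with illustrative values and the same hypothetical chapter_title class:

```html
<style type="text/css">
  /* Illustrative values only */

  /* Start from whatever base size the reader has chosen */
  body { font-size: 100%; }

  /* 1.5em = one and a half times the surrounding text size,
     so the heading grows when the reader enlarges the base font */
  h2.chapter_title { font-size: 1.5em; margin: 2em 0 1em 0; }

  /* Paragraph indent scales with the text instead of being a fixed pixel width */
  p { text-indent: 1.5em; margin: 0; }
</style>
```

Nothing here names a pixel or point size, so the proportions between heading, body text, and spacing hold at any zoom level.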