04-03-2008, 05:06 PM
bob_ninja
Page Numbers in electronic text formats

This is a continuation of "News from Bookeen about firmware upgrade".
I thought this topic deserved a separate thread.

The original discussion has taken a rather negative tone, so I would like to hit the reset button and look at this issue from a more constructive angle.

In general I would agree with the demands to show page numbers *IF* the device already has that information, as implied by the "Go to ..." (page) command. However, let us examine exactly what is involved in computing page numbers for electronic formats.

In the old days it was simple. As soon as you printed physical pages you had numbers by default: you simply assigned numbers as you went. Then came SGML, followed by simple HTML and other derivatives like MobiPocket.

The critical difference is that newer formats like HTML define layout instructions (<p> for paragraph, <br> for line break, etc.) rather than the actual page/display content. In fact, the original HTML, before CSS style sheets, didn't even define font information. What font should be used for an <h1> heading? Well, the user could choose anything. Furthermore, since displays come in many different dimensions and windows can be resized, the physical page size is really unknown, undefined. Instead, HTML content was designed to "flow" into whatever display container is used.
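To make the "flow" point concrete, here is a toy sketch. It is hypothetical and far simpler than any real HTML or MobiPocket renderer: it assumes a fixed-width font, a page of `lines_per_page` lines of `chars_per_line` characters, and word wrapping at spaces. The name `page_starts` is made up for illustration. It shows that the same text breaks into a different number of pages, at different offsets, when only the display width changes.

```python
import re

def page_starts(text: str, chars_per_line: int, lines_per_page: int) -> list[int]:
    """Toy reflow renderer (hypothetical): return the character offset where each page begins.

    Assumes a fixed-width font and wraps lines at spaces; a word longer than a line
    simply overflows onto its own line.
    """
    starts: list[int] = []
    line_len = 0    # characters used on the current line
    lines_done = 0  # completed lines on the current page
    for m in re.finditer(r"\S+", text):
        off, wlen = m.start(), len(m.group())
        if not starts:
            starts.append(off)  # the first word opens page 1
            line_len = wlen
        elif line_len + 1 + wlen <= chars_per_line:
            line_len += 1 + wlen  # word fits on the current line after a space
        else:
            lines_done += 1  # wrap: the current line is finished
            if lines_done == lines_per_page:
                starts.append(off)  # page is full, this word opens a new page
                lines_done = 0
            line_len = wlen
    return starts

text = " ".join(["word"] * 100)  # 100 four-letter words, 499 characters
narrow = page_starts(text, chars_per_line=20, lines_per_page=10)
wide = page_starts(text, chars_per_line=40, lines_per_page=10)
print(len(narrow), narrow)  # 3 [0, 200, 400]
print(len(wide), wide)      # 2 [0, 400]
```

The same position (say offset 250) lands on page 2 in the narrow layout and on page 1 in the wide one, which is why reflowable content has no fixed page numbers to begin with.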

PDF took the opposite approach and nailed down every last detail: fonts, spacing, images, even page dimensions. Once the page dimensions are fixed, it is easy to divide PDF content into pages *REGARDLESS* of the display device or window dimensions. Keep in mind what is being lost here; this is a clear compromise. You gain fidelity and consistency while losing flexibility. In some cases, like smaller screens, the lost flexibility can be painful.

So back to the question of page numbers. The first question is what a page is in the first place. Well, it is whatever fits on the display device, browser window, or whatever else is displaying the content. Thus a page is defined by our display. Today virtually all displays are simply a 2D matrix of pixels. The process of converting a certain format (like HTML or MobiPocket) into pixels that can be shown on a physical display (in this case a 6" eInk display) is called "rendering". In short, a rendering engine (like the MobiPocket renderer in the Cybook) reads the content, executes instructions like <p> and <br>, and converts them into pixels according to a set of rules.

So to actually build a page for display, you take some HTML content and render it into pixels. Those pixels (an image) are an actual page, ready for display. Rendering has several important characteristics:
- it consumes significant computing resources (more for complex formats, less for simpler ones)
- thus it tends to consume significant energy (battery power)
- it also tends to take a relatively long time

Typically the generated pixels (images) are much bigger than the input content, so they are only used for display and not kept anywhere. For instance, our typical MobiPocket books of 1 MB or less produce many megabytes of images when rendered. Hence rendered page images are generally produced only on demand.
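A rough back-of-the-envelope check of the "many megabytes" claim. The numbers are assumptions for illustration, not Cybook specs from this thread: a 600x800 pixel screen, 2 bits per pixel (4 gray levels), no compression, and a 300-page book. `rendered_size_bytes` is just a helper name made up here.

```python
def rendered_size_bytes(width_px: int, height_px: int, bits_per_pixel: int, pages: int) -> int:
    """Size of `pages` uncompressed rendered page bitmaps, in bytes."""
    bytes_per_page = width_px * height_px * bits_per_pixel // 8
    return bytes_per_page * pages

# Assumed figures: 600x800 screen, 2 bits/pixel, 300 pages.
per_page = rendered_size_bytes(600, 800, 2, 1)
whole_book = rendered_size_bytes(600, 800, 2, 300)
print(per_page)    # 120000
print(whole_book)  # 36000000
```

Under these assumptions that is about 36 MB of bitmaps for a source file of around 1 MB; an 8-bit grayscale buffer would make it four times larger.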

The HTML content is simply a long sequence of bytes, which in turn consists of a mix of words and rendering instructions (heading, paragraph, line break, etc.). The Cybook computer simply knows that it is displaying content starting at byte #23,859, for instance. No page number. It just keeps reading HTML content from this position until the Cybook page is filled. Simple.

Say you want to figure out the current page number. This is *THE QUESTION*.

Well, the simple approach might be to start rendering content from the first byte (in memory only; it doesn't actually get displayed) and keep rendering until you reach the current position, byte #23,859. Say we rendered 31 pages to get to this byte position. Thus we learned that the current page number is 31.

A similar process could be used by the "Go to ..." (page) command to select the new byte position to render from. Again, start rendering from byte 0 until the page count reaches the desired page, at which point you display the last rendered page.
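Here is a rough sketch of that naive approach, using a toy renderer (fixed-width font, word wrapping; hypothetical, not Bookeen's code). `render_page`, `current_page` and `goto_page` are made-up names, and offsets are character positions, which equal byte positions for plain ASCII text. Both lookups have to lay out every page from the start of the book, so the work grows with the page number.

```python
import re

CHARS_PER_LINE = 20  # hypothetical display geometry
LINES_PER_PAGE = 10

def tokenize(text):
    """(offset, length) of every word; stands in for parsing real HTML/MobiPocket."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"\S+", text)]

def render_page(words, i):
    """Lay out one page starting at word index i; return the index of the first word on the next page."""
    lines, line_len = 0, 0
    while i < len(words):
        wlen = words[i][1]
        if line_len == 0:
            line_len = wlen  # first word on the page always fits
        elif line_len + 1 + wlen <= CHARS_PER_LINE:
            line_len += 1 + wlen
        else:
            lines += 1  # wrap to the next line
            if lines == LINES_PER_PAGE:
                return i  # page full: word i starts the next page
            line_len = wlen
        i += 1
    return i

def current_page(words, byte_pos):
    """Naive: render from byte 0 until reaching the page that contains byte_pos."""
    i, page = 0, 0
    while i < len(words):
        page += 1
        nxt = render_page(words, i)
        next_start = words[nxt][0] if nxt < len(words) else float("inf")
        if byte_pos < next_start:
            return page
        i = nxt
    return page

def goto_page(words, n):
    """Naive "Go to page n": render pages 1..n-1, return the byte offset where page n starts."""
    i = 0
    for _ in range(n - 1):
        i = render_page(words, i)
        if i >= len(words):
            raise ValueError(f"book has fewer than {n} pages")
    return words[i][0]

words = tokenize(" ".join(["word"] * 100))
print(current_page(words, 250))  # 2
print(goto_page(words, 3))       # 400
```

To answer "what page is byte 23,859 on?" this loop would call `render_page` 31 times in the post's example, and a thousand times for page 1,000.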

There are a lot of problems with this approach. If you are trying to get to page 1,000, the Cybook processor has to do a lot of work, consume a lot of battery energy and, worst of all, take a very long time (maybe even minutes). In computer speak, this approach doesn't scale: the work grows with the page number. It may work for the first 10-20 pages, but beyond that it becomes too slow and uses too much battery power.

Then you go to work trying to figure out faster ways to manage page numbers. Maybe the computer keeps track of pages as they are rendered and maps each page number to its corresponding byte position. That also doesn't work, since I can jump from the TOC to the middle of a book without rendering the (possibly many) preceding pages. And so on.
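A sketch of that page-map idea and where it breaks. `PageMap` is a hypothetical class, not anything from the Cybook firmware: it records where each page starts while you read sequentially from the beginning, and it can answer "which page is byte X on?" only inside the region it has already mapped. After a TOC jump, the map simply has no answer.

```python
import bisect

class PageMap:
    """Hypothetical cache mapping page numbers to the byte offsets where they start.

    Only valid for a contiguous run of pages rendered from byte 0: a page's number
    is known only if every page before it has been rendered.
    """

    def __init__(self):
        self.starts = [0]  # starts[k] is the byte offset of page k + 1; page 1 starts at byte 0

    def record_next_page(self, byte_offset):
        """Sequential reading revealed where the next, not yet mapped, page starts."""
        if byte_offset <= self.starts[-1]:
            raise ValueError("page starts must be increasing")
        self.starts.append(byte_offset)

    def page_of(self, byte_pos):
        """Page number containing byte_pos, or None if it lies past the last mapped page start."""
        if byte_pos > self.starts[-1]:
            return None  # could be on the last mapped page or any later one: unknown
        return bisect.bisect_right(self.starts, byte_pos)

pm = PageMap()
for start in (200, 400, 600):  # reading forward from the beginning revealed where pages 2-4 start
    pm.record_next_page(start)
print(pm.page_of(250))     # 2
print(pm.page_of(23_859))  # None: jumped here from the TOC, pages in between never rendered
```

The jump itself works fine (the reader just renders from byte 23,859), but the number of the page it lands on cannot be recorded because the pages before it were skipped.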

In effect, the Cybook programmers (as well as others trying to render HTML or MobiPocket) resort to shortcuts, having the Cybook make a guess or approximation to determine a page number without doing so much work. The compromise is lost precision (as reported in the other thread) in exchange for far less computation (and correspondingly less power usage and better speed).
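This is not the Cybook's algorithm (which is unknown); it is one plausible kind of shortcut, sketched under made-up assumptions: measure the average page size over a few pages that really were rendered, then treat the page number as byte position divided by that average. The function names and sample numbers are hypothetical. It costs a division instead of dozens or hundreds of page renders, and it drifts wherever real pages are denser or sparser than the sample (pictures, short dialogue lines, headings).

```python
def average_page_size(rendered_starts):
    """Mean bytes per page, from the start offsets of a few consecutively rendered pages."""
    sizes = [b - a for a, b in zip(rendered_starts, rendered_starts[1:])]
    return sum(sizes) / len(sizes)

def estimate_page(byte_pos, bytes_per_page):
    """Approximate page number: no rendering, just a division."""
    return int(byte_pos // bytes_per_page) + 1

def estimate_offset(page, bytes_per_page):
    """Approximate byte offset for "Go to page" (a real reader would presumably snap to a word boundary)."""
    return int((page - 1) * bytes_per_page)

# Hypothetical sample: the first three pages happened to be 760, 785 and 765 bytes long.
avg = average_page_size([0, 760, 1545, 2310])
print(avg)                         # 770.0
print(estimate_page(23_859, avg))  # 31
print(estimate_offset(31, avg))    # 23100
```

Note that "Go to page 31" lands at byte 23,100 while the current-page estimate for byte 23,859 is also 31; an exact renderer could disagree with both, which is the kind of imprecision reported in the other thread.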

Of course, I have no clue what algorithm is actually used or how precise (or imprecise) it is. Some of you expressed interest in a 100% precise page number. Again, you need to step into the Cybook maker's shoes.

SIDE NOTE: Before you scream, it is safe to say that page numbers are not important to the majority of users, who use browsers and other pageless systems every day without complaint. Sure, you may be keen to have them 100% correct, but you are a minority. That being said ...

Bookeen is looking at this as a marginal, low-priority feature and assigning it a proportional engineering effort. Of course, they have deadlines and limited resources, so it is just common sense that some manager will decide to assign, for instance, 10 dev-days. So a single programmer has only 10 days to implement the page "features". Thus they decide they can only do a single "Go to ..." (page) command, using a less-than-great approximation for a barely/reasonably functional go-to-page command. They just don't have the time/resources to do more. Perhaps, given the lower priority, they'll assign more effort to it in a later release.

So the bottom line is they will spend as much time/effort as X units at €350 / $350 buys them. Simple business decision. You cannot spend more money than you earn.

That is why I said that users with higher expectations should get an iRex product, as it costs about 2x more. Therefore, iRex has more resources to implement page features better.

In summary,
Page numbers are a difficult feature to implement for certain formats. Small devices like the Cybook lack the processing power for precise pagination. It will take both software engineering effort and likely more processing power to get more/better page features in the future. How far in the future is hard to say.

This feature, like any other, tends to increase costs and causes a corresponding increase in device price. The Cybook walks a fine line between providing desired features like pagination and keeping the device price low enough for wider appeal.

Bottom line is, give them a break.