Quote:
Originally Posted by Jellby
Only as long as it is approximate. How do you measure the "length" of the book? Do you count all the markup? Then different formats have different lengths, and the same book in the same format will have a different length depending on how it's coded. Do you count only "visible" text? That's not easy to do, at least not without decoding and rendering the whole book (even if it's only a "dummy" rendering).
That's something I hadn't considered. I was thinking only of visible text, since that is what people usually have in mind when trying to locate a passage. Different formats use very different amounts of markup: MS Word, for example, puts several KB of formatting code at the beginning of a document, RTF only a few lines, and HTML might need as little as a few characters. Because of that, visible text might be the only way to give a roughly consistent indication of position within an ebook, regardless of how it is formatted.
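As a rough illustration of the markup point, here is a minimal Python sketch using the standard library's `html.parser`, assuming the book is HTML (as inside an EPUB). It counts only character data, skips `<head>`, `<script>` and `<style>`, treats a few block tags as word breaks, and collapses whitespace roughly the way a renderer would. The names `VisibleTextParser` and `visible_text` are made up for this example, and it is nowhere near a real renderer (no CSS `display:none`, no hyphenation, and so on).

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Hypothetical helper: collects text that would be displayed, ignoring markup."""

    SKIP = {"head", "script", "style"}  # elements whose content is never shown
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; etc. arrive as text
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.chunks.append(" ")  # block boundaries separate words

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.chunks.append(" ")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs into single spaces, as rendering roughly would
    return " ".join("".join(parser.chunks).split())


# The same sentence, coded minimally and with lots of extra markup
plain = "<p>It was a dark and stormy night.</p>"
fancy = (
    "<html><head><title>Ch 1</title><style>p{margin:0}</style></head>"
    '<body><div class="chapter"><p class="first"><span class="dropcap">I</span>'
    "t was a <i>dark</i> and stormy night.</p></div></body></html>"
)

print(len(plain), len(fancy))  # raw lengths differ a lot
print(len(visible_text(plain)), len(visible_text(fancy)))  # visible lengths match
```

The two raw strings differ in length by several times, but both reduce to the same 31 visible characters, which is the kind of consistency a position indicator would need.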
How hard it is to extract the visible text will likely depend on the format. The extraction could also be done when the ebook is loaded onto the ereader, so it would only need to happen once per book.
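To show why a one-time pass on loading would be enough, here is a small Python sketch, assuming each chapter's visible text has already been extracted (however the format allows). It stores cumulative visible-character counts once, after which converting a reading position to a percentage, or an absolute position back to a chapter and offset, is just arithmetic and a binary search. `PositionIndex` and its methods are hypothetical names for illustration, and the sketch assumes the book has at least one non-empty chapter.

```python
from bisect import bisect_right
from itertools import accumulate


class PositionIndex:
    """Hypothetical index built once per book from each chapter's visible text."""

    def __init__(self, chapter_texts):
        lengths = [len(t) for t in chapter_texts]
        # starts[i] = number of visible characters before chapter i
        self.starts = [0] + list(accumulate(lengths))[:-1]
        self.total = sum(lengths)  # assumed > 0

    def absolute(self, chapter, offset):
        """Visible-character position counted from the start of the book."""
        return self.starts[chapter] + offset

    def percent(self, chapter, offset):
        return 100.0 * self.absolute(chapter, offset) / self.total

    def locate(self, absolute):
        """Map an absolute visible-character position back to (chapter, offset)."""
        chapter = bisect_right(self.starts, absolute) - 1
        return chapter, absolute - self.starts[chapter]


# Three chapters of 100, 300 and 600 visible characters
book = PositionIndex(["a" * 100, "b" * 300, "c" * 600])
print(book.percent(1, 150))  # 150 characters into chapter 2 of the book
print(book.locate(250))
```

Only the list of chapter lengths needs to be kept after loading, so the index could be saved alongside the book and reused on later opens.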