Quote:
Originally Posted by kovidgoyal
@unpogaz emojis are not counted as images unless for some reason the book is using actual images for emojis, they are usually just normal unicode characters and counted as such.
Yeah, I know, but I'm talking about this case, which is a common way to circumvent the limitations of some e-readers:
Code:
<p>I'm happy <img src="smille_emoji.png"/></p>
There is also this case, which is very common (it appears in the scrambled book):
Code:
<p>paragraph paragraph</p>
<div class="center"><img src="ornemental.png"/></div>
<p>paragraph paragraph</p>
I don't really know how the algorithm handles such cases, but using a value of 1000 seems to me like a bad approach.
I know that giving an <img> a value of 1000 is meant to simulate the full page the image may occupy. But ornamental images are much more common and are used several times within a book, which makes the estimated book size greatly inflated. Not to mention the many cases where illustrations do not take up an entire page.
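To make the overestimation concrete, here is a small arithmetic sketch. The book size and the number of ornaments are hypothetical figures chosen only for illustration; the only value taken from the discussion is the weight of 1000 per image.

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
TEXT_CHARS = 300_000      # significant characters of real text in the book
ORNAMENT_IMAGES = 40      # decorative separators such as ornemental.png
IMG_WEIGHT = 1000         # value assigned to each <img>, as discussed above

# Every ornament is counted as if it were a full page of text.
estimated = TEXT_CHARS + ORNAMENT_IMAGES * IMG_WEIGHT
overestimate = (estimated - TEXT_CHARS) / TEXT_CHARS

print(estimated)              # 340000
print(f"{overestimate:.1%}")  # 13.3%
```

With these assumed numbers, 40 small separators alone inflate the size by roughly 13%, even though they occupy almost no space on the page.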
Personally, I would much rather ignore images and just treat them as special blocks that always count as one line, so that only the real characters are counted. And if an image fills a full page, in the vast majority of cases it sits in its own XHTML file, which always counts as at least one page anyway. Yes, this risks underestimating the page count, but it seems better to me to underestimate image content than to overestimate it.
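A minimal sketch of that alternative, assuming a hypothetical per-file model: each image is counted as one line's worth of characters, and each XHTML file counts as at least one page. `CHARS_PER_LINE`, `CHARS_PER_PAGE`, `estimate_file_pages` and `estimate_book_pages` are illustrative names and values, not calibre's actual constants or functions.

```python
import math

# Hypothetical constants: NOT calibre's actual values.
CHARS_PER_LINE = 70
CHARS_PER_PAGE = 1800

def estimate_file_pages(text_chars: int, num_images: int) -> int:
    """Estimate pages for one XHTML file, counting each image as one line."""
    significant = text_chars + num_images * CHARS_PER_LINE
    # Every file counts as at least one page, so a full-page illustration
    # living in its own XHTML file is still counted as a whole page.
    return max(1, math.ceil(significant / CHARS_PER_PAGE))

def estimate_book_pages(files) -> int:
    """files: iterable of (text_chars, num_images) pairs, one per XHTML file."""
    return sum(estimate_file_pages(t, n) for t, n in files)

# A cover-like file with a single image, a chapter with one ornament,
# and a short chapter with no images.
book = [(0, 1), (17_930, 1), (5_000, 0)]
print(estimate_book_pages(book))  # 14
```

The per-file minimum is what keeps full-page illustrations from vanishing, while inline emoji images and ornaments only add a line each.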
Maybe give "get_num_of_significant_chars()" an optional argument for <img> etc., so we can control which value is given to such special blocks depending on the context. The advantage of the algorithm we have is that we can tweak it to get satisfactory, balanced behavior.
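A sketch of what such an optional argument could look like, using the standard library's `html.parser`. This is a standalone, hypothetical `count_significant_chars()`, not calibre's `get_num_of_significant_chars()`, whose real signature and definition of "significant" may differ; here "significant" is assumed to mean non-whitespace text characters.

```python
from html.parser import HTMLParser

class _SignificantCharCounter(HTMLParser):
    """Counts non-whitespace text characters plus a configurable weight per <img>."""

    def __init__(self, img_weight: int):
        super().__init__()
        self.img_weight = img_weight
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser's default handle_startendtag() calls this method,
        # so self-closing <img .../> tags are counted as well.
        if tag == "img":
            self.count += self.img_weight

    def handle_data(self, data):
        # Assumed definition of "significant": any non-whitespace character.
        self.count += sum(1 for ch in data if not ch.isspace())

def count_significant_chars(html: str, img_weight: int = 1000) -> int:
    """Hypothetical counter; img_weight controls how much each <img> adds."""
    parser = _SignificantCharCounter(img_weight)
    parser.feed(html)
    parser.close()
    return parser.count

emoji = '<p>I\'m happy <img src="smille_emoji.png"/></p>'
print(count_significant_chars(emoji))                 # 1008
print(count_significant_chars(emoji, img_weight=70))  # 78
```

Keeping 1000 as the default preserves the current behavior, while a caller that knows the images are inline or decorative can pass a one-line weight instead.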