Old 04-05-2016, 06:37 AM   #166
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Edit Note: Turns out in the few hours I had the tab open and typed up this post, a lot more posts were made. I will have to read/absorb/incorporate those in a later post. Sorry if I missed you or your insights.

Quote:
Originally Posted by JSWolf
Sorry, but page numbers do make sense. Have you not been reading about ADE page numbers? ADE page numbers are exactly the same with the same eBook on any screen size, font size, margins, font, line-height, etc. What page numbers are you talking about?
I suspect a lot of the wording is getting confused because of multiple simultaneous usages of the word "page".

Many people use the word page to refer to "physical pages", while others mean "screens" (as in what text shows up on the screen of their device), and others are using it to mean the auto-calculated number of "pages" by ADE/Kindle.

In this case, MikeB1972 clearly meant the ADE-case.

Quote:
Originally Posted by MikeB1972
If you are switching between ADE readers and know the page number you were on, it's close enough to manually synch the readers. Probably useful enough if you have multiple readers from different manufacturers (and read the same book on multiple devices).

Paragraph number would of course be handier
I wrote about this in the forums before, but in my experience, I found a search for 4 words can typically lead you to an exact location anywhere within a book. I find this to be a much more "neutral" way of searching across different devices/formats of the same book.

A few times I was told by professors that the 3- or 4-word method is unwieldy, but I don't see much difference between saying:

"Turn to page 123" and "Turn to page 123, where he says, 'the wind blew westward'."

Students reading (that specific version of) the physical book spend time turning to 123, and the digital students search for "the wind blew" or "the wind blew westward". Everyone reaches the same spot whether they have Physical/EPUB/MOBI/PDF/HTML/whatever!
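
Here is a minimal sketch of that phrase search in Python. The names `normalize` and `find_phrase` and the sample text are made up for illustration, and real reader apps almost certainly search differently. It only shows why a fourth word can turn an ambiguous hit into a unique one:

```python
import re

def normalize(text: str) -> str:
    """Lowercase and drop punctuation so matches survive formatting differences."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))

def find_phrase(book_text: str, phrase: str) -> list[int]:
    """Return the 0-based word index of every occurrence of phrase."""
    words = normalize(book_text).split()
    target = normalize(phrase).split()
    if not target:
        return []
    n = len(target)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]

# Hypothetical sample text, not taken from any real book.
book = "The wind blew. Later, the wind blew westward across the plain."
print(find_phrase(book, "the wind blew"))           # [0, 4] -> ambiguous
print(find_phrase(book, "the wind blew westward"))  # [4]    -> unique
```

Because the search runs on words rather than bytes or screens, the same phrase lands on the same spot no matter which format the text came from.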

Quote:
Originally Posted by MikeB1972
Except with a paper edition you can track down an old version, that isn't possible with ebooks.
I agree. Typically you can only get your hands on the "latest and greatest" version of the ebook. Amazon/B&N/whoever try to make these things so "user-friendly" that they don't offer any method to get to the older versions.

Ebooks typically go through a handful of minor versions (such as fixing a few typos, fixing a broken link, fixing CSS, compressing images, [...]), medium overhauls (inserting higher quality images), or they may go through major overhauls (cleaning up a ton of absolutely crappy code... which is what I do in many cases).

To my knowledge, specific versions of ebooks are not marked in any sort of standard way, and there doesn't seem to be any sort of hashed database of different versions of ebooks (and heaven forbid the big publishers would ever do something like this. That is before even getting into how DRM mucks up hashing systems).
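
To make the "hashed database" idea a bit more concrete, here is a rough sketch of what a version fingerprint could look like. This is purely a hypothetical scheme (`epub_fingerprint` and `make_epub` are names I made up, and the demo archives are not valid EPUBs): it hashes the uncompressed contents of each file inside the archive, so simply re-zipping the same files does not change the result, but any edit to the content does.

```python
import hashlib
import io
import zipfile

def epub_fingerprint(epub_bytes: bytes) -> str:
    """Hash member names and uncompressed contents in sorted order (hypothetical scheme)."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        for name in sorted(archive.namelist()):
            digest.update(name.encode("utf-8"))
            digest.update(b"\0")
            digest.update(archive.read(name))
            digest.update(b"\0")
    return digest.hexdigest()

def make_epub(files: dict[str, str], compression: int) -> bytes:
    """Build a tiny in-memory zip for the demo; NOT a valid EPUB."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()

v1 = make_epub({"ch1.html": "<p>The wind blew westward.</p>"}, zipfile.ZIP_STORED)
v1_rezipped = make_epub({"ch1.html": "<p>The wind blew westward.</p>"}, zipfile.ZIP_DEFLATED)
v2 = make_epub({"ch1.html": "<p>The wind blew eastward.</p>"}, zipfile.ZIP_STORED)

print(epub_fingerprint(v1) == epub_fingerprint(v1_rezipped))  # True: same content
print(epub_fingerprint(v1) == epub_fingerprint(v2))           # False: text changed
```

A scheme like this would still be defeated by DRM (the encrypted bytes differ per purchase) and by the per-store tweaks, since every one of those changes the content being hashed.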

Even between stores, there may be tiny variations between the same formats (Smashwords might run it through their "Meatgrinder", B&N might strip out some CSS for no good reason, Amazon might strip out fonts, iTunes may need a tiny xml file for enabling fonts, maybe some other store does a Calibre conversion of your book, one store needs a cover of ###px x ###px, [...]).

In my experience, there are definitely many more minor variations of ebooks than their physical counterparts.

Side Note: I have even seen books go backwards in quality in later versions (in the case of "A Song of Ice and Fire" EPUBs that I purchased, the image quality became lower in a later version of the EPUB). I considered this a large downgrade (although they did fix a few typos + fix the CSS).

Quote:
Originally Posted by issybird
I'm in the very useful camp, myself. ADE page numbers are consistent, let me switch among my devices, and also allow me to calculate an accurate percentage for amount read, one that eliminates the extraneous.
Let me fix that for you: "Lets you switch between the same file on your ADE-based readers."*

ADE page numbers are only good across ADE-based devices, just as Kindle page numbers are only correct on Kindle-based ereaders. Problems occur when you want to leave that specific ecosystem.

If you try to go Kindle -> ADE, ADE -> Kindle, ADE -> other EPUB readers (there are a ton out there, particularly on Android), EPUB/Kindle -> other digital formats... those "page numbers" become worthless (just as "screen" numbering systems are worthless outside of those devices/settings).

I put ADE-numbering + Kindle-numbering in the same category (Byte Methods), and they share similar, though not identical, pitfalls:

Categorization

I sort of came up with these categories while brainstorming tonight:
  • Page Numbers
    • This is what is/has been used in physical books.
    • Pro: It works well when there is only a single version of a book.
    • Pro: It allows easy reference to ~300-800 words in a specific version of a specific book.
    • Pro: You can easily/concisely cross-reference within your own book (in the case of an Index or text)
      • Example: "See the discussion on Page 38."
    • Con: It only works on that very specific version of the book (Publisher, Year Printed, Edition #, Large Print, [...]).
  • "Screen" Numbers
    • This is the method that was used in many earlier ereaders.
    • This method would show you how many "screens" are left until the end of the book, and would change based on the font + font size + margins chosen on the device.
    • Con: It fails dramatically when any of the variables (font + font size + margins + screen size) change, or if a different size device is used.
      • This alone makes it impossible to use for referencing.
    • John F explained his journey with "screen numbers": https://www.mobileread.com/forums/sho...&postcount=151
  • Percentage
    • While very helpful for gauging how far you are in an individual book, this is absolutely too broad + no good in referencing.
    • Note: As JSWolf mentioned, 1% of a 20,000 word book = 200 words. 1% of a 200,000 word book = 2,000 words... Percentage is no good across books, it is only good for individual books. Combined with the time it took to reach a given percentage, it gives you a rough approximation of how long the entire book would take. As we also know, people read different books at different paces (for example, I read very dense non-fiction works... which take MUCH longer for me to read compared to fiction books).
  • Byte Method
    • Pro: Doesn't matter what variables you use. Numbering is based on the HTML file itself.
    • Note: See my "X characters of WHAT?" list in this post for some of the complications to consider: https://www.mobileread.com/forums/sho...85#post3290585
    • Amazon/Kindle
      • 150 uncompressed bytes per "location"
      • Con: Because it is uncompressed, the "locations" count gets greatly inflated and blows WAY out of control if the HTML is hideous (see examples below).
    • ADE
      • 1024 compressed bytes per "page"
      • Pro compared to Kindle: Because it is compressed, this gets rid of a lot of the HTML cruft (hopefully the tags/overhead compress well). This gives a slightly better approximation of actual displayed characters, and is closer to physical page numbers.
        • If the code is clean, you can probably get a rough approximation of ~500 words per "page".
    • Con: I find a huge problem with this method is that it doesn't deal with the actual displayed characters... all of the overhead is included, which can make/break the "page" calculation.
  • Paragraph Count
    • Pro: This method would be neutral across all versions of the text.
      • Paragraph 10 in ebook = paragraph 10 in hard cover = paragraph 10 Large Print = paragraph 10 in horribly coded dreck.
    • Con: As I mentioned in the "X characters of WHAT?" list, this may get thrown off if text is shifted around (for example, moving the TOC to the back of the book or footnotes getting moved to the end of a chapter), or if the functionality of whole parts of a book is shifted to other files (such as an NCX file instead of an actual TOC).
      • Note: Probably better to come up with some sort of per chapter or per subsection paragraphing system. (Sort of like the Bible: "Chapter: Verse")
  • Line Count
    • Note: I would say this is a physical book thing. This has many of the pitfalls of the "Screen" numbering, or the physical book method with different variables (differences between Hard/Soft/Large Print, [...]).
  • Word Count
    • Note: Depends heavily on what you count as a "word".
      • Do you count numbers as a word? There are quite a few differences between "word counts". For example, Microsoft Word/Sigil/Calibre all disagree and vary in "word count" by tiny margins.
    • My personal thoughts on this method... I believe it is "too specific", and would lead to too many errors if trying to reference it.
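
As a rough illustration of the per-chapter paragraph idea from the Paragraph Count entry (the "Chapter: Verse" style), here is a small sketch. The function name `build_locator` and the sample chapters are hypothetical, and it assumes the book has already been split into chapters and paragraphs:

```python
def build_locator(chapters: list[list[str]]) -> dict[str, str]:
    """Map 1-based 'chapter:paragraph' references to paragraph text."""
    return {
        f"{c}:{p}": paragraph
        for c, chapter in enumerate(chapters, start=1)
        for p, paragraph in enumerate(chapter, start=1)
    }

# Hypothetical two-chapter book, already split into paragraphs.
book = [
    ["It was a dark night.", "The wind blew westward."],
    ["Morning came slowly."],
]
locator = build_locator(book)
print(locator["1:2"])  # The wind blew westward.
```

Restarting the count in every chapter means a TOC moved to the back, or footnotes moved to the end of a chapter, only disturbs the numbering locally instead of shifting every reference after it.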

Technical Details on the "Byte Method"

According to the MobileRead Wiki:

https://wiki.mobileread.com/wiki/Adob...s#Page_numbers
https://wiki.mobileread.com/wiki/Page...Implementation

The ADE automatic "page number" algorithm counts each page as 1024 compressed bytes, and the Kindle creates a "location" every 150 uncompressed bytes.

The problem I find with the Byte Methods is that the file may change (maybe more efficient/different code in the backend), but the text itself will not change.

As an example, let us look at three different versions of the same paragraph:

Example #1: Really bloated code for a single paragraph:

Quote:
<p class="ParaOverride-1"><span class="CharOverride-7" style="position:absolute;top:34px;left:0px;letter-spacing:-0.02px;">Lessequibus</span> <span class="CharOverride-7" style="position:absolute;top:34px;left:80.86px;letter-spacing:-0.04px;">quisciatem.</span> <span class="CharOverride-7" style="position:absolute;top:52px;left:0px;letter-spacing:-0.19px;">Nam</span> <span class="CharOverride-7" style="position:absolute;top:52px;left:35.09px;letter-spacing:-0.1px;">atem</span> <span class="CharOverride-7" style="position:absolute;top:52px;left:70.13px;letter-spacing:-0.02px;">dest,</span> <span class="CharOverride-7" style="position:absolute;top:52px;left:103.33px;letter-spacing:-0.01px;">consedignis</span> <span class="CharOverride-7" style="position:absolute;top:70px;left:0px;letter-spacing:-0.06px;">ab</span> <span class="CharOverride-7" style="position:absolute;top:70px;left:18.66px;letter-spacing:-0.06px;">ipicte</span> <span class="CharOverride-7" style="position:absolute;top:70px;left:57.33px;letter-spacing:0.03px;">experis</span> <span class="CharOverride-7" style="position:absolute;top:70px;left:106.78px;letter-spacing:-0.08px;">dolupta</span> <span class="CharOverride-7" style="position:absolute;top:88px;left:0px;letter-spacing:-0.03px;">sperchi</span> <span class="CharOverride-7" style="position:absolute;top:88px;left:49.98px;letter-spacing:-0.08px;">liquunt,</span> <span class="CharOverride-7" style="position:absolute;top:88px;left:103.74px;letter-spacing:-0.05px;">consequibus</span> <span class="CharOverride-7" style="position:absolute;top:106px;left:0px;letter-spacing:-0.06px;">eatem</span> <span class="CharOverride-7" style="position:absolute;top:106px;left:41.94px;letter-spacing:-0.03px;">quidemo</span> <span class="CharOverride-7" style="position:absolute;top:106px;left:102.81px;letter-spacing:-0.01px;">corpor</span> <span class="CharOverride-7" style="position:absolute;top:124px;left:0px;letter-spacing:0.03px;">senihic</span> <span class="CharOverride-7" style="position:absolute;top:124px;left:49.15px;letter-spacing:-0.05px;">tatemporum</span> <span class="CharOverride-7" style="position:absolute;top:124px;left:133.05px;letter-spacing:-0.04px;">eliqui</span> <span class="CharOverride-7" style="position:absolute;top:142px;left:0px;letter-spacing:0px;">cusam</span> <span class="CharOverride-7" style="position:absolute;top:142px;left:44.91px;letter-spacing:0.02px;">fuga.</span> <span class="CharOverride-7" style="position:absolute;top:142px;left:80.03px;letter-spacing:0.02px;">Dia</span> <span class="CharOverride-7" style="position:absolute;top:142px;left:106.78px;letter-spacing:-0.12px;">dunto</span> <span class="CharOverride-7" style="position:absolute;top:142px;left:148.56px;letter-spacing:-0.02px;">cori</span> <span class="CharOverride-7" style="position:absolute;top:160px;left:0px;letter-spacing:0.07px;">od</span> <span class="CharOverride-7" style="position:absolute;top:160px;left:20.38px;letter-spacing:-0.05px;">quaernat</span> <span class="CharOverride-7" style="position:absolute;top:160px;left:80.73px;letter-spacing:-0.07px;">iusam</span></p>
Example #2: Compared to this Calibre conversion (it takes those redundant "styles" and shifts them to the CSS file instead). This HTML + CSS (CSS not shown) makes it functionally equivalent to the above:

Quote:
<p class="paraoverride"><span class="charoverride">Lessequibus</span> <span class="charoverride1">quisciatem.</span> <span class="charoverride2">Nam</span> <span class="charoverride3">atem</span> <span class="charoverride4">dest,</span> <span class="charoverride5">consedignis</span> <span class="charoverride6">ab</span> <span class="charoverride7">ipicte</span> <span class="charoverride8">experis</span> <span class="charoverride9">dolupta</span> <span class="charoverride10">sperchi</span> <span class="charoverride11">liquunt,</span> <span class="charoverride12">consequibus</span> <span class="charoverride13">eatem</span> <span class="charoverride14">quidemo</span> <span class="charoverride15">corpor</span> <span class="charoverride16">senihic</span> <span class="charoverride17">tatemporum</span> <span class="charoverride18">eliqui</span> <span class="charoverride19">cusam</span> <span class="charoverride20">fuga.</span> <span class="charoverride21">Dia</span> <span class="charoverride22">dunto</span> <span class="charoverride23">cori</span> <span class="charoverride24">od</span> <span class="charoverride25">quaernat</span> <span class="charoverride26">iusam</span></p>
Example #3: And this would be a cleaner version:

Quote:
<p class="ParaOverride-1">Lessequibus quisciatem. Nam atem dest, consedignis ab ipicte experis dolupta sperchi liquunt, consequibus eatem quidemo corpor senihic tatemporum eliqui cusam fuga. Dia dunto cori od quaernat iusam</p>
The displayed text would be exactly the same for all three:

Quote:
Lessequibus quisciatem. Nam atem dest, consedignis ab ipicte experis dolupta sperchi liquunt, consequibus eatem quidemo corpor senihic tatemporum eliqui cusam fuga. Dia dunto cori od quaernat iusam
Displayed Characters: 197
Words: 27

Example #1: 3122 bytes
Example #1 (Kindle): 20.8 "locations"
Example #1 (1024 uncompressed): 3.05 "pages"
Example #1 (ADE)*: (rounded up to 1)

Example #2: 1186 bytes
Example #2 (Kindle): 7.9 "locations"
Example #2 (1024 uncompressed): 1.16 "pages"
Example #2 (ADE)*: (rounded up to 1)

Example #3: 227 bytes
Example #3 (Kindle): 1.51 "locations" (rounded up to 2)
Example #3 (1024 uncompressed): 0.22 "pages"
Example #3 (ADE)*: (rounded up to 1)

* Note: I am not too sure how to calculate the exact compressed byte count calculated by ADE, but they are all less than 1... there is a lot of repeat information in the HTML, and text compresses very well.
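
For anyone who wants to play with the numbers, here is a rough Python sketch of the arithmetic, run on Example #3. Big assumption: I am using `zlib` as a stand-in for whatever compression ADE actually measures, so the compressed figure is only a ballpark and not ADE's real count. The function name `byte_method_counts` is made up for this post.

```python
import math
import zlib

KINDLE_BYTES_PER_LOCATION = 150  # uncompressed, per the wiki figures above
ADE_BYTES_PER_PAGE = 1024        # compressed, per the wiki figures above

def byte_method_counts(html: str) -> dict[str, float]:
    """Estimate Kindle "locations" and ADE "pages" for one chunk of HTML."""
    raw = html.encode("utf-8")
    compressed = zlib.compress(raw)  # ASSUMPTION: stand-in for ADE's real compression
    return {
        "bytes": len(raw),
        "kindle_locations": round(len(raw) / KINDLE_BYTES_PER_LOCATION, 2),
        "compressed_bytes": len(compressed),
        "ade_pages": math.ceil(len(compressed) / ADE_BYTES_PER_PAGE),
    }

# Example #3 from above (the clean version).
example_3 = ('<p class="ParaOverride-1">Lessequibus quisciatem. Nam atem dest, '
             'consedignis ab ipicte experis dolupta sperchi liquunt, consequibus '
             'eatem quidemo corpor senihic tatemporum eliqui cusam fuga. Dia dunto '
             'cori od quaernat iusam</p>')
counts = byte_method_counts(example_3)
print(counts["bytes"], counts["kindle_locations"], counts["ade_pages"])  # 227 1.51 1
```

Feeding Example #1 or #2 through the same function shows the uncompressed count ballooning while the compressed count grows much more slowly, which is the whole difference between the two schemes.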

Pitfall of Byte Counts

Let us say in the future, Amazon decides to run their ebooks through a more efficient conversion program (maybe they figure out a way to generate code with less cruft, or maybe they come up with something like what KFX did with "better typography", and they run all the old backlog files through the new conversion).

Example #2 might occur on a book with a Calibre conversion (the HTML itself changes, but the code is still functionally equivalent).

Example #3 might occur if I do a major overhaul of the hideous code backend.

The "locations" and ADE "page numbers" change, even though the text itself is exactly equivalent (and this is what is important when referencing or reading). People care about the actual DISPLAYED text, not any of that HTML backend.
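
A quick way to see this is to strip the markup and compare what is left. This is only a sketch using Python's built-in `html.parser` (the class name `DisplayedText` and the shortened two-word samples are mine), and it is cruder than a real renderer: it would also pick up the contents of `<style>` or `<script>` blocks if they were present.

```python
from html.parser import HTMLParser

class DisplayedText(HTMLParser):
    """Collect only text nodes, ignoring tags, attributes, and comments."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)

def displayed_text(html: str) -> str:
    """Return the text a reader would see, with whitespace collapsed."""
    parser = DisplayedText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())

# Shortened versions of Example #1 and Example #3 (first two words only).
bloated = ('<p class="ParaOverride-1"><span class="CharOverride-7" '
           'style="position:absolute;top:34px;left:0px;">Lessequibus</span> '
           '<span class="CharOverride-7" style="position:absolute;top:34px;'
           'left:80.86px;">quisciatem.</span></p>')
clean = '<p class="ParaOverride-1">Lessequibus quisciatem.</p>'

print(len(bloated) > len(clean))                         # True: very different byte counts
print(displayed_text(bloated) == displayed_text(clean))  # True: identical displayed text
```

Any numbering scheme built on the output of something like `displayed_text` would survive a code cleanup, while anything built on the byte length of the markup would not.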

Note: I don't know off the top of my head how HTML comments are handled (or metadata tags such as <title>, <head>, [...]) in the ADE/Kindle methods. Does all of that metadata count? Or is it just what is between <body> tags? I know that MOBI strips out all HTML comments.

Last edited by Tex2002ans; 04-05-2016 at 08:14 AM.