02-07-2014, 10:06 AM   #4
rpspringuel
Join Date: Feb 2014
Device: Kindle 4
Correct me if I'm wrong, but isn't an attribute a property of a tag? That is, I can't just put 'data-page="1"' in the text of the file (it would be treated as text if I did); I must put something like '<wbr data-page="1">'. Now, I'll grant you that if every page break occurred at the start of a new element (heading, paragraph, etc.), one could simply add that attribute to the appropriate opening tag. But page breaks often occur in the middle of a paragraph, where there is no existing tag to attach the attribute to, so I would need to introduce a tag in those locations. Further, I would argue that for consistency's sake it would be better if all page break locations, not just those in the middle of a paragraph, were marked by the same element. That makes them easier for a human reading the source to find.
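To illustrate the point about attributes needing a tag, here is a minimal sketch using Python's standard html.parser module. The class name AttrProbe, the function probe, and the two sample strings are made up for illustration; this says nothing about how Kindle software actually reads the file, only how a generic HTML parser sees the two variants.

```python
from html.parser import HTMLParser


class AttrProbe(HTMLParser):
    """Hypothetical helper: separates data-page attributes from plain text."""

    def __init__(self):
        super().__init__()
        self.page_attrs = []  # (tag, value) pairs where data-page was a real attribute
        self.text = []        # everything the parser reported as text

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        for name, value in attrs:
            if name == "data-page":
                self.page_attrs.append((tag, value))

    def handle_data(self, data):
        self.text.append(data)


def probe(markup):
    """Return (attributes found, concatenated text) for a markup string."""
    parser = AttrProbe()
    parser.feed(markup)
    parser.close()
    return parser.page_attrs, "".join(parser.text)


# Sample inputs (assumed, not taken from any real book)
bare = '<p>end of page one data-page="1" start of page two</p>'
tagged = '<p>end of page one <wbr data-page="1">start of page two</p>'

print(probe(bare))    # no attribute found; data-page="1" shows up in the text
print(probe(tagged))  # attribute found on the wbr tag; text is unchanged
```

In the first case the marker ends up in the visible text, which is exactly what should be avoided; in the second it rides along on the tag and the text stays clean.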

In researching the data- attribute (which I hadn't heard of before), I discovered the word break opportunity (wbr) tag, which I think is a good candidate for marking page locations (hence my use of it above). It's a void element, and thus doesn't require a companion closing tag (unlike an anchor (a) tag). It is new in HTML5 and is intended for marking line break opportunities in really long words. For both reasons, it is unlikely to appear in most books. My quick testing shows that the tag is preserved in azw3 and doesn't affect how the document displays.

Of course, that's only if the reverse engineering process doesn't pan out. A quick search on Amazon found that they do offer at least some free books with real page numbers. Not anything I would normally want to read, but that isn't the purpose here. I haven't had the chance to "buy" them yet to discover what their file format is (Amazon doesn't list the file format in the item description), but hopefully there are enough of them that some will be in azw3 format. With luck, I'll start looking into that this weekend.