Kindle source code observations

Old 05-30-2016, 03:00 PM   #1
geekmaster

On March 29, I downloaded all the (eink) Kindle_src*.* GPL tarballs from the amazon "Source Code Notice" web page, after doing a "sort|uniq" on the URLs (because the same URLs are used for multiple kindle models).

My script took more than half a day to complete, even with my 40Mbps internet connection. The result was a 20GB directory full of .tar.gz (and some .tar) files. Early kindle tarballs were not compressed, which is unimportant because all the tarballs inside them ARE compressed (mostly .tar.bz2, or .tar.gz in older firmwares). Interestingly, most source code tarballs also contain a .ipk file, which expands into a (mostly) empty root filesystem (with a few small "default" files in /etc).

Last night (May 29), I downloaded all the eink kindle GPL tarballs again, but this time I made a script to extract the URL list from the web page and download every file in that list (keeping a log file this time). Before the "sort|uniq" there were 131 URLs; after it, 89 unique URLs to download.
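
For the curious, the downloader boils down to something like this (rough sketch from memory -- the notice-page URL and the grep pattern here are stand-ins, not the exact ones I used):
Code:
#!/bin/bash
# Scrape the tarball URLs out of the "Source Code Notice" page,
# de-duplicate them, then fetch each one while keeping a log.
PAGE='https://www.amazon.com/source-code-notice'   # stand-in URL
curl -s "$PAGE" |
  grep -oE 'https?://[^"]+Kindle_src[^"]+\.tar(\.gz)?' |
  sort -u > urls.txt                # 131 raw URLs -> 89 unique
while read -r url; do
  wget -nv -a download.log "$url"   # -a appends output to the log
done < urls.txt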

Even though I avoided downloading files with duplicate URLs, running "md5sum" on my downloads shows a bunch of duplicate files (16 of them, with different "version number" names).
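
Spotting those is a one-liner with GNU coreutils (compares only the 32 hex digits of the hash):
Code:
# Print groups of files whose md5sum matches, one blank line between groups.
md5sum Kindle_src* | sort | uniq -w32 --all-repeated=separate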

Obviously, there were additional firmware versions (more files) to download in the recent batch, as was to be expected.

Even more interesting is what I see when I compare my recent GPL source code set with the one from two months ago. First off, some filenames changed -- previously they contained the FW version (the numbers with all the dots in them) and the OTA version (that long string of trailing digits), but now some of them have had the OTA version string stripped off (giving a different download URL for the same FW version).

Another interesting thing is that SOME files with the SAME URL as two months ago now have a DIFFERENT md5sum than in the previous download set (but the SAME md5sum as a slightly newer firmware version with a different URL -- therefore a duplicate download).
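
The way I catch this is by saving an md5sum manifest from each download run and joining the two on filename (sketch below -- the manifest names are just examples, and it assumes no spaces in the filenames):
Code:
# In each snapshot directory: md5sum * > manifest-<date>.txt
# Manifest lines look like: <md5>  <filename>
join -j 2 <(sort -k2 manifest-march.txt) <(sort -k2 manifest-may.txt) |
  awk '$2 != $3 { print $1 }'   # same filename, different content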

So, it seems that amazon likes to "rewrite history" for their source code, just like they do for old firmware update downloads. Same URL, but different content from previous archived downloads.

Now, when I expand all those meta-tarballs (tarballs full of tarballs), I expect to see a HUGE amount of duplication, because the inner tarballs also contain version numbers in their filenames, and those filenames repeat heavily between firmware versions. HOWEVER, I would NOT be surprised to see different contents for some of these identically-named tarballs.

One additional point of interest -- these tarball files typically contain a single folder. The outer tarballs call this inner folder either "gplresults" or "gplrelease", with no mention of the firmware version on it. So my "unpacker" script renames the "gpl*" folder to the "Kindle_src*" base filename (with .tar* stripped off the end). In general, the inner tarballs expand into a folder matching their base filename, except for the "linux*" kernel folder, which has "-lab126" stripped off the inner folder name (no longer matching the rootfs folder where the kernel modules are stored).
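
The relevant part of the unpacker looks roughly like this (simplified sketch of what I described above):
Code:
# Expand each outer tarball, then rename its "gplresults"/"gplrelease"
# folder to the tarball's base filename so versions stay distinguishable.
for f in Kindle_src*.tar*; do
  base="${f%%.tar*}"          # strip .tar / .tar.gz off the end
  tar -xf "$f"                # GNU tar auto-detects the compression
  for d in gplresults gplrelease; do
    [ -d "$d" ] && mv "$d" "$base"
  done
done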

I will post my scripts and such in this first post later, after I clean them up a bit. To save bandwidth for others wishing to study ALL the source code, I plan to replace duplicates in the entire set with symlinks, then stuff it all into a big .xz tarball (or perhaps a multi-part download). Because it is GPL, I can re-upload my de-duplicated code set to a site that can support big files (preferably with download-resume capability).
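
The symlink pass I have in mind is something like this (untested sketch -- note it needs bash 4+ for the associative array, and assumes no whitespace in the paths):
Code:
#!/bin/bash
# Replace every later copy of a duplicate file (same md5sum) with a
# symlink pointing at the first copy seen.
declare -A first
while read -r hash path; do
  if [ -n "${first[$hash]}" ]; then
    ln -sf "$(realpath "${first[$hash]}")" "$path"
  else
    first[$hash]=$path
  fi
done < <(find . -type f -exec md5sum {} + | sort)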

I wonder how big my 20GB of (inner) .tar.gz and .tar.bz2 files will get after I unpack them?

Last edited by geekmaster; 05-30-2016 at 03:03 PM.

Old 05-30-2016, 03:26 PM   #2
knc1

Ah, I have a script that will "catalog" the contents of all archives found in a directory sub-tree, down to the file level by sha1.

ah -
but you need linux and bash prior to bash-4.0

Old 05-30-2016, 04:18 PM   #3
geekmaster

Well, I just did a quick test, and my suspicions were correct. The amazon GPL source code tarballs contain hundreds (out of thousands) of inner tarballs with IDENTICAL "<package>.<version>.tar.bz2" names but DIFFERENT md5sums. That means they cannot be de-duped without unpacking them first...
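
The quick test was more or less this (sketch; assumes no spaces in the paths):
Code:
# List inner-tarball names that show up with more than one md5sum.
find . -name '*.tar.bz2' -exec md5sum {} + |
  awk '{ n = split($2, parts, "/"); name = parts[n]   # basename
         if (name in seen && seen[name] != $1) print name
         seen[name] = $1 }' | sort -u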

EDIT: So, for some reason known only to amazon/lab126, somebody decided it was a good idea to distribute huge files with different URLs (and different filenames/version numbers) that have identical contents (and identical md5sum). And likewise, to change the contents (and md5sum) of identical URLs and filenames over time. And even more befuddling is why the compressed tarballs INSIDE these GPL source code packages often have identical names (and version numbers) but DIFFERENT contents (i.e., different md5sums). It seems like they are trolling us, just for fun (or perhaps to befuddle us). Instead of TRYING to maintain "firmware update" compatibility, we *could* just REPLACE all the firmware with our own custom "MORE BETTERIZED" firmware, like the Nook folks are fond of doing...

Last edited by geekmaster; 05-30-2016 at 06:35 PM.

Old 05-30-2016, 06:39 PM   #4
geekmaster

Quote:
Originally Posted by knc1 View Post
Ah, I have a script that will "catalog" the contents of all archives found in a directory sub-tree, down to the file level by sha1.

ah -
but you need linux and bash prior to bash-4.0
But does your script need to decompress the files to SHA-1 them? If so, I hope on a ramdisk, so as not to wear out my SSDs (they are all I have in this machine).
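
Something along these lines is what I would want (sketch -- the size, mountpoint, and package name are just examples):
Code:
# Unpack and hash on a ramdisk so the bulky extraction never hits the SSD.
sudo mkdir -p /mnt/ram
sudo mount -t tmpfs -o size=4G tmpfs /mnt/ram
tar -xjf some_package.tar.bz2 -C /mnt/ram
find /mnt/ram -type f -exec sha1sum {} + > catalog.txt
sudo umount /mnt/ram        # the extracted tree evaporates with the mount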

Really, "wear out" is not all that accurate -- the problem is that on mostly-full SSDs, TRIM has little free space to work in, so I need to periodically back up the drive, secure-erase it, and restore it to get my "factory fresh" write speed back (a bit cumbersome for an mSATA boot drive). SSDs get damn slow (like 10 percent of their original speed) after a lot of writes -- a secure erase fixes that.

I still have (expensive) SSDs from back in the days before TRIM even existed. But even TRIM is rather ineffective unless you keep the drive mostly empty (yeah, like that ever happens, eh?)...

Old 05-31-2016, 04:08 PM   #5
NullNix

Quote:
Originally Posted by geekmaster View Post
EDIT: So, for some reason known only to amazon/lab126, somebody decided it was a good idea to distribute huge files with different URLs (and different filenames/version numbers) that have identical contents (and identical md5sum). And likewise, to change the contents (and md5sum) of identical URLs and filenames over time. And even more befuddling is why the compressed tarballs INSIDE these GPL source code packages often have identical names (and version numbers) but DIFFERENT contents (i.e., different md5sums). It seems like they are trolling us, just for fun (or perhaps to befuddle us).
Nah, this happens by default if you're not careful: both tar and gzip save dates and times, so tar cfz'ing the same tree twice in succession will almost always produce files with different md5sums.
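
Quick demo with GNU tar/gzip -- the tar payload is identical both times; only the timestamp gzip embeds in its header changes:
Code:
mkdir demo && echo test > demo/f
tar czf a.tar.gz demo; sleep 1; tar czf b.tar.gz demo
md5sum a.tar.gz b.tar.gz    # differ: gzip stamps the compression time
# Piping through gzip -n omits the timestamp, so output is reproducible:
tar c demo | gzip -n > c.tar.gz
tar c demo | gzip -n > d.tar.gz
md5sum c.tar.gz d.tar.gz    # identical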

Old 05-31-2016, 05:14 PM   #6
knc1

Quote:
Originally Posted by geekmaster View Post
But does your script need to decompress the files to SHA-1 them? If so, I hope on a ramdisk, so as not to wear out my SSDs (they are all I have in this machine).

Really, "wear out" is not all that accurate -- the problem is that on mostly-full SSDs, TRIM has little free space to work in, so I need to periodically back up the drive, secure-erase it, and restore it to get my "factory fresh" write speed back (a bit cumbersome for an mSATA boot drive). SSDs get damn slow (like 10 percent of their original speed) after a lot of writes -- a secure erase fixes that.

I still have (expensive) SSDs from back in the days before TRIM even existed. But even TRIM is rather ineffective unless you keep the drive mostly empty (yeah, like that ever happens, eh?)...
http://hg.minimodding.com/repos/cats/shacat.hg/

It doesn't (currently) do xz compression, and it does not recognize Amazon update_*.bin as a 'container' format.

It isn't very user friendly, but I am willing to help with examples as required.

oh -
and 20Gbytes of compressed archives is well within its limits.

Rethink -
There are some worked examples already posted here -
Also, a script that builds (@file level) concordances of the contents of the data file.

Last edited by knc1; 05-31-2016 at 05:23 PM.

Old 05-31-2016, 05:52 PM   #7
geekmaster

Quote:
Originally Posted by knc1 View Post
http://hg.minimodding.com/repos/cats/shacat.hg/

It doesn't (currently) do xz compression, and it does not recognize Amazon update_*.bin as a 'container' format.

It isn't very user friendly, but I am willing to help with examples as required.

oh -
and 20Gbytes of compressed archives is well within its limits.

Rethink -
There are some worked examples already posted here -
Also, a script that builds (@file level) concordances of the contents of the data file.
I remember your posts, and twobob's comments on the huge effort involved. At the time, I did not see how that was relevant to me and my goals. However, goals change, and it seems interesting now... I will need to examine your concordances. Thanks.