Old 02-22-2012, 04:28 PM   #2
knc1
Catalog Record layout

I find it hard to believe that I haven't already written this up somewhere on the web. Oh well, here goes.

The raw catalog holds one single-line record per file.

The first field is the sha1 sum of the file;
The second field is the volume ID (in this case: "Kindle");
This is followed by one or more fields delimited by "|";
The last field is the path and filename of the file that was checksummed in the first field.

The fields between the volume ID and the last field (their number varies) are "containers" (usually archives).
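
A minimal sketch of pulling one record apart along those lines. The function name `parse_record` is made up for illustration; it assumes the sha1 is separated from the volume ID by spaces (as in sha1sum output) and that no path contains a "|" of its own. The sample record is the example from this post, joined back into one line.

```shell
# parse_record (hypothetical helper): print the parts of one catalog record.
# Assumes: checksum, then spaces, then "|"-delimited fields.
parse_record() {
    printf '%s\n' "$1" | awk '{
        sha1 = $1                      # first word is the checksum
        rest = $0
        sub(/^[^ ]+ +/, "", rest)      # drop the checksum and the spaces after it
        n = split(rest, f, "|")        # volume ID, containers..., file
        print "sha1: " sha1
        print "volume: " f[1]
        for (i = 2; i < n; i++)        # everything in between is a container
            print "container: " f[i]
        print "file: " f[n]            # last field is the checksummed file
    }'
}

record='7748689817fdd946e73e31b703a40f421249646a  Kindle|/kdx/Kindle_src_2.1.1_351050064.tar.gz|/gplrelease/udev-112.tar.bz2|/udev-112/etc/udev/rules.d/60-persistent-input.rules'
parse_record "$record"
```

For that record it prints five lines: the sha1, the volume, two containers and the file.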

Here is a (short) example, with the single line broken for posting at each "|":

The "poor man's query tool" for this database, grep:
Code:
knoppix:cat$ grep 'udev/rules.d/60' Amazon_2012.02.18_sha1.cat | sort
This results in a sorted list of the matching udev rules in all Amazon source code releases.

Code:
7748689817fdd946e73e31b703a40f421249646a  Kindle|
The sha1sum and the volume ID.

Code:
/kdx/Kindle_src_2.1.1_351050064.tar.gz|/gplrelease/udev-112.tar.bz2|
The path and file name on the volume (a compressed archive),
which in turn contains another compressed archive,
which in turn contains this file (the one that was sha1sum'd):

Code:
/udev-112/etc/udev/rules.d/60-persistent-input.rules
This serves the needs of a developer who is wondering: "which models/releases use this identical framebuffer driver?"

All that person needs to do is invent a grep expression and ask. ;-)
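
One way such a question could be asked, as a sketch only: look up every record with a given sha1 and print the outermost container, which under the layout above is the release archive. The function name `releases_with` and the two "/hypothetical/..." records are invented for the demo; only the first record comes from this post.

```shell
# releases_with SHA1 CATALOG (hypothetical helper):
# list the outermost container of every record carrying that checksum.
# Note: for a record with no containers, field 2 is the file itself.
releases_with() {
    grep -- "^$1 " "$2" | awk -F'|' '{ print $2 }' | sort -u
}

# Tiny demo catalog; the "/hypothetical/..." lines are made up.
catalog=$(mktemp)
cat > "$catalog" <<'EOF'
7748689817fdd946e73e31b703a40f421249646a  Kindle|/kdx/Kindle_src_2.1.1_351050064.tar.gz|/gplrelease/udev-112.tar.bz2|/udev-112/etc/udev/rules.d/60-persistent-input.rules
7748689817fdd946e73e31b703a40f421249646a  Kindle|/hypothetical/release_B.tar.gz|/udev-112.tar.bz2|/udev-112/etc/udev/rules.d/60-persistent-input.rules
0000000000000000000000000000000000000000  Kindle|/hypothetical/release_C.tar.gz|/some/other/file
EOF

releases_with 7748689817fdd946e73e31b703a40f421249646a "$catalog"
rm -f "$catalog"
```

Matching on the checksum rather than the file name is what makes this an "identical file" query: two releases only show up together if the contents are byte-for-byte the same.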

Note:
My dream was to import these catalogs into a MySQL database, but I have been running this script for nearly ten years now and have not 'gotten around to it' yet.

The record format is such that it can be imported into OpenOffice and searched there as a spreadsheet-based database
(and from there, OpenOffice could populate a for-real database).
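
Because the number of container fields varies, the file name lands in a different column from record to record when the line is split on "|". A possible pre-processing step, sketched here under the same assumptions about the layout (spaces after the checksum, no "|" inside a path), rewrites each record as four tab-separated columns. The name `to_tsv` is made up, and this is not something the cataloging script itself does.

```shell
# to_tsv [FILE...] (hypothetical helper): rewrite catalog records as
# four tab-separated columns: sha1, volume ID, container chain, file.
# The container chain stays "|"-joined inside its single column.
to_tsv() {
    awk '{
        sha1 = $1
        rest = $0
        sub(/^[^ ]+ +/, "", rest)          # drop the checksum and spaces
        n = split(rest, f, "|")            # volume ID, containers..., file
        chain = ""
        for (i = 2; i < n; i++)            # rejoin the containers
            chain = chain (i > 2 ? "|" : "") f[i]
        printf "%s\t%s\t%s\t%s\n", sha1, f[1], chain, f[n]
    }' "$@"
}

record='7748689817fdd946e73e31b703a40f421249646a  Kindle|/kdx/Kindle_src_2.1.1_351050064.tar.gz|/gplrelease/udev-112.tar.bz2|/udev-112/etc/udev/rules.d/60-persistent-input.rules'
printf '%s\n' "$record" | to_tsv
```

With a fixed column count, the spreadsheet import (or a later database load) can treat the file name and the checksum as ordinary columns.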

As I write, the script is approaching the 1.5 million record mark and is still running.