12-04-2008, 11:42 AM   #1
llasram
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
LIT generation -- binary analysis help with the last 0.1%?

After following a comment thread about e-book formats over at tor.com last week, I spontaneously decided to apply what I'd learned from writing lit2oeb for calibre to writing an "oeb2lit" open-source, free-as-in-freedom LIT-generation tool.

I need to neaten up the code a bit, but I'm basically 99.9% there: the tool creates LIT files with all of LIT's baroque cross-indices in place, and MSReader treats them indistinguishably from commercially-generated LIT files. The ITOL/ITLS archive format is fairly well-documented (Microsoft's ITOL/ITLS format plus the ConvertLIT source code), but I had to figure out the exact format of most of the LIT-specific files from scratch.

The 0.1% I haven't figured out yet is a hash function. For each HTML file in the LIT book, the LIT archive contains three entries: a "content" entry containing the actual markup, an "ahc" entry containing the hash-collision table for a hash of all the referenced fragment identifiers in the markup, and an "aht" entry containing a fixed-record-width index into the collision table. I'm able to fake the "ahc" and "aht" entries by just always creating a collision table of length 1. I'm not sure the WinCE version of MSReader will handle that cleanly, though, so it'd be nice to create these structures correctly.
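
To make the general shape of that pair of structures concrete, here's a rough Python sketch of a fixed-record-width index pointing into a collision table. Everything in it is an assumption for illustration: the record layouts, the function names, and especially `placeholder_hash` (a CRC32 stand-in) are invented, and none of it is the real "ahc"/"aht" format or the real hash, which is exactly the part I don't know.

```python
import struct
import zlib


def placeholder_hash(fragment_id: str) -> int:
    # Stand-in only: the real LIT hash function is the unknown part.
    return zlib.crc32(fragment_id.encode("utf-8"))


def build_tables(fragment_ids, num_buckets):
    """Return (index_bytes, collision_bytes) for the given fragment ids.

    index: num_buckets fixed-width records of (offset, count), 8 bytes each.
    collision: length-prefixed UTF-8 ids, grouped by bucket.
    Both layouts are hypothetical, not the real "aht"/"ahc" formats.
    """
    buckets = [[] for _ in range(num_buckets)]
    for frag in fragment_ids:
        buckets[placeholder_hash(frag) % num_buckets].append(frag)

    index = bytearray()
    collision = bytearray()
    for bucket in buckets:
        # Each index record stores where this bucket starts and how many ids it holds.
        index += struct.pack("<II", len(collision), len(bucket))
        for frag in bucket:
            raw = frag.encode("utf-8")
            collision += struct.pack("<H", len(raw)) + raw
    return bytes(index), bytes(collision)


def lookup(index, collision, fragment_id, num_buckets):
    """Return True if fragment_id is present in the tables."""
    slot = placeholder_hash(fragment_id) % num_buckets
    offset, count = struct.unpack_from("<II", index, slot * 8)
    target = fragment_id.encode("utf-8")
    for _ in range(count):
        (length,) = struct.unpack_from("<H", collision, offset)
        offset += 2
        if collision[offset:offset + length] == target:
            return True
        offset += length
    return False
```

With `num_buckets=1` every identifier lands in the same bucket, so the hash value stops mattering and a lookup degrades to a linear scan. That is roughly the spirit of the workaround above, though whether a reader that computes the real hash would accept such tables is the open question.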

I'm terrible at binary analysis, but would any of those here who are so skilled be interested in taking a peek at either MSReader or the MS-provided LIT-generation SDK to figure out which hash function is used?

Thanks! :-)

-Marshall