Old 12-18-2008, 09:06 AM   #2
nrapallo
GuteBook/Mobi2IMP Creator
DATA.FRK and LZSS decompression

Please refer to my original posting of deimp.exe here for some background info.

One of the component records in the .RES directory produced when an .IMP file is exploded with unimp.exe is the DATA.FRK file. It contains the basic text of the ebook and is the same for both the Color VGA (REB 1200) and Grayscale Half-VGA (EBW 1150) .IMP files. deimp.exe decompresses this DATA.FRK file if it was LZSS-compressed when the ebook was created, and substitutes/expands the control characters (see below).

DATA.FRK File

Element text is extracted and placed in this file. Element tags are replaced with control characters. This file can be compressed and encrypted, with compression occurring before encryption. This file is compressed when the element <meta name="x-SBP-compress" content="on"/> is included in the <x-metadata> element of the package file. The compression algorithm used is LZSS. This file is encrypted when the element <meta name="x-SBP-encrypt" content="on"/> is included in the <x-metadata> element of the package file. The encryption algorithm used is DES. The 8-byte encryption key is in the SoftBook Edition Encryption Key File (.key) at offset 0x0C.
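
The key location and the order of operations described here can be sketched in Python. This is not code from deimp.exe: read_des_key, decode_data_frk and the decrypt/decompress callables are hypothetical names, and no actual DES or LZSS implementation is included. Since compression occurs before encryption when the file is created, reading it back would presumably run in the reverse order (decrypt first, then decompress).

```python
KEY_OFFSET = 0x0C  # offset of the key inside the .key file, per the description
KEY_LENGTH = 8     # DES uses an 8-byte key


def read_des_key(key_path):
    """Return the 8 bytes stored at offset 0x0C of a .key file."""
    with open(key_path, "rb") as f:
        f.seek(KEY_OFFSET)
        key = f.read(KEY_LENGTH)
    if len(key) != KEY_LENGTH:
        raise ValueError("key file is too short to hold a key at offset 0x0C")
    return key


def decode_data_frk(data, encrypted, compressed, decrypt=None, decompress=None):
    """Undo the packaging steps in reverse order: decrypt, then decompress.

    `decrypt` and `decompress` are caller-supplied callables (hypothetical
    placeholders); this sketch only fixes the order in which they run.
    """
    if encrypted:
        data = decrypt(data)
    if compressed:
        data = decompress(data)
    return data
```

The DES mode and any padding are not stated in the description, so they are left to whatever decrypt function is plugged in.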

Characters less than 0x20 are removed, except for the line break, which is replaced with 0x20. Multiple 0x20 characters are replaced with a single 0x20.
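
A minimal Python sketch of this whitespace rule follows. The function name normalize_text is made up for illustration, and it assumes that "line break" means the LF byte (0x0A); a CR byte may need the same treatment in real input.

```python
import re

LINE_BREAK = 0x0A  # assumption: "line break" means LF


def normalize_text(raw: bytes) -> bytes:
    """Apply the rule: line break -> 0x20, drop other bytes < 0x20, collapse spaces."""
    out = bytearray()
    for b in raw:
        if b == LINE_BREAK:
            out.append(0x20)      # line break becomes a single space
        elif b < 0x20:
            continue              # every other control character is removed
        else:
            out.append(b)
    # Collapse runs of spaces last, so spaces left adjacent by removals merge too.
    return re.sub(rb" {2,}", b" ", bytes(out))
```

For example, normalize_text(b"Hello\n  world\t!") gives b"Hello world!".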

Control characters
Code:
0x0A end of document, forced page break
0x0B start of element except <span>
0x0D line break element <br/>
0x0E start of table element <table>
0x0F image element <img/>
0x13 end of table cell </td> tag
0x14 horizontal rule element <hr/>
0x15 before and after page header content
0x16 before and after page footer content
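
As a rough illustration of how this table might be used, the Python sketch below splits decompressed DATA.FRK bytes into text runs and control codes. The names CONTROL_CODES and split_on_controls are hypothetical and are not taken from the deimp.exe source.

```python
# Control codes from the table above, with short descriptions.
CONTROL_CODES = {
    0x0A: "end of document / forced page break",
    0x0B: "start of element (except <span>)",
    0x0D: "line break <br/>",
    0x0E: "start of table <table>",
    0x0F: "image <img/>",
    0x13: "end of table cell </td>",
    0x14: "horizontal rule <hr/>",
    0x15: "page header delimiter",
    0x16: "page footer delimiter",
}


def split_on_controls(data: bytes):
    """Yield ('text', bytes) and ('control', code) tokens in document order."""
    run = bytearray()
    for b in data:
        if b in CONTROL_CODES:
            if run:
                yield ("text", bytes(run))
                run.clear()
            yield ("control", b)
        else:
            run.append(b)
    if run:
        yield ("text", bytes(run))
```

A caller could then replace each control token with whatever markup or marker suits the output format.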
As previously stated, my deimp.exe program uses as its base the lzss-0.6 code by Michael Dipperstein (http://michael.dipperstein.com/lzss), with tweaks by me to get it to decode the .imp text. I added the ability to insert/substitute some characters that are not part of the LZSS decompression so that the resulting .imp text looks better. If you only want the raw decompressed output, just remove those substitutions; you can then apply them as a separate step after decompression.
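
For readers unfamiliar with LZSS, here is a small, generic round-trip sketch in Python. It is a textbook byte-oriented variant and is NOT the .imp bitstream, nor the exact format used by lzss-0.6 or deimp.exe: the 4096-byte window, 3-byte minimum match, flag-byte layout and function names are all assumptions chosen for illustration.

```python
WINDOW_SIZE = 4096          # assumed: distances fit in 12 bits
MIN_MATCH = 3               # assumed: shorter matches are stored as literals
MAX_MATCH = MIN_MATCH + 15  # assumed: length stored in 4 bits


def lzss_compress(data: bytes) -> bytes:
    """Greedy LZSS: a flag byte, then up to 8 tokens (bit set = literal byte)."""
    out = bytearray()
    pos, n = 0, len(data)
    while pos < n:
        flag_index = len(out)
        out.append(0)                       # placeholder for the flag byte
        for bit in range(8):
            if pos >= n:
                break
            best_len, best_dist = 0, 0
            for cand in range(max(0, pos - WINDOW_SIZE + 1), pos):
                length = 0
                while (length < MAX_MATCH and pos + length < n
                       and data[cand + length] == data[pos + length]):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, pos - cand
            if best_len >= MIN_MATCH:
                # 16-bit token: 12-bit distance, 4-bit (length - MIN_MATCH)
                token = (best_dist << 4) | (best_len - MIN_MATCH)
                out += token.to_bytes(2, "big")
                pos += best_len
            else:
                out[flag_index] |= 1 << bit  # mark this token as a literal
                out.append(data[pos])
                pos += 1
    return bytes(out)


def lzss_decompress(blob: bytes) -> bytes:
    """Inverse of lzss_compress above (same assumed token layout)."""
    out = bytearray()
    i, n = 0, len(blob)
    while i < n:
        flags = blob[i]
        i += 1
        for bit in range(8):
            if i >= n:
                break
            if flags & (1 << bit):
                out.append(blob[i])          # literal byte
                i += 1
            else:
                token = int.from_bytes(blob[i:i + 2], "big")
                i += 2
                dist, length = token >> 4, (token & 0x0F) + MIN_MATCH
                for _ in range(length):      # byte-by-byte so overlaps work
                    out.append(out[-dist])
    return bytes(out)
```

The match search is brute force, which is fine for a demonstration but slow on large inputs. Decoding real .imp text would need the actual parameters and tweaks found in the attached source.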

In addition to those control characters above, characters to "substitute/convert" would be:
Code:
        HEX  => Should be   (actual char)
        0x8E => "&eacute;"  (i.e. "é")
        0xA0 => "&nbsp;"    (i.e. " ")
        0xA5 => "&bull;"    (i.e. "•")
        0xA8 => "&reg;"     (i.e. "®")
        0xA9 => "&copy;"    (i.e. "©")
        0xAA => "&trade;"   (i.e. "™")
        0xAE => "&AElig;"   (i.e. "Æ")
        0xC7 => "&laquo;"   (i.e. "«")
        0xC8 => "&raquo;"   (i.e. "»")
        0xC9 => "&hellip;"  (i.e. "…")
        0xD0 => "&ndash;"   (i.e. "–")
        0xD1 => "&mdash;"   (i.e. "—")
        0xD2 => "&ldquo;"   (i.e. "“")
        0xD3 => "&rdquo;"   (i.e. "”")
        0xD4 => "&lsquo;"   (i.e. "‘")
        0xD5 => "&rsquo;"   (i.e. "’")
        0xE1 => "&middot;"  (i.e. "·")
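
The table above can be applied as a post-processing pass over the decompressed bytes. The Python sketch below is one way to do that; SUBSTITUTIONS and substitute_high_bytes are hypothetical names, and passing every other byte through unchanged (as its Latin-1 code point) is an assumption, since the post does not say how unlisted bytes should be handled.

```python
# Byte value -> HTML entity, copied from the table above.
SUBSTITUTIONS = {
    0x8E: "&eacute;", 0xA0: "&nbsp;",   0xA5: "&bull;",
    0xA8: "&reg;",    0xA9: "&copy;",   0xAA: "&trade;",
    0xAE: "&AElig;",  0xC7: "&laquo;",  0xC8: "&raquo;",
    0xC9: "&hellip;", 0xD0: "&ndash;",  0xD1: "&mdash;",
    0xD2: "&ldquo;",  0xD3: "&rdquo;",  0xD4: "&lsquo;",
    0xD5: "&rsquo;",  0xE1: "&middot;",
}


def substitute_high_bytes(data: bytes) -> str:
    """Replace the listed byte values with their HTML entities."""
    parts = []
    for b in data:
        if b in SUBSTITUTIONS:
            parts.append(SUBSTITUTIONS[b])
        else:
            # Assumption: unlisted bytes pass through as the same code point.
            parts.append(chr(b))
    return "".join(parts)
```

For example, substitute_high_bytes(b"caf\x8e") gives "caf&eacute;".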
I attach the source code to my deimp.exe (and original lzss-0.6) below for your use and further study. Please excuse the coding hacks as this was a work-in-progress until I "nailed" the decompression algorithm. It didn't lend itself to good programming style.

P.S. As an exercise, would anyone like to try tweaking this code to allow LZSS (re-)compression of text for use as the DATA.FRK in an .imp?
Attached Files
File Type: zip deimp_v0.1_source.zip (830.4 KB, 1907 views)

Last edited by nrapallo; 12-18-2008 at 10:46 PM. Reason: added actual character to substituted characters