Please refer to my original posting of deimp.exe here for some background info.
One of the component records in the .RES directory, created when the .IMP file is exploded with unimp.exe, is the DATA.FRK file. It contains the basic text used in the ebook and is the same for both the Color VGA (REB 1200) and Grayscale Half-VGA (EBW 1150) .IMP files. deimp.exe decompresses DATA.FRK if it was LZSS-compressed when the .IMP was created, and substitutes/expands the control characters (see below) along the way.
Element text is extracted and placed in this file. Element tags are replaced with control characters. This file can be compressed and encrypted, with compression occurring before encryption. This file is compressed when the element <meta name="x-SBP-compress" content="on"/> is included in the <x-metadata> element of the package file. The compression algorithm used is LZSS. This file is encrypted when the element <meta name="x-SBP-encrypt" content="on"/> is included in the <x-metadata> element of the package file. The encryption algorithm used is DES. The 8-byte encryption key is in the SoftBook Edition Encryption Key File (.key) at offset 0x0C.
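To make the two switches and the key location a bit more concrete, here is a rough Python sketch. The function names sbp_flags and read_des_key are made up for this post, the sample package fragment in the usage note is invented, and the actual DES step is left out because it needs a third-party library.

```python
import xml.etree.ElementTree as ET

KEY_OFFSET = 0x0C   # offset of the key inside the .key file (from the text above)
KEY_LENGTH = 8      # DES key size in bytes


def _local(tag: str) -> str:
    """Drop an XML namespace prefix such as '{ns}meta' -> 'meta'."""
    return tag.rsplit("}", 1)[-1]


def sbp_flags(package_xml: str) -> dict:
    """Report whether x-SBP-compress / x-SBP-encrypt are switched on.

    Only <meta> elements inside an <x-metadata> element are considered.
    """
    flags = {"compress": False, "encrypt": False}
    root = ET.fromstring(package_xml)
    for xmeta in root.iter():
        if _local(xmeta.tag) != "x-metadata":
            continue
        for el in xmeta.iter():
            if _local(el.tag) != "meta":
                continue
            name = el.get("name", "")
            if name in ("x-SBP-compress", "x-SBP-encrypt"):
                flags[name[len("x-SBP-"):]] = el.get("content") == "on"
    return flags


def read_des_key(key_path: str) -> bytes:
    """Return the 8 key bytes stored at offset 0x0C of a .key file."""
    with open(key_path, "rb") as fh:
        fh.seek(KEY_OFFSET)
        key = fh.read(KEY_LENGTH)
    if len(key) != KEY_LENGTH:
        raise ValueError("key file is too short")
    return key
```

For a package fragment such as <x-metadata><meta name="x-SBP-compress" content="on"/></x-metadata>, sbp_flags should report compress as True and encrypt as False. Because compression happens before encryption, a reader would have to undo them in the opposite order: decrypt first, then decompress.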
Characters less than 0x20 are removed, except for line breaks, which are replaced with 0x20. Multiple 0x20 characters are replaced with a single 0x20.
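The whitespace rule just described could look something like this in Python. This is only a sketch: normalize_text is a name I picked, and I am assuming that "line break" means the LF byte (0x0A) in the source text, which the description does not actually spell out.

```python
import re

LINE_BREAK = 0x0A  # assumption: "line break" means LF in the source text


def normalize_text(raw: bytes) -> bytes:
    """Apply the whitespace rule: drop control bytes, collapse spaces."""
    out = bytearray()
    for b in raw:
        if b == LINE_BREAK:
            out.append(0x20)      # a line break becomes a single space
        elif b >= 0x20:
            out.append(b)         # ordinary characters pass through
        # every other byte below 0x20 is silently dropped
    # runs of two or more spaces collapse to one
    return re.sub(rb" {2,}", b" ", bytes(out))
```

Note that this runs on the element text before the control characters listed below are inserted, so it never sees (or strips) those.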
0x0A end of document, forced page break
0x0B start of element except <span>
0x0D line break element <br/>
0x0E start of table element <table>
0x0F image element <img/>
0x13 end of table cell </td> tag
0x14 horizontal rule element <hr/>
0x15 before and after page header content
0x16 before and after page footer content
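If you want to eyeball where these control characters sit in a decompressed DATA.FRK, a small lookup table does the job. The sketch below is illustrative only: annotate is a hypothetical helper, the bracketed labels are my own shorthand, and mapping the remaining bytes with chr() (Latin-1 style) is an assumption rather than the real character set.

```python
# Control characters from the list above, with short labels.
CONTROL_CODES = {
    0x0A: "end of document / forced page break",
    0x0B: "start of element (except <span>)",
    0x0D: "<br/>",
    0x0E: "<table>",
    0x0F: "<img/>",
    0x13: "</td>",
    0x14: "<hr/>",
    0x15: "page header boundary",
    0x16: "page footer boundary",
}


def annotate(data: bytes) -> str:
    """Replace each known control byte with a readable [0xNN label] marker."""
    parts = []
    for b in data:
        if b in CONTROL_CODES:
            parts.append(f"[0x{b:02X} {CONTROL_CODES[b]}]")
        else:
            parts.append(chr(b))  # assumption: treat other bytes as Latin-1
    return "".join(parts)
```

For example, annotate(b"Hi\x0dthere\x14") gives "Hi[0x0D <br/>]there[0x14 <hr/>]".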
As previously stated, my deimp.exe program uses as its base the lzss-0.6 code by Michael Dipperstein (http://michael.dipperstein.com/lzss), with tweaks by me to get it to decode the .imp text. I added the ability to insert/substitute some characters that are not part of the LZSS decompression so that the resulting .imp text looks better. Just remove those, and after decompression you can substitute them back.
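For anyone who has not looked at LZSS before, here is a generic toy decoder in Python showing the basic idea: a flag bit says whether the next item is a literal byte or a (distance, length) back-reference. This is not the deimp.exe code and not the exact .imp variant. The bit widths, the MSB-first bit order, the "flag 1 = literal" convention, the distance-back addressing and the explicit out_len argument are all placeholder assumptions of mine.

```python
def lzss_decode(data: bytes, out_len: int,
                offset_bits: int = 12, length_bits: int = 4,
                min_match: int = 3) -> bytes:
    """Decode a generic LZSS bit stream into out_len bytes.

    All parameters are illustrative defaults, not the .imp settings.
    """
    pos = 0                       # current position in bits
    total_bits = len(data) * 8

    def read(n: int) -> int:
        """Read n bits, most significant bit first."""
        nonlocal pos
        if pos + n > total_bits:
            raise EOFError("ran out of compressed data")
        value = 0
        for _ in range(n):
            bit = (data[pos >> 3] >> (7 - (pos & 7))) & 1
            value = (value << 1) | bit
            pos += 1
        return value

    out = bytearray()
    while len(out) < out_len:
        if read(1):                           # flag 1: literal byte follows
            out.append(read(8))
        else:                                 # flag 0: back-reference
            distance = read(offset_bits)
            length = read(length_bits) + min_match
            if not 0 < distance <= len(out):
                raise ValueError("back-reference points outside the output")
            for _ in range(length):
                out.append(out[-distance])    # byte-by-byte so overlaps work
    return bytes(out[:out_len])
```

The byte-by-byte copy matters: a back-reference is allowed to overlap the bytes it is producing, which is how a short repeating pattern gets expanded into a long run.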
In addition to the control characters above, characters to "substitute/convert" would be:
HEX => Should be (actual char)
0x8E => "&eacute;" (i.e. "é"),
0xA0 => "&nbsp;" (i.e. a non-breaking space),
0xA5 => "&bull;" (i.e. "•"),
0xA8 => "&reg;" (i.e. "®"),
0xA9 => "&copy;" (i.e. "©"),
0xAA => "&trade;" (i.e. "™"),
0xAE => "&AElig;" (i.e. "Æ"),
0xC7 => "&laquo;" (i.e. "«"),
0xC8 => "&raquo;" (i.e. "»"),
0xC9 => "&hellip;" (i.e. "…"),
0xD0 => "&ndash;" (i.e. "–"),
0xD1 => "&mdash;" (i.e. "—"),
0xD2 => "&ldquo;" (i.e. "“"),
0xD3 => "&rdquo;" (i.e. "”"),
0xD4 => "&lsquo;" (i.e. "‘"),
0xD5 => "&rsquo;" (i.e. "’"),
0xE1 => "&middot;" (i.e. "·")
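The table above drops straight into a lookup dictionary. Here is a minimal Python sketch of the substitution step: decode_text is a hypothetical name, 0xA0 is assumed to be a non-breaking space, and bytes not in the table are simply passed through with chr(), which is my assumption and not something the format guarantees.

```python
# Byte values from the table above mapped to the characters they stand for.
SUBSTITUTIONS = {
    0x8E: "\u00e9",  # e with acute accent
    0xA0: "\u00a0",  # non-breaking space (assumed)
    0xA5: "\u2022",  # bullet
    0xA8: "\u00ae",  # registered sign
    0xA9: "\u00a9",  # copyright sign
    0xAA: "\u2122",  # trade mark sign
    0xAE: "\u00c6",  # capital AE ligature
    0xC7: "\u00ab",  # left guillemet
    0xC8: "\u00bb",  # right guillemet
    0xC9: "\u2026",  # horizontal ellipsis
    0xD0: "\u2013",  # en dash
    0xD1: "\u2014",  # em dash
    0xD2: "\u201c",  # left double quote
    0xD3: "\u201d",  # right double quote
    0xD4: "\u2018",  # left single quote
    0xD5: "\u2019",  # right single quote
    0xE1: "\u00b7",  # middle dot
}


def decode_text(data: bytes) -> str:
    """Convert decompressed bytes to text using the substitution table."""
    # Bytes missing from the table fall back to chr(b), an assumption.
    return "".join(SUBSTITUTIONS.get(b, chr(b)) for b in data)
```

For example, decode_text(b"caf\x8e \xd2hi\xd3") comes out as the word "café" followed by "hi" in curly double quotes.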
I attach the source code to my deimp.exe (and original lzss-0.6) below for your use and further study. Please excuse the coding hacks as this was a work-in-progress until I "nailed" the decompression algorithm. It didn't lend itself to good programming style.
p.s. as an exercise, would anyone want to try tweaking this code to allow the LZSS (re-)compression of text for use as the DATA.FRK in the .imp?