04-16-2008, 11:37 AM #1
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 T1 NSTG iLiad_v2 NC Asus_TF Next1 WPDN
Converting .IMP to anything? WE ARE NOW THERE!

EDIT: 16 May 2008

Welcome to deimp.exe, a text decompressor (extractor) for .imp files, written by Nick Rapallo (me).
The C source code for deimp is available as deimp_v0.1_source.zip in the thread "Reverse-engineering the .IMP format".

Version 0.1 is very basic, but works well given these caveats:
  • Works only on unencrypted .imp files, i.e. non-DRM'ed files.
  • Assumes that the .imp file was compressed when it was created. For now, if it was not, you will get gibberish. To handle uncompressed files, uncomment the relevant line in the included .bat file.
  • The Windows-only executable assumes Intel little-endian (LE) byte ordering (LSB to MSB).
  • Only extracts basic text, not the underlying HTML. As such, no tables, images or formatting are retained. DOS line endings and tabs are inserted.
  • In the resulting '.txt', table starts are indicated by '|', table cells are indicated by '_', image locations are indicated by '|^|' and <hr> is shown as '____________'.
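For anyone porting this, the marker mapping above can be sketched in C as a single pass over the decompressed DATA.FRK bytes, using the control-character table from the IMP spec quoted later in this post. This is not deimp's actual source: the function names are hypothetical, it does not insert tabs, and its treatment of 0x0A (page break), 0x0B (element start) and 0x15/0x16 (header/footer delimiters) is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Append s to out (capacity cap >= 1), truncating so that one byte is
   always left for the terminating NUL. Returns the new length. */
static size_t put(char *out, size_t len, size_t cap, const char *s)
{
    size_t n = strlen(s);
    if (n > cap - 1 - len)
        n = cap - 1 - len;
    memcpy(out + len, s, n);
    return len + n;
}

/* Hypothetical converter from decompressed DATA.FRK bytes to plain text.
   Markers follow the list above; handling of 0x0A, 0x0B, 0x15 and 0x16 is
   assumed. cap must be >= 1; the result is NUL-terminated and truncated
   if it does not fit. Returns the length of the text written. */
size_t frk_to_text(const unsigned char *in, size_t n, char *out, size_t cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        switch (in[i]) {
        case 0x0A: len = put(out, len, cap, "\r\n\r\n"); break;     /* end of document / page break (assumed blank line) */
        case 0x0D: len = put(out, len, cap, "\r\n"); break;         /* <br />: DOS line ending */
        case 0x0E: len = put(out, len, cap, "|"); break;            /* start of <table> */
        case 0x13: len = put(out, len, cap, "_"); break;            /* end of table cell </td> */
        case 0x0F: len = put(out, len, cap, "|^|"); break;          /* <img /> location */
        case 0x14: len = put(out, len, cap, "____________"); break; /* <hr /> */
        case 0x0B: case 0x15: case 0x16: break;                     /* element start, header/footer delimiters: dropped (assumed) */
        default: {
            char c[2] = { (char)in[i], '\0' };                      /* ordinary text byte: copied as-is */
            len = put(out, len, cap, c);
            break;
        }
        }
    }
    out[len] = '\0';
    return len;
}
```

Call it on the buffer produced after LZSS decompression, e.g. `frk_to_text(frk, frk_len, text, sizeof text)`, then write `text` out as the '.txt' file.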

Usage:

1. Unzip deimp_v0.1.zip into the directory where your .imp files are stored. All sub-directories below it are processed recursively, and only the extracted '.txt' files are left behind.

2. Double-click the Windows batch file 'extract text from imp files.bat' and wait.

3. That's it. Just edit the resulting '.txt' files. Note that you will have to replace uncommon characters such as certain quotes, em dashes, etc.

The extracted text (with no images/hyperlinks) can then be easily converted by BookDesigner or equivalent.

Thanks (and a bit of karma) go to delphidb96 for the link to the LZSS source code (it is the basis of deimp)! And thanks go to Michael Dipperstein (mdipper@alumni.engr.ucsb.edu) for his LZSS source code (lzss-0.6.zip). More information on LZSS encoding may be found at: http://michael.dipperstein.com/lzss

Enjoy!
-Nick

p.s. you do not need the 'unimp.zip' and 'reimp.zip' files; just get the 'deimp_v0.1.zip'!

p.p.s. added 'The Pilgrims Progress in Words of One Syllable.RES.txt' (attached), a sample conversion of an .imp file to .txt.

p.p.p.s. should you need to extract only one .imp file, you can use the 'extract.bat.txt' attachment (just save it as 'extract.bat') and, in the MS-DOS command prompt window, type:
1. unimp "Impfilename.imp" and then
2. extract "Resdirname" (note: omit the '.RES' suffix; 'Resdirname' may differ from 'Impfilename').


Previously....
This was a response posted in another forum about converting .imp to .prc (or anything useful!)
Quote:
Originally Posted by nrapallo
No, Mobi2IMP doesn't convert from imp to mobi.

Unfortunately, the .IMP format is an 'end' format, meaning there is no known way to extract the original source (.html and images).

However, there are several ways that the ebook 'text' can be extracted to varying degrees of success, as follows:

1. Using the 'ebook viewer.exe' that is installed on the PC with the free eBook Publisher software, you would 'Print' using a printer driver that saves to .pdf format, then OCR the resulting .pdf.

2. With some .IMP files, it is possible to open them with a text editor (like Wordpad) and 'go to the middle' of the document. There you may find the text used in the .IMP ebook. This depends on which software and compression setting created that .IMP (it does seem to work with recent eBook Publisher creations that are not internally compressed with LZSS).
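Rather than scrolling through the file by hand, the 'go to the middle' approach can be automated with a small scanner that reports runs of printable ASCII, much like the Unix `strings` tool. A minimal sketch under stated assumptions: `find_text_run` is a hypothetical helper, only bytes 0x20..0x7E count as text (accented Latin-1 characters would split a run), and it only finds readable text when DATA.FRK was not LZSS-compressed.

```c
#include <stddef.h>

/* Find the next run of at least min_len printable ASCII bytes (0x20..0x7E)
   starting at or after offset start. Returns the run's offset and stores
   its length in *run_len; returns n (with *run_len = 0) if none is found. */
size_t find_text_run(const unsigned char *buf, size_t n, size_t start,
                     size_t min_len, size_t *run_len)
{
    size_t i = start;
    while (i < n) {
        size_t j = i;
        while (j < n && buf[j] >= 0x20 && buf[j] <= 0x7E)
            j++;                          /* extend the current printable run */
        if (j > i && j - i >= min_len) {
            *run_len = j - i;
            return i;
        }
        i = (j > i) ? j : i + 1;          /* skip a short run or one non-printable byte */
    }
    *run_len = 0;
    return n;
}
```

For example, read the whole .imp into memory, then loop `off = find_text_run(buf, n, off, 40, &len)` while `off < n`, printing each run and advancing `off` by `len`. The minimum length of 40 is an arbitrary choice that filters out short binary fragments.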

Either way, you would lose all formatting, links and HTML codes in the process, but you would have the text to 'begin' the conversion process.

Most users creating .IMPs ALSO retain the original source for this very reason.

Hope this helps.

Edit: this has all been discussed before.
From the IMP Technical Specs:
Quote:
DATA.FRK File
Element text is extracted and placed in this file. Element tags are replaced with control characters. This file can be compressed and encrypted with compression occurring before encryption. This file is compressed when the element <meta name="x-SBP-compress" content="on"/> is included in the <x-metadata> element of the package file. The compression algorithm used is LZSS. This file is encrypted when the element <meta name="x-SBP-encrypt" content="on"/> is included in the <x-metadata> element of the package file. The encryption algorithm used is DES. The 8 byte encryption key is in the SoftBook Edition Encryption Key File (.key) at offset 0x0C.

Characters less than 0x20 are removed except for line break which is replaced with 0x20. Multiple 0x20 characters are replaced with a single 0x20.

Control characters
0x0A end of document, forced page break
0x0B start of element except <span>
0x0D line break element <br />
0x0E start of table element <table>
0x0F image element <img />
0x13 end of table cell </td> tag
0x14 horizontal rule element <hr />
0x15 before and after page header content
0x16 before and after page footer content
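The whitespace rule quoted above (characters below 0x20 removed, line breaks turned into spaces, repeated spaces collapsed) is easy to state as code, and it shows why the source's own line breaks cannot be recovered from DATA.FRK: only explicit <br /> elements survive, as 0x0D. A minimal in-place sketch; `imp_normalize_text` is a hypothetical name, and treating only '\n' as the line break (so '\r' and '\t' are simply removed) is an assumption about how the spec is meant to be read.

```c
#include <stddef.h>

/* Apply the quoted rule to a NUL-terminated string in place:
   '\n' (assumed to be "line break") becomes a space, any other byte below
   0x20 is dropped, and runs of spaces collapse to one. Returns the new length. */
size_t imp_normalize_text(char *s)
{
    size_t r = 0, w = 0;                    /* read and write positions, w <= r */
    for (; s[r] != '\0'; r++) {
        unsigned char c = (unsigned char)s[r];
        if (c == '\n')
            c = ' ';                        /* line break -> 0x20 */
        else if (c < 0x20)
            continue;                       /* other control bytes are removed */
        if (c == ' ' && w > 0 && s[w - 1] == ' ')
            continue;                       /* collapse repeated 0x20 */
        s[w++] = (char)c;
    }
    s[w] = '\0';
    return w;
}
```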

By looking at the .IMP specs and exploding the .imp to .res format with unimp.exe, I have noticed that the "text" portion (Data.frk) of the 1150 and 1200 .imp versions are identical (even for very complex files)! The formatting changes are stored in the various files within the .res for each reader and those differ greatly!

Alas, if the "text" is compressed, then it is not visible within the .imp. It must first be decompressed using an LZSS algorithm, so we are not there yet!
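For readers curious what that decompression step involves, here is a minimal sketch of a classic byte-oriented LZSS decoder in the style of Haruhiko Okumura's LZSS.C. The parameters are assumptions, not confirmed .IMP values: a 4096-byte ring buffer pre-filled with spaces, one flag byte per 8 tokens read LSB first (1 = literal byte), and back-references packed into two bytes as a 12-bit ring position plus a 4-bit length with a minimum match of 3. Dipperstein's lzss-0.6, which deimp is based on, may encode tokens differently, so treat this as an illustration of the technique rather than deimp's decoder.

```c
#include <stddef.h>
#include <string.h>

#define LZ_N 4096          /* ring-buffer size (assumed) */
#define LZ_F 18            /* maximum match length (assumed) */
#define LZ_THRESHOLD 2     /* matches of length <= 2 are stored as literals (assumed) */

/* Decode src[0..src_len) into dst (capacity dst_cap). Returns the number of
   bytes written; stops early if dst fills up or the input ends mid-token. */
size_t lzss_decode(const unsigned char *src, size_t src_len,
                   unsigned char *dst, size_t dst_cap)
{
    unsigned char ring[LZ_N];
    size_t in = 0, out = 0;
    unsigned r = LZ_N - LZ_F;               /* first write position in the ring */
    unsigned flags = 0;                     /* bit 8 stays set while flag bits remain */

    memset(ring, ' ', sizeof ring);         /* encoder is assumed to pre-fill with spaces */
    for (;;) {
        flags >>= 1;
        if ((flags & 0x100) == 0) {         /* all 8 flag bits used: fetch a new flag byte */
            if (in >= src_len) break;
            flags = src[in++] | 0xFF00;     /* high byte acts as a counter of 8 shifts */
        }
        if (flags & 1) {                    /* literal byte */
            if (in >= src_len || out >= dst_cap) break;
            unsigned char c = src[in++];
            dst[out++] = c;
            ring[r] = c;
            r = (r + 1) & (LZ_N - 1);
        } else {                            /* (position, length) back-reference */
            if (in + 1 >= src_len) break;
            unsigned pos = src[in] | ((src[in + 1] & 0xF0u) << 4);
            unsigned len = (src[in + 1] & 0x0Fu) + LZ_THRESHOLD + 1;
            in += 2;
            for (unsigned k = 0; k < len; k++) {
                if (out >= dst_cap) return out;
                unsigned char c = ring[(pos + k) & (LZ_N - 1)];
                dst[out++] = c;
                ring[r] = c;                /* copied bytes also enter the ring */
                r = (r + 1) & (LZ_N - 1);
            }
        }
    }
    return out;
}
```

Because the ring starts out full of spaces, a back-reference can legally point at bytes that were never emitted, so the decoder must pre-fill the ring exactly as the encoder did; if the .IMP encoder uses a different fill or start position, the first few matches would decode to the wrong characters.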

Still trying... WE ARE NOW THERE!

p.s. I've added the unimp.exe program as well as a reimp (sbtest.exe) program. Just drag and drop your file onto the .exe name in Windows Explorer (or a shortcut icon you've created on your desktop).
Attached Files
File Type: zip unimp.zip (2.5 KB)
File Type: zip reimp.zip (194.8 KB)
File Type: zip deimp_v0.1.zip (726.1 KB)
File Type: txt The Pilgrims Progress in Words of One Syllable.RES.txt (128.7 KB)
File Type: txt extract.bat.txt (391 Bytes)
File Type: txt deimp-readme.txt (1.7 KB)

Last edited by nrapallo; 01-13-2009 at 11:02 PM. Reason: added sample conversion of .imp to .txt