Converting .IMP to anything? WE ARE NOW THERE!


nrapallo
04-16-2008, 11:37 AM
EDIT: 16 May 2008

Welcome to deimp.exe, the text decompressor (extractor) for .imp files created by Nick Rapallo (me).
See the thread Reverse-engineering the .IMP format (http://www.mobileread.com/forums/showthread.php?t=34212) for deimp's C source code as deimp_v0.1_source.zip (http://www.mobileread.com/forums/showthread.php?p=309579#post309579).

Version 0.1 is very basic, but works well given these caveats:

Works only on unencrypted .imp files, i.e. non-DRM'ed files.
Assumes that the .imp file was compressed when created. For now, if it was not, you will get gibberish. To handle uncompressed files, you must uncomment a line in the included .bat file.
Windows-only executable; assumes Intel LE (little-endian) byte ordering (LSB to MSB).
Extracts only (basic) text, not the underlying html. As such, no tables, images or formatting are retained. DOS line endings and tabs are inserted.
In the resulting '.txt', table starts are indicated by '|', table cells are indicated by '_', image locations are indicated by '|^|' and <hr> is shown as '____________'.


Usage:

1. Unzip deimp_v0.1.zip into the directory where your .imp files are stored. All sub-directories below are processed recursively, leaving only the '.txt' extracted.

2. Double-click the windows batch file 'extract text from imp files.bat' and wait.

3. That's it. Just edit the resulting '.txt' files. Please note that you will have to replace non-common characters like certain quotes, mdashes, etc...

The extracted text (with no images/hyperlinks) can then be easily converted by BookDesigner or equivalent.

Thanks (and a bit of karma) go to delphidb96 for the link to the LZSS source code (it is the basis of deimp)! And thanks go to Michael Dipperstein (mdipper@alumni.engr.ucsb.edu) for his LZSS source code (lzss-0.6.zip). More information on LZSS encoding may be found at: http://michael.dipperstein.com/lzss

Enjoy!
-Nick

p.s. you do not need the 'unimp.zip' and 'reimp.zip' files; just get the 'deimp_v0.1.zip'!

p.p.s. added 'The Pilgrims Progress in Words of One Syllable.RES.txt', sample conversion of .imp here (http://www.mobileread.com/forums/showthread.php?t=23645)

p.p.p.s. should you need to extract only one .imp file, you can use the 'extract.bat.txt' attachment (just save as 'extract.bat') and in the MS-DOS command prompt window, type:
1. unimp "Impfilename.imp" and
2. extract "Resdirname" (note no '.RES' and 'Resdirname' may differ from 'Impfilename').

Previously....This was a response posted in another forum about converting .imp to .prc (or anything useful!)
No, Mobi2IMP doesn't convert from imp to mobi.

Unfortunately, the .IMP format is an 'end' one; meaning there is no known way to extract the original source (.html and images).

However, there are several ways that the ebook 'text' can be extracted to varying degrees of success, as follows:

1. Using the 'ebook viewer.exe' installed on the PC (after installing the free eBook Publisher software here (http://www.ebooktechnologies.com/support_publisher_download.htm)), you would 'Print' using a printer driver that saves to .pdf format, then OCR the resulting .pdf.

2. With some .IMP files, it is possible to open them with a text editor (like Wordpad) and 'go to the middle' of the document. There you may find the text used in the .IMP ebook. This depends on which software and compression setting created that .IMP (it does seem to work with recent eBook Publisher creations that are not internally compressed with LZSS).

Either way, you would lose all formatting, links and HTML codes in the process. But you would have the text to 'begin' the conversion process.
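
The 'go to the middle' search in option 2 can also be done programmatically. Here is a minimal Python sketch, assuming the text is stored uncompressed as plain single-byte characters. The function name `find_text_runs` and the 40-byte minimum are my own choices for illustration and are not part of any .imp tool:

```python
import re


def find_text_runs(data: bytes, min_len: int = 40) -> list:
    """Return every run of printable ASCII that is at least min_len bytes long.

    Assumption: the ebook text is uncompressed and stored as single-byte
    characters, so long printable runs are likely to be book text.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return pattern.findall(data)


# Hypothetical usage (the file name is a placeholder):
#   with open("book.imp", "rb") as fh:
#       for run in find_text_runs(fh.read()):
#           print(run.decode("ascii"))
```

Since control characters sit between the elements, each run will typically be one paragraph or element, not the whole book.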

Most users creating .IMP's ALSO retain the original source for this very reason.

Hope this helps.

Edit: this has all been discussed before here (http://www.mobileread.com/forums/showthread.php?p=65231#post65231)

From the IMP Technical Specs:
DATA.FRK File
Element text is extracted and placed in this file. Element tags are replaced with control characters. This file can be compressed and encrypted with compression occurring before encryption. This file is compressed when the element <meta name="x-SBP-compress" content="on"/> is included in the <x-metadata> element of the package file. The compression algorithm used is LZSS. This file is encrypted when the element <meta name="x-SBP-encrypt" content="on"/> is included in the <x-metadata> element of the package file. The encryption algorithm used is DES. The 8 byte encryption key is in the SoftBook Edition Encryption Key File (.key) at offset 0x0C.

Characters less than 0x20 are removed except for line break which is replaced with 0x20. Multiple 0x20 characters are replaced with a single 0x20.
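
To make the whitespace rule concrete, here is a small Python sketch of it. It assumes that "line break" means the line feed character (0x0A) and that a carriage return is dropped like any other control character; the spec text does not say which, and the function name is only illustrative:

```python
import re


def normalize_element_text(text: bytes) -> bytes:
    """Apply the whitespace rule quoted from the spec to raw element text."""
    # Assumption: "line break" is LF (0x0A); it becomes a single space.
    text = text.replace(b"\n", b" ")
    # Every other character below 0x20 is removed.
    text = re.sub(rb"[\x00-\x1f]", b"", text)
    # Runs of spaces collapse to one space.
    return re.sub(rb" {2,}", b" ", text)
```

For example, `normalize_element_text(b"Hello,\r\n   world\t!")` gives `b"Hello, world!"`.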

Control characters
0x0A end of document, forced page break
0x0B start of element except <span>
0x0D line break element <br />
0x0E start of table element <table>
0x0F image element <img />
0x13 end of table cell </td> tag
0x14 horizontal rule element <hr />
0x15 before and after page header content
0x16 before and after page footer content
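
Putting the control characters above together with the markers that deimp writes to the '.txt' gives roughly the following. This is only a Python sketch and not deimp's actual C code: the four visible markers ('|', '_', '|^|' and the underscore rule) come from the notes at the top of this post, while the handling of 0x0A, 0x0B, 0x0D, 0x15 and 0x16 and the cp1252 encoding are my assumptions for illustration.

```python
MARKERS = {
    0x0E: "|",             # start of <table>
    0x13: "_",             # end of table cell </td>
    0x0F: "|^|",           # image location
    0x14: "____________",  # <hr />
    0x0D: "\r\n",          # <br /> as a DOS line ending (assumed)
    0x0B: "\r\n",          # start of element (assumed: new line)
    0x0A: "\r\n\r\n",      # page break / end of document (assumed)
    0x15: "",              # page header delimiter (assumed: dropped)
    0x16: "",              # page footer delimiter (assumed: dropped)
}


def control_codes_to_text(data: bytes, encoding: str = "cp1252") -> str:
    """Turn decompressed DATA.FRK bytes into plain text with visible markers."""
    parts = []
    for byte in data:
        if byte in MARKERS:
            parts.append(MARKERS[byte])
        elif byte >= 0x20:
            # Assumption: single-byte Windows text; unknown bytes become U+FFFD.
            parts.append(bytes([byte]).decode(encoding, errors="replace"))
        # Any other control byte is silently dropped.
    return "".join(parts)
```

A table-driven mapping like this keeps it easy to change a marker later without touching the loop.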

By looking at the .IMP specs and exploding the .imp to .res format with unimp.exe, I have noticed that the "text" portion (Data.frk) of the 1150 and 1200 .imp versions is identical (even for very complex files)! The formatting changes are stored in the various files within the .res for each reader and those differ greatly!

Alas, if the "text" is compressed, then it is not visible within the .imp. It must first be 'uncompressed' using an LZSS algorithm so we are not there yet!
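
For anyone wondering what 'uncompressing' with LZSS involves, here is a generic bit-oriented LZSS decoder sketched in Python. It is not deimp's code. The parameter values (12-bit offset, 4-bit length, matches of 3 or more bytes, window pre-filled with spaces, absolute window offsets) are placeholders; the values that .imp files actually use are not stated in this thread and would have to be substituted.

```python
class BitReader:
    """Read big-endian (MSB-first) bit fields from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.bitpos = 0

    def read(self, count: int):
        """Return the next `count` bits as an int, or None if too few remain."""
        if self.bitpos + count > len(self.data) * 8:
            return None
        value = 0
        for _ in range(count):
            byte = self.data[self.bitpos >> 3]
            bit = (byte >> (7 - (self.bitpos & 7))) & 1
            value = (value << 1) | bit
            self.bitpos += 1
        return value


def lzss_decode(data: bytes, offset_bits: int = 12, length_bits: int = 4,
                max_uncoded: int = 2, fill: int = 0x20) -> bytes:
    """Decode an LZSS stream: flag bit 1 = literal byte, 0 = (offset, length)."""
    window_size = 1 << offset_bits
    window = bytearray([fill]) * window_size  # assumed initial window content
    pos = 0                                   # next write position in window
    out = bytearray()
    reader = BitReader(data)
    while True:
        flag = reader.read(1)
        if flag is None:
            break
        if flag == 1:
            byte = reader.read(8)
            if byte is None:
                break
            chunk = bytes([byte])
        else:
            offset = reader.read(offset_bits)
            length = reader.read(length_bits)
            if offset is None or length is None:
                break
            length += max_uncoded + 1         # stored length is biased
            # Read the whole match first, then write it back to the window.
            chunk = bytes(window[(offset + i) % window_size]
                          for i in range(length))
        for byte in chunk:
            out.append(byte)
            window[pos] = byte
            pos = (pos + 1) % window_size
    return bytes(out)
```

If the offset or length widths are wrong for a given file, the decoder will still run but the output will be gibberish, which is a quick way to tell that the parameters need adjusting.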

Still trying... WE ARE NOW THERE!

p.s. I've added the unimp.exe program as well as a reimp (sbtest.exe) program. Just drag and drop your file onto the .exe name in Windows Explorer (or a shortcut icon you've created on your desktop).

LeserattePD
04-17-2008, 07:37 AM
Thanks a lot for taking this problem on!

It is much appreciated, as I am planning to move from my ebookwise to an eink device sometime this year and would hate to go through downloading all my ebooks again. Thankfully my books mostly are from BAEN so I can download them in another format, but I also have a few secure format books from ebookwise and would hate to lose those (especially as they are so expensive to begin with!).

nrapallo
04-17-2008, 08:43 AM
Sorry, we are not there yet, not even close!

And I am not looking to 'crack' secured .imp ebooks, only non-DRMed .imp would be supported, if I can find a LZSS uncompressor routine.

delphidb96
04-17-2008, 11:15 AM
Well, you have no need to worry about 'cracking' any BAEN books because they've always been DRM-free. :) (That's what I love about BAEN books.) As for an LZSS decompressor routine... I've attached just one of the many source files that I googled to this post. I found it here:

http://michael.dipperstein.com/lzss/#download

Enjoy! :D

Derek

nrapallo
04-17-2008, 11:22 AM
Nice find! All I kept on getting with google was the simple v1.0 lzss.zip (though I didn't try it yet)!

I will tinker with this to see if I can get this algorithm to work.

delphidb96
04-17-2008, 11:51 AM
Please do as I've got a ton of .imps I want to convert for my Cybook! :D

Derek

Roberts324
04-18-2008, 08:08 AM
That would be nice, indeed!

And a new dent to your knife's handle...:thanks:

nrapallo
05-16-2008, 09:13 AM
:party4: More to follow very soon... :yahoo:

See post #1 (http://www.mobileread.com/forums/showthread.php?p=171069#post171069) above for ALL the juicy details!

GeoffC
05-16-2008, 10:13 AM
Forgive the intrusion, but what's an .imp file?

HarryT
05-16-2008, 10:19 AM
The book format used by the EB1150 bookreader (the subject of this forum section).

zelda_pinwheel
05-16-2008, 10:20 AM
nick, you never cease to amaze me. i don't know what we would do without you. karma for you !

GeoffC
05-16-2008, 10:22 AM
The book format used by the EB1150 bookreader (the subject of this forum section).

Thanks ..

delphidb96
05-16-2008, 10:57 AM
This was a response posted in another forum about converting .imp to .prc (or anything useful!)

snipped text removed

Still trying...

p.s. I've added the unimp.exe program as well as a reimp (sbtest.exe) program. Just drag and drop your file onto the .exe name in Windows Explorer (or a shortcut icon you've created on your desktop).

Nick,

How's the program coming? I'd love to know if you've gotten further along. Could you post the updated versions?

Derek

nrapallo
05-16-2008, 11:17 AM
nick, you never cease to amaze me. i don't know what we would do without you. karma for you !

:o Thanks! :2thumbsup

Nick,

How's the program coming? I'd love to know if you've gotten further along. Could you post the updated versions?

Derek

It's all here now! See post #1 (http://www.mobileread.com/forums/showthread.php?p=171069#post171069) above. Thanks for waiting... ;)

Another fine crafted software... :rolleyes: What do you think?

DixieGal
05-16-2008, 11:23 AM
YAY! I'm bookmarking this thread and printing a hard copy, so as to never lose this info! THANK YOU!
:thanks::thanks::thanks::thanks::thanks:

nrapallo
05-16-2008, 12:20 PM
And hopefully with ETI's new entry into the e-ink reader world, you may never have to use this program to migrate away... :2thumbsup

I'm here (meaning .imp world) to stay!

nrapallo
05-16-2008, 02:20 PM
Added ability to extract only one .imp file.

See post #1 above and use the 'extract.bat.txt' attachment (just save as 'extract.bat') and in the MS-DOS command prompt window, type:

1. unimp "Impfilename.imp" and

2. extract "Resdirname" (note no .RES and Resdirname may differ from Impfilename)

delphidb96
05-16-2008, 03:19 PM
:o Thanks! :2thumbsup



It's all here now! See post #1 (http://www.mobileread.com/forums/showthread.php?p=171069#post171069) above. Thanks for waiting... ;)

Another fine crafted software... :rolleyes: What do you think?

Coolness, like SO TOTALLY *IS*!!! I'm sending you Karma!

Derek

JSWolf
05-16-2008, 03:21 PM
Is the goal to be able to extract the contents of the IMP file with the full formatting and graphics for easy conversion to another format of our choice?

nrapallo
05-16-2008, 04:44 PM
For now, the goal was to be able to at least salvage the text portion of a .imp if the user "leaves" the .imp platform behind.

I doubt that I will be able to get the original html "look" since it is not stored in the .imp file. Only the basic components are: one record tells you where all the font/style changes are located in the file, another indicates where to end each line so that it doesn't spill over the screen size of that .imp, and other records store the images, hyperlinks used, etc.

Basically all the building blocks are there (scattered) and it would not be trivial to reassemble the .html! Possible, but would be like reverse-engineering everything.

This is the next best thing for now.

Later, I wouldn't mind doing something along the lines of an Impperl suite of programs à la Mobiperl. Until now no programs have been able to extract any information from .imp files, so this is the beginning of the journey....

LeserattePD
05-17-2008, 04:36 AM
It's really wonderful that you've managed that! Now I might actually get a new e-ink reader. I love my ebookwise but I cracked the screen and the battery is giving up after 3 years of intense use. Living in the UK, I'd have to get a new one, so I'm thinking of switching to e-ink.

Any idea of a timeline for the new ebooktechnologies reader? At the moment I'm actually thinking of getting the Iliad (for the pdf support as I'm an academic) and because it seems closest to the ebookwise in other functions.

:2thumbsup

mscott161
12-17-2008, 11:40 AM
Nick,

I know you are working on the conversion of imp to other formats. I have been working on the same goal. I have viewed the source code for the LZSS compression but have not been successful in tweaking it to decompress the text from the imp. Can you give me any insight? I would be willing to share source code in the endeavor. My code is currently in C# and working toward an imp to lrf converter.

Michael

nrapallo
12-17-2008, 01:40 PM
Nice to hear of your .imp interest, as I'm a big supporter of this format! I would welcome another programmer on the .imp band-wagon.

The LZSS decompressor routine has already been re-written (and working) in Perl. It's been incorporated into EBook-Tools by mobileread.com member AZed by way of a LZSS.pm. It currently is only available in trunk, but should be released soon in version 4.0.

If you would like to PM me with your email address, I can help you with the C source code of same, but since it's a hack, I'm reluctant (embarrassed) to post it here.

We (AZed and myself) have a lot more "goodies" in the EBook-Tools bag, like metadata editing and .html and images extraction. So stay-tuned!

I had previously reverse-engineered the .imp format (as did ashkulz), but had no way of implementing what I had learnt via code. Now that I'm involved with EBook-Tools I foresee a lot of re-writing of existing programs to do things the EBook-Tools way. Perhaps you can use this for your conversions as well! :thumbsup:

mscott161
12-17-2008, 02:49 PM
Nick,

My email is michaels@ebizsoft.com. The problem I have is with the encoding bit. I read an offset and then the length, but the numbers point to bytes that have nothing in them. I am using a byte array to hold the compressed data as the incoming file, and the output goes to a string that I write to a file at the end to check. I have parsed the PNG files and !!sw, but am confused about the values for !!cm. I am assuming the default 64 resource. I have been working on parsing the style and extended style data. Everything is in objects so that I can hopefully put together other formats at the end.

Michael

nrapallo
12-17-2008, 03:20 PM
Michael:

I just sent you my source for deimp.exe and imp_dump.pl which parses a .imp into its component filetype records and extracts the images and text.

I have not yet parsed !!cm (didn't need to just yet as deimp is used), but have parsed !!sw (which I think is useless/nonfunctional?). I have a lot of "notes" on this, but have not yet coded (in Perl) all of the .RES filetypes.

If you are interested in further reverse-engineering of the .imp format, then I think a new thread is in order. I know quite a few will join in! The main information on .imp reverse-engineering comes from Jeffrey Kraus-yao's excellent website http://krausyaoj.tripod.com/reb1200.htm (or on the REB1200 for Dummies website http://www.chromakinetics.com/REB1200/imp_format.htm).

RikaStrom
04-23-2010, 08:31 PM
And hopefully with ETI's new entry into the e-ink reader world, you may never have to use this program to migrate away... :2thumbsup

I'm here (meaning .imp world) to stay!

Nick,

Another question please. What is an alternative to the ebookwise ETI device? Are they going to come out with an e-ink device that will read my .imps and other formats?

I know it's not in the works that the powers that be would let us have one or multiple devices that would read all the formats. (sigh)

Thank you

nrapallo
04-24-2010, 12:07 AM
Nick,

Another question please. What is an alternative to the ebookwise ETI device? Are they going to come out with an e-ink device that will read my .imps and other formats?

ETI did have an idea, long-long-ago (about 3-4 years ago), to produce an e-ink device (see their e-ink prototype (http://www.ebooktechnologies.com/toureinkproto.htm) showing what their ebook operating system would look like thereon). This "idea" has failed to become a reality and as such I do not foresee any other future device being able to read the .imp format.

I know it's not in the works that the powers that be would let us have one or multiple devices that would read all the formats. (sigh)

Thank you

Nice idea, akin to "world peace and harmony"... :rolleyes: