Extracting html/images from within .imp files!
This is a follow-up to the thread Converting .IMP to anything? WE ARE NOW THERE!
I'm currently reverse engineering the .imp format in perl and will soon release some perl code to dump the entire contents of a .imp file. This will explode the .imp into its component parts (the html styles/links and the images used therein).
Since I've already managed to decompress the 'raw text' stored in the .imp (using 'deimp.exe'), the next goal would be for imp_dump.pl to merge this (decompressed) text with the html styles/links and images.
And further to a recent inquiry by 'vinicius0881', I am also going to explore exactly how the markups on the ebookwise could be transferred from the Smartmedia card to the PC. I haven't done any real testing of this yet, so I don't even know whether it is possible. EDIT: This is quite possible and easy to do; see the thread Extracting markups (annotations and highlites) from your ebook!
The perl code is still evolving and does not yet decode much of the detailed information, but for now it at least extracts the header and index info of each of the component .RES files as well as the .imp header info. The REB 1200 and EBW 1150 .imp formats differ slightly, so two different extractions are required depending on the imp type. This is slowing things down for me!
I tried it out on the REBTestDocument.imp 1150 attachment and got the following output:
Code:
imp_dump.pl (version 0.1) Copyright (C) 2008 Nick Rapallo (nrapallo)
=======================
Imp Filename:REBTestDocument.imp
=======================
Version:2, "BOOKDOUG", CountRESFiles:32, LengthRESdirname:14, CountRemain:118
Compression:1, Encryption:0, ImpType:2, ZoomState:0
ID:REBTestDocument-10dec02-0945
Category:Content Creation, SubCat(not used):outPages=71&inPages=83
Title:REB Test Document
LName:, MName:, AuthorFNname:Ludo
RESdirname:REBtestdoc.RES
Book_Prop_length:87
Offset to .RES Table of Contents:149
Filename:BEDO, Filesize: 116352, Filetype:JPEG
Filename:BYVI, Filesize: 656, Filetype:ImRn
File:DATA.FRK, Filesize: 13694, Filetype:
Filename:DCXK, Filesize: 494, Filetype:HfPZ
Filename:DELY, Filesize: 550, Filetype:Tabl
Filename:DKRU, Filesize: 828, Filetype:Pcz0
Filename:DMNC, Filesize: 828, Filetype:PcZ0
Filename:FEXY, Filesize: 54, Filetype:BGcl
Filename:FONU, Filesize: 132, Filetype:ESts
Filename:FQTG, Filesize: 3822, Filetype:PcZ1
Filename:FUXW, Filesize: 25918, Filetype:Styl
Filename:HMNO, Filesize: 80, Filetype:pInf
Filename:HYLQ, Filesize: 494, Filetype:HfPz
Filename:JCZC, Filesize: 32, Filetype:Pc31
Filename:JIFU, Filesize: 90, Filetype:!!cm
Filename:MNYJ, Filesize: 1474, Filetype:!!sw
Filename:NCZG, Filesize: 3954, Filetype:Pcz1
Filename:PQJA, Filesize: 14420, Filetype:BPgZ
Filename:RENM, Filesize: 8246, Filetype:StRn
Filename:RGLS, Filesize: 64, Filetype:Mrgn
Filename:RYJS, Filesize: 3977, Filetype:PNG
Filename:RYXI, Filesize: 612, Filetype:AncT
Filename:TENW, Filesize: 32, Filetype:StR2
Filename:TEPA, Filesize: 1619, Filetype:TRow
Filename:TEPM, Filesize: 12857, Filetype:BPgz
Filename:TGBQ, Filesize: 2154, Filetype:GIF
Filename:VMBY, Filesize: 9093, Filetype:TCel
Filename:VQNM, Filesize: 156, Filetype:Hyp2
Filename:XITE, Filesize: 1270, Filetype:Lnks
Filename:XUFW, Filesize: 106, Filetype:HRle
Filename:ZQVA, Filesize: 96, Filetype:PPic
Filename:ZUZS, Filesize: 47, Filetype:Devm
======== JPEG ========
Filename:BEDO, Filesize: 116352, Filetype:JPEG
******** A new filetype encountered!!!
Header:TOCconst:01, TOCfname:JPEG, TOCoffset:116268
======== ImRn ========
Filename:BYVI, Filesize: 656, Filetype:ImRn
Header:TOCconst:01, TOCfname:ImRn, TOCoffset:642
Number of images indexed = 17
width:153, height: 61, constB:FFFB, offset: 1527, imgtype: FIG, imgID:0080
width:153, height: 61, constB:FFFB, offset: 14851, imgtype: FIG, imgID:0080
width:153, height: 61, constB:FFFB, offset: 14888, imgtype: FIG, imgID:0080
width:153, height: 61, constB:FFFB, offset: 14924, imgtype: FIG, imgID:0080
width:153, height: 61, constB:FFFC, offset: 15012, imgtype: GNP, imgID:0080
width:472, height:595, constB:FFFA, offset: 17845, imgtype:GEPJ, imgID:0080
width:472, height:595, constB:FFFB, offset: 17847, imgtype:GEPJ, imgID:8D4D
width:153, height: 61, constB:FFFB, offset: 17898, imgtype: GNP, imgID:0080
width:153, height: 61, constB:FFFF, offset: 18201, imgtype: GNP, imgID:0080
width:153, height: 61, constB:FFFE, offset: 18512, imgtype: GNP, imgID:0080
width:153, height: 61, constB:FFFC, offset: 18823, imgtype: GNP, imgID:0080
width:153, height: 61, constB:FFFE, offset: 19156, imgtype: GNP, imgID:0080
width:153, height: 61, constB:FFFE, offset: 19485, imgtype: GNP, imgID:0080
width:176, height:207, constB:FFFB, offset: 19949, imgtype:GEPJ, imgID:4D80
width:174, height:207, constB:FFFB, offset: 19951, imgtype:GEPJ, imgID:B7B8
width:176, height:212, constB:FFFB, offset: 19953, imgtype:GEPJ, imgID:8A1A
width:174, height:212, constB:FFFC, offset: 19955, imgtype:GEPJ, imgID:0F4E
Index1:Index1_const1:00, Index1_len:610, Index1_offset:32, Index1_const0:00
======== ========
File:DATA.FRK, Filesize: 13694, Filetype:
Extracting compressed DATA.FRK to "REBTestDocument.imp.txt" (13694 chars)
======== HfPZ ========
Filename:DCXK, Filesize: 494, Filetype:HfPZ
Header:TOCconst:01, TOCfname:HfPZ, TOCoffset:480
======== Tabl ========
Filename:DELY, Filesize: 550, Filetype:Tabl
Header:TOCconst:01, TOCfname:Tabl, TOCoffset:536
======== Pcz0 ========
Filename:DKRU, Filesize: 828, Filetype:Pcz0
Header:TOCconst:01, TOCfname:Pcz0, TOCoffset:814
======== PcZ0 ========
Filename:DMNC, Filesize: 828, Filetype:PcZ0
Header:TOCconst:01, TOCfname:PcZ0, TOCoffset:814
======== BGcl ========
Filename:FEXY, Filesize: 54, Filetype:BGcl
Header:TOCconst:01, TOCfname:BGcl, TOCoffset:40
BGcl_const1:FFFF, Red:FF (FF), Green:FF (FF), Blue:FF (FF)
Index1:Index1_const1:80, Index1_len:8, Index1_offset:32, Index1_const0:0000
======== ESts ========
Filename:FONU, Filesize: 132, Filetype:ESts
Header:TOCconst:01, TOCfname:ESts, TOCoffset:76
======== PcZ1 ========
Filename:FQTG, Filesize: 3822, Filetype:PcZ1
Header:TOCconst:02, TOCfname:PcZ1, TOCoffset:3714
======== Styl ========
Filename:FUXW, Filesize: 25918, Filetype:Styl
Header:TOCconst:01, TOCfname:Styl, TOCoffset:25904
======== pInf ========
Filename:HMNO, Filesize: 80, Filetype:pInf
Header:TOCconst:01, TOCfname:pInf, TOCoffset:52
======== HfPz ========
Filename:HYLQ, Filesize: 494, Filetype:HfPz
Header:TOCconst:01, TOCfname:HfPz, TOCoffset:480
======== Pc31 ========
Filename:JCZC, Filesize: 32, Filetype:Pc31
Header:TOCconst:02, TOCfname:Pc31, TOCoffset:32
======== !!cm ========
Filename:JIFU, Filesize: 90, Filetype:!!cm
Header:TOCconst:01, TOCfname:!!cm, TOCoffset:62
======== !!sw ========
Filename:MNYJ, Filesize: 1474, Filetype:!!sw
Header:TOCconst:01, TOCfname:!!sw, TOCoffset:736
sw_length:704
sw_record:
0001 0002 0012 0001 0002 0012 0001 0003
0003 0003 0003 0001 000C 0002 000B 0001
0002 0002 0002 0002 0002 0002 0002 0002
0001 0003 0003 0003 0003 0001 0002 0002
0002 0003 000C 0002 0002 0002 0002 000B
0001 0003 000D 0003 0013 0005 000B 0001
0003 000D 0003 000B 0001 000C 0003 000B
0001 000C 0003 000B 0001 0003 0003 0003
0003 0003 0003 0003 0003 0003 0003 0003
0003 0013 0005 0013 0005 0013 0005 0013
0005 0001 0003 0003 0003 0013 0001 0002
0002 0002 0002 0001 000C 0003 000B 0001
0001 0001 0001 0001 000D 0002 0002 0002
0002 000B 000D 0003 000B 000D 0003 0003
000B 0012 0001 0001 0001 0001 0001 000D
0002 0002 0002 0002 000B 000D 0003 000B
000D 0003 0003 000B 0012 0001 0001 0001
0001 0001 000D 0002 0002 0002 0002 000B
000D 0003 000B 000D 0003 0003 000B 0012
0001 0001 0001 0001 0001 000D 0002 0002
0002 0002 000B 000D 0003 000B 000D 0003
0003 000B 0012 0001 0003 0001 000C 0003
000B 0001 0003 000C 0002 000B 0001 0002
0001 000C 0003 000B 0001 000D 0003 0003
000B 0001 0002 0002 0003 0002 0001 0002
0003 0003 0003 0003 0003 0003 0012 0001
000C 0003 000B 0001 000E 0008 000B 0001
000C 0003 000B 0001 0002 0002 0002 0002
0002 0002 0002 0001 0001 0001 0001 0001
0001 0002 0002 0002 0002 0002 0002 0003
0008 0001 000D 0008 0002 0007 0004 000B
0001 000C 000A 0002 0002 0002 0002 0002
0003 0003 0003 0002 0010 000B 0001 000C
0003 0003 0003 0003 0003 0003 0003 0002
0003 000B 0001 000D 0003 000B 0001 000C
0002 0002 0002 0002 0002 0003 0003 0002
0003 000B 0001 000C 0002 0002 0003 0003
0003 000B 0001 000C 0001 0001 0002 0002
0002 0002 0002 0002 0003 0003 0003 000B
0001 000C 0002 0002 0002 0001 0004 0003
000B 0001 000C 0002 0002 0002 0002 0002
0002 0002 0001 0001 0001 0001 0001 0001
0002 0002 0002 0002 0002 0002 0002 0003
0003 0003 000B 0001 000C 0003 0003 000B
Index01:seqnum:140, len: 6, offset: 32, const4:04, filetype:AtTp
Index02:seqnum:139, len: 6, offset: 38, const4:04, filetype:SKtb
Index03:seqnum:168, len: 10, offset: 44, const4:04, filetype:stbd
Index04:seqnum:167, len: 8, offset: 54, const4:04, filetype:fnts
Index05:seqnum:166, len: 18, offset: 62, const4:04, filetype:bInf
Index06:seqnum:165, len: 10, offset: 80, const4:04, filetype:batr
Index07:seqnum:164, len: 22, offset: 90, const4:04, filetype:SMnu
Index08:seqnum:163, len: 14, offset: 112, const4:04, filetype:FRgs
Index09:seqnum:162, len: 10, offset: 126, const4:04, filetype:FRDt
Index10:seqnum:161, len: 8, offset: 136, const4:04, filetype:Form
Index11:seqnum:160, len: 8, offset: 144, const4:04, filetype:FItm
Index12:seqnum:159, len: 42, offset: 152, const4:04, filetype:FIDt
Index13:seqnum:158, len: 10, offset: 194, const4:04, filetype:FrDt
Index14:seqnum:157, len: 10, offset: 204, const4:04, filetype:BGcl
Index15:seqnum:156, len: 8, offset: 214, const4:04, filetype:Hyp2
Index16:seqnum:155, len: 38, offset: 222, const4:04, filetype:HfPZ
Index17:seqnum:154, len: 38, offset: 260, const4:04, filetype:HfPz
Index18:seqnum:153, len: 38, offset: 298, const4:04, filetype:BPgZ
Index19:seqnum:152, len: 38, offset: 336, const4:04, filetype:BPgz
Index20:seqnum:151, len: 4, offset: 374, const4:04, filetype:MRPs
Index21:seqnum:150, len: 8, offset: 378, const4:04, filetype:Dire
Index22:seqnum:149, len: 10, offset: 386, const4:04, filetype:MASK
Index23:seqnum:148, len: 4, offset: 396, const4:04, filetype:Dict
Index24:seqnum:147, len: 8, offset: 400, const4:04, filetype:Hyph
Index25:seqnum:146, len: 10, offset: 408, const4:04, filetype:AncT
Index26:seqnum:145, len: 10, offset: 418, const4:04, filetype:BPos
Index27:seqnum:144, len: 18, offset: 428, const4:04, filetype:PICT
Index28:seqnum:143, len: 8, offset: 446, const4:04, filetype:StR2
Index29:seqnum:142, len: 8, offset: 454, const4:04, filetype:STR#
Index30:seqnum:141, len: 8, offset: 462, const4:04, filetype:Clos
Index31:seqnum:138, len: 44, offset: 470, const4:04, filetype:TagS
Index32:seqnum:137, len: 14, offset: 514, const4:04, filetype:Glos
Index33:seqnum:136, len: 28, offset: 528, const4:04, filetype:ImRn
Index34:seqnum:135, len: 24, offset: 556, const4:04, filetype:Lnks
Index35:seqnum:134, len: 8, offset: 580, const4:04, filetype:Offs
Index36:seqnum:133, len: 24, offset: 588, const4:04, filetype:Tabl
Index37:seqnum:132, len: 16, offset: 612, const4:04, filetype:TRow
Index38:seqnum:131, len: 28, offset: 628, const4:04, filetype:TCel
Index39:seqnum:130, len: 18, offset: 656, const4:04, filetype:HRle
Index40:seqnum:129, len: 52, offset: 674, const4:04, filetype:Styl
Index41:seqnum:128, len: 10, offset: 726, const4:04, filetype:StRn
======== Pcz1 ========
Filename:NCZG, Filesize: 3954, Filetype:Pcz1
Header:TOCconst:02, TOCfname:Pcz1, TOCoffset:3846
======== BPgZ ========
Filename:PQJA, Filesize: 14420, Filetype:BPgZ
Header:TOCconst:02, TOCfname:BPgZ, TOCoffset:14384
======== StRn ========
Filename:RENM, Filesize: 8246, Filetype:StRn
Header:TOCconst:01, TOCfname:StRn, TOCoffset:8232
======== Mrgn ========
Filename:RGLS, Filesize: 64, Filetype:Mrgn
Header:TOCconst:01, TOCfname:Mrgn, TOCoffset:36
Mrgn1:FFFF, Mrgn2:FFFF
Index1:Index1_const1:81, Index1_len:2, Index1_offset:32, Index1_const0:0000
Index2:Index2_const1:80, Index2_len:2, Index2_offset:34, Index2_const0:0000
======== PNG ========
Filename:RYJS, Filesize: 3977, Filetype:PNG
Header:TOCconst:01, TOCfname:PNG , TOCoffset:3963
Index1:Index1_const1:00, Index1_len:3931, Index1_offset:32, Index1_const0:00
======== AncT ========
Filename:RYXI, Filesize: 612, Filetype:AncT
Header:TOCconst:01, TOCfname:AncT, TOCoffset:584
======== StR2 ========
Filename:TENW, Filesize: 32, Filetype:StR2
Header:TOCconst:02, TOCfname:StR2, TOCoffset:32
======== TRow ========
Filename:TEPA, Filesize: 1619, Filetype:TRow
Header:TOCconst:02, TOCfname:TRow, TOCoffset:1295
======== BPgz ========
Filename:TEPM, Filesize: 12857, Filetype:BPgz
Header:TOCconst:02, TOCfname:BPgz, TOCoffset:12821
======== GIF ========
Filename:TGBQ, Filesize: 2154, Filetype:GIF
Header:TOCconst:01, TOCfname:GIF , TOCoffset:2140
Index1:Index1_const1:00, Index1_len:2108, Index1_offset:32, Index1_const0:00
======== TCel ========
Filename:VMBY, Filesize: 9093, Filetype:TCel
Header:TOCconst:02, TOCfname:TCel, TOCoffset:6645
======== Hyp2 ========
Filename:VQNM, Filesize: 156, Filetype:Hyp2
Header:TOCconst:01, TOCfname:Hyp2, TOCoffset:128
======== Lnks ========
Filename:XITE, Filesize: 1270, Filetype:Lnks
Header:TOCconst:01, TOCfname:Lnks, TOCoffset:1256
======== HRle ========
Filename:XUFW, Filesize: 106, Filetype:HRle
Header:TOCconst:01, TOCfname:HRle, TOCoffset:92
======== PPic ========
Filename:ZQVA, Filesize: 96, Filetype:PPic
Header:TOCconst:01, TOCfname:PPic, TOCoffset:68
======== Devm ========
Filename:ZUZS, Filesize: 47, Filetype:Devm
Header:TOCconst:01, TOCfname:Devm, TOCoffset:33
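One thing the dump above suggests is that the IndexNN records listed under !!sw are laid out back to back: each offset equals the previous offset plus the previous len. The following is only a minimal sketch in Python (not part of imp_dump.pl) for checking that against the printed text. The function names `parse_index_lines` and `find_gaps` are made up for illustration, and the regular expression assumes the line layout stays exactly as printed above.

```python
import re

# Matches the "IndexNN:" lines printed for the !!sw resource in the dump above.
# Assumption: the field order and labels stay exactly as shown in that output.
INDEX_LINE = re.compile(
    r"Index(\d+):seqnum:\s*(\d+), len:\s*(\d+), offset:\s*(\d+), "
    r"const4:(\w+), filetype:(.{4})"
)

def parse_index_lines(dump_text):
    """Return a list of (index_no, length, offset, filetype) tuples."""
    records = []
    for m in INDEX_LINE.finditer(dump_text):
        records.append(
            (int(m.group(1)), int(m.group(3)), int(m.group(4)), m.group(6))
        )
    return records

def find_gaps(records):
    """Return the index numbers whose offset is not previous offset + previous len."""
    gaps = []
    for prev, cur in zip(records, records[1:]):
        if prev[2] + prev[1] != cur[2]:
            gaps.append(cur[0])
    return gaps

# First four index lines copied from the dump above.
sample = """\
Index01:seqnum:140, len: 6, offset: 32, const4:04, filetype:AtTp
Index02:seqnum:139, len: 6, offset: 38, const4:04, filetype:SKtb
Index03:seqnum:168, len: 10, offset: 44, const4:04, filetype:stbd
Index04:seqnum:167, len: 8, offset: 54, const4:04, filetype:fnts
"""

records = parse_index_lines(sample)
print(records)
print("gaps:", find_gaps(records))
```

Run over all 41 index lines, the last record (offset 726, len 10) ends at 736, which matches the TOCoffset:736 reported for !!sw, and 736 minus the starting offset of 32 gives the sw_length of 704. Whether that holds for other .imp files is untested here.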
See the attachments below for the output text for both the EBW 1150 and the REB 1200 .imp files.
More to follow... Isn't hacking/exploring fun? Seek and you shall explore...
Last edited by nrapallo; 06-15-2008 at 08:05 AM.
Reason: added link to successful markup extraction procedure thread