12-04-2008, 11:42 AM | #1 |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
LIT generation -- binary analysis help with the last 0.1%?
After following a comment thread about e-book formats over at tor.com last week, I spontaneously decided to apply what I'd learned from writing lit2oeb for calibre to writing an "oeb2lit" open-source, free-as-in-freedom LIT-generation tool.
I need to neaten up the code a bit, but I'm basically 99.9% there, creating LIT files with all of LIT's baroque cross-indices in place and which MSReader treats indistinguishably from commercially-generated LIT files. The ITOL/ITLS archive format is fairly well-documented (Microsoft's ITOL/ITLS format plus the ConvertLIT source code), but I had to figure out the exact format of most of the LIT-specific files from scratch.
The 0.1% I haven't figured out yet is a hash function. For each HTML file in the LIT book, the LIT archive contains three entries: a "content" entry containing the actual markup, an "ahc" entry containing the hash-collision table for a hash of all the referenced fragment identifiers in the markup, and an "aht" entry containing a fixed-record-width index into the collision table. I'm able to fake the "ahc" and "aht" entries by just always creating a collision table of length 1, but I'm not sure the WinCE version of MSReader will handle that cleanly, so it'd be nice to create these structures correctly.
I'm terrible at binary analysis, but would any of those here who are so skilled be interested in taking a peek at either MSReader or the MS-provided LIT-generation SDK to figure out the hash function used? Thanks! :-) -Marshall |
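A minimal sketch of why a length-1 collision table can stand in for the real one. It does not reproduce the LIT binary layout: `placeholder_hash`, `build_chains`, and `lookup` are hypothetical names, the hash is an arbitrary stand-in for the unknown MSReader function, and the sketch assumes the reader reduces the hash modulo the table length before scanning a chain.

```python
from typing import Dict, List, Optional, Tuple

Chain = List[Tuple[str, int]]  # (fragment id, offset into the "content" entry)


def placeholder_hash(fragment_id: str) -> int:
    """Stand-in only: NOT the MSReader hash, which is still unknown."""
    return sum(fragment_id.encode("utf-8"))


def build_chains(anchors: Dict[str, int], slots: int = 1) -> List[Chain]:
    """Distribute anchors over `slots` collision chains (assumed layout)."""
    chains: List[Chain] = [[] for _ in range(slots)]
    for name, offset in sorted(anchors.items()):
        chains[placeholder_hash(name) % slots].append((name, offset))
    return chains


def lookup(chains: List[Chain], fragment_id: str) -> Optional[int]:
    """Pick a chain by hash, then scan it linearly for an exact match."""
    chain = chains[placeholder_hash(fragment_id) % len(chains)]
    for name, offset in chain:
        if name == fragment_id:
            return offset
    return None


anchors = {"chapter1": 0, "note-3": 512, "toc": 1024}
single = build_chains(anchors)      # one slot: any hash value maps to chain 0
print(lookup(single, "note-3"))
```

With a single slot, every hash value reduces to index 0, so the lookup degrades to a linear scan and the result no longer depends on which hash function is used. Whether every MSReader build tolerates such a table is exactly the open question above.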
12-04-2008, 11:52 AM | #2 |
Resident Curmudgeon
Posts: 73,966
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
So once this is done, can we have lrf2lit so we can then use lit2oeb to get the LRF into HTML?
|
12-04-2008, 12:40 PM | #3 |
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Cool, I look forward to oeb2lit; it should allow adding LIT as an output format to calibre. Unfortunately, I suck at binary analysis as well.
|
12-04-2008, 12:42 PM | #4 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
Do you have many LRF books you can't get in other formats? |
|
12-06-2008, 02:33 AM | #5 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
I'll have a look at how RMR generates the hash. BTW, you can check how your files are read by MS Reader on Pocket PCs with Device Emulator.
|
12-06-2008, 12:56 PM | #6 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
I'm actually not sure that RMR generates the anchor hashtables, or at least they've been empty in all the RMR-created LIT files I've seen. OTOH, I'm sure it just links to the same SDK DLL all the other Microsoft-derived converters use. And thanks for the pointer to the emulator. I assumed any such beast would require at least some sort of WinCE license, but I guess I made an "as" out of Sue and "m." |
|
12-10-2008, 11:45 AM | #7 |
Resident Curmudgeon
Posts: 73,966
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
12-10-2008, 11:48 AM | #8 | |
Resident Curmudgeon
Posts: 73,966
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
Also if I ever come across LRF that I feel needs tweaking in some way, I can convert, do the fixing up, and convert back. |
|
12-10-2008, 04:50 PM | #9 | |
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
|
Hi Marshall,
A LIT conversion tool is sorely lacking in our community. This will be a great addition. It's ironic that all the converters support going from LIT to X, but nothing exists to create a LIT file. Well, there is WordRMR, but this requires Windows and MS Word. Would it be possible for you to also create an html2lit tool? It shouldn't be much different from oeb2lit. Quote:
=X= |
|
12-10-2008, 05:06 PM | #10 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
Thanks! Igorsk pointed me at that PocketPC emulator, but testing it on a real device too would probably be a good idea -- I'll toss a book at you after I have a chance to make sure everything seems to work in the emulator. |
|
12-12-2008, 04:58 PM | #11 |
reader
Posts: 6,975
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
|
Will this work for ePub? Calibre does not currently have an ePub to OEB capability. In particular, the ePub TOC needs to be converted to an OPF guide. If it had one, then ePub to MOBI would also be simpler. MobiPocket's own MOBI tools don't always work well when importing ePubs, but they work very well on OEBs.
|
12-12-2008, 06:16 PM | #12 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
In any case -- yes, oeb2lit consumes either OPF 1.x or EPUB's OPF 2.0 and emits "compliant-enough" OPF 1.2. (My generator is stricter than MSReader actually requires, but I don't strip attributes that the OPF 1.2 DTD doesn't allow but that exist in namespaces OPF 2.0 doesn't care about.) Converting OPS 2.0 content to the subset of OPS 1.x MSReader understands completely won't be in the first release, alas, but I'm planning to add it soon after. |
|
12-13-2008, 05:23 AM | #13 | |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Quote:
|
|
|