Old 12-04-2008, 11:42 AM   #1
llasram
Reticulator of Tharn
Posts: 622
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
LIT generation -- binary analysis help with the last 0.1%?

After following a comment thread about e-book formats over at tor.com last week, I spontaneously decided to apply what I'd learned from writing lit2oeb for calibre to writing an "oeb2lit" open-source, free-as-in-freedom LIT-generation tool.

I need to neaten up the code a bit, but I'm basically 99.9% there, creating LIT files with all of LIT's baroque cross-indices in place, which MSReader treats indistinguishably from commercially generated LIT files. The ITOL/ITLS archive format is fairly well-documented (Microsoft's ITOL/ITLS format plus the ConvertLIT source code), but I had to figure out the exact format of most of the LIT-specific files from scratch.

The 0.1% I haven't figured out yet is a hash function. For each HTML file in the LIT book, the LIT archive contains three entries: a "content" entry containing the actual markup, an "ahc" entry containing the hash-collision table for a hash of all the referenced fragment identifiers in the markup, and an "aht" entry containing a fixed-record-width index into the collision-table. I'm able to fake the "ahc" and "aht" entries by just always creating a collision table of length 1, but I'm not sure the WinCE version of MSReader will handle that cleanly, so it'd be nice to create these structures correctly.
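For readers unfamiliar with this kind of structure, here is a rough Python sketch of a fixed-record-width index pointing into a collision table. Everything in it is an assumption made for illustration: the real hash function is exactly the unknown discussed here, so CRC-32 stands in for it, and the record layout (offset plus count, length-prefixed names) is hypothetical and is not the actual "aht"/"ahc" byte format.

```python
import struct
import zlib


def placeholder_hash(fragment: str) -> int:
    # Stand-in only: the hash MSReader really uses is unknown in this thread.
    return zlib.crc32(fragment.encode("utf-8")) & 0xFFFFFFFF


def build_tables(fragments, nbuckets):
    """Build a toy (index, collision_table) pair of byte strings.

    index: one fixed-width record per bucket, (offset, count) as two
           little-endian uint32 values (hypothetical layout).
    collision_table: per bucket, a run of length-prefixed UTF-8 names.
    """
    buckets = [[] for _ in range(nbuckets)]
    for frag in fragments:
        buckets[placeholder_hash(frag) % nbuckets].append(frag)

    index = bytearray()
    collision = bytearray()
    for bucket in buckets:
        index += struct.pack("<II", len(collision), len(bucket))
        for frag in bucket:
            data = frag.encode("utf-8")
            collision += struct.pack("<H", len(data)) + data
    return bytes(index), bytes(collision)


def lookup(index, collision, fragment, nbuckets):
    """Return True if fragment is present, scanning only its own bucket."""
    bucket = placeholder_hash(fragment) % nbuckets
    offset, count = struct.unpack_from("<II", index, 8 * bucket)
    wanted = fragment.encode("utf-8")
    for _ in range(count):
        (size,) = struct.unpack_from("<H", collision, offset)
        offset += 2
        if collision[offset:offset + size] == wanted:
            return True
        offset += size
    return False
```

Calling `build_tables(fragments, 1)` puts every identifier into a single bucket, which is roughly analogous to the workaround described above: lookups still succeed without knowing the real hash, but every one of them scans the whole table.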

I'm terrible at binary analysis, but would any of those here who are so skilled be interested in taking a peek at either MSReader or the MS-provided LIT-generation SDK to figure out the hash function used?

Thanks! :-)

-Marshall
Old 12-04-2008, 11:52 AM   #2
JSWolf
Resident Curmudgeon
Posts: 37,087
Karma: 18147936
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
So once this is done, can we have lrf2lit so we can then use lit2oeb to get the LRF into HTML?
Old 12-04-2008, 12:40 PM   #3
kovidgoyal
creator of calibre
Posts: 25,948
Karma: 5036099
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Cool, I look forward to lit2oeb, it should allow adding LIT as an output format to calibre. Unfortunately, I suck at binary analysis as well.
Old 12-04-2008, 12:42 PM   #4
llasram
Reticulator of Tharn
Posts: 622
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
Originally Posted by JSWolf View Post
So once this is done, can we have lrf2lit so we can then use lit2oeb to get the LRF into HTML?
Well, the reason LIT files extract to OEB/HTML books so cleanly is that they are HTML in the first place, which means that to generate a LIT file you need HTML to start with. If you want LRF->HTML conversion you pretty much just need to go LRF->HTML.

Do you have many LRF books you can't get in other formats?
Old 12-06-2008, 02:33 AM   #5
igorsk
Wizard
Posts: 3,443
Karma: 52235
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
I'll have a look at how RMR generates the hash. BTW, you can check how your files are read by MS Reader on Pocket PCs with Device Emulator.
Old 12-06-2008, 12:56 PM   #6
llasram
Reticulator of Tharn
Posts: 622
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
Originally Posted by igorsk View Post
I'll have a look at how RMR generates the hash. BTW, you can check how your files are read by MS Reader on Pocket PCs with Device Emulator.
Awesome! Thank you, igorsk. I was hoping you'd jump in :-).

I'm actually not sure that RMR generates the anchor hashtables, or at least they've been empty in all the RMR-created LIT files I've seen. OTOH, I'm sure it just links to the same SDK DLL all the other Microsoft-derived converters use.

And thanks for the pointer to the emulator. I assumed any such beast would require at least some sort of WinCE license, but I guess I made an "as" out of Sue and "m."
Old 12-10-2008, 11:45 AM   #7
JSWolf
Resident Curmudgeon
Posts: 37,087
Karma: 18147936
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
Quote:
Originally Posted by kovidgoyal View Post
Cool, I look forward to lit2oeb, it should allow adding LIT as an output format to calibre. Unfortunately, I suck at binary analysis as well.
lit2oeb is already part of Calibre. I think you mean oeb2lit.
Old 12-10-2008, 11:48 AM   #8
JSWolf
Resident Curmudgeon
Posts: 37,087
Karma: 18147936
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
Quote:
Originally Posted by llasram View Post
Well, the reason LIT files extract to OEB/HTML books so cleanly is that they are HTML in the first place, which means that to generate a LIT file you need HTML to start with. If you want LRF->HTML conversion you pretty much just need to go LRF->HTML.

Do you have many LRF books you can't get in other formats?
Well, there are some eBooks posted on MR in LRF only, and I think it would be nice if others could take those LRF files and convert them to other formats pretty easily.

Also, if I ever come across an LRF that I feel needs tweaking in some way, I can convert it, do the fixing up, and convert it back.
Old 12-10-2008, 04:50 PM   #9
=X=
Wizard
Posts: 3,672
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
Hi Marshall,
A LIT conversion tool is sorely lacking in our community. This will be a great addition. It's ironic that every converter supports going from LIT to X, but nothing exists to create a LIT file. Well, there is WordRMR, but it requires Windows and MS Word.

Would it be possible for you to also create an html2lit tool? It shouldn't be much different from oeb2lit?

Quote:
Originally Posted by llasram View Post
I'm able to fake the "ahc" and "aht" entries by just always creating a collision table of length 1, but I'm not sure the WinCE version of MSReader will handle that cleanly, so it'd be nice to create these structures correctly.
If you need any help testing WinCE LIT files, let me know; I have a WinCE phone with MSReader installed.

=X=
Old 12-10-2008, 05:06 PM   #10
llasram
Reticulator of Tharn
Posts: 622
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
Originally Posted by =X= View Post
Would it be possible for you to also create an html2lit tool? It shouldn't be much different from oeb2lit?
Actually, there will be an `any2lit` which will work like `any2epub` -- it automatically converts anything calibre can convert to OEB into OEB, then converts that to LIT.

Quote:
Originally Posted by =X= View Post
If you need any help testing WinCE LIT files, let me know; I have a WinCE phone with MSReader installed.
Thanks! Igorsk pointed me at that PocketPC emulator, but testing it on a real device too would probably be a good idea -- I'll toss a book at you after I have a chance to make sure everything seems to work in the emulator.
Old 12-12-2008, 04:58 PM   #11
wallcraft
reader
Posts: 6,979
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3 and Fire
Quote:
Originally Posted by llasram View Post
Actually, there will be an `any2lit` which will work like `any2epub` -- it automatically converts anything calibre can convert to OEB into OEB, then converts that to LIT.
Will this work for ePub? Calibre does not currently have an ePub to OEB capability. In particular, the ePub TOC needs to be converted to an OPF guide. If it had one, then ePub to MOBI would also be simpler. MobiPocket's own MOBI tools don't always work well when importing ePubs, but they work very well on OEBs.
Old 12-12-2008, 06:16 PM   #12
llasram
Reticulator of Tharn
Posts: 622
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
Originally Posted by wallcraft View Post
Will this work for ePub? Calibre does not currently have an ePub to OEB capability. In particular, the ePub TOC needs to be converted to an OPF guide. If it had one, then ePub to MOBI would also be simpler. MobiPocket's own MOBI tools don't always work well when importing ePubs, but they work very well on OEBs.
Well, OEB is a little vague. I usually use it to mean either OEBPS 1.x or OPF 2.0 + OPS 2.0, which seems consistent with the specs if not explicitly defined. Given that definition, your basic "EPUB to OEB" command is "unzip" :-).
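To make the "unzip" point concrete, here is a minimal Python sketch, not taken from oeb2lit or calibre. It relies on the standard EPUB container layout, where `META-INF/container.xml` names the OPF package file; the function names `find_opf_path` and `epub_to_oeb` are made up for this example.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(epub: zipfile.ZipFile) -> str:
    """Return the archive path of the OPF file named in container.xml."""
    root = ET.fromstring(epub.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml lists no rootfile")
    return rootfile.attrib["full-path"]


def epub_to_oeb(epub_file, dest_dir: str) -> str:
    """Unpack an EPUB into dest_dir and return the relative path of its OPF.

    epub_file may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as epub:
        opf_path = find_opf_path(epub)
        epub.extractall(dest_dir)
    return opf_path
```

The returned path (for instance `OEBPS/content.opf`, depending on the book) is what a converter expecting an unpacked OPF 2.0 package would then be pointed at.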

In any case -- yes, oeb2lit consumes either OPF 1.x or EPUB's OPF 2.0 and emits "compliant-enough" OPF 1.2. (My generator is stricter than MSReader actually requires, but I don't strip attributes the OPF 1.2 DTD doesn't allow but which exist in namespaces OPF 2.0 doesn't care about.) Full conversion of OPS 2.0 content to the subset of OPS 1.x that MSReader understands won't be in the first release, alas, but I'm planning to add it soon after.
Old 12-13-2008, 05:23 AM   #13
HarryT
eBook Enthusiast
Posts: 63,488
Karma: 41548799
Join Date: Nov 2006
Location: UK
Device: PW2, iPad Retina Mini, iPhone 4, MS Surface Pro, Onyx T68, N7,
Quote:
Originally Posted by =X= View Post
Hi Marshall,
A LIT conversion tool is sorely lacking in our community. This will be a great addition. It's ironic that every converter supports going from LIT to X, but nothing exists to create a LIT file. Well, there is WordRMR, but it requires Windows and MS Word.
The main reason for that, I think, is that relatively few of us are reading on devices which support LIT. I don't find Microsoft Reader to be an especially good reading application; when I was reading on platforms which supported it (e.g. Pocket PC), there were always alternative reading applications which I preferred to use.
All times are GMT -4.