#136
Avid reader
Posts: 262
Karma: 132
Join Date: Mar 2005
Location: The Netherlands
Device: HTC Touch Diamond, iLiad Book Edition
I am curious whether you wrote the decompression / decryption for the records after the EXTH / PDB record 0 yourself, or whether you let Perl do that for you. Or did Mobipocket make something special out of it?
Maybe you'll indulge me and also tell me how you decide the actual number of PDB records that need to be decompressed for the content, since the last three(?) PDB records clearly aren't part of the content itself. They're way too small for that. And why would they split the content into such small blocks of 2000 characters or less? Easier handling for small mobile devices?
Last edited by Jaapjan; 01-10-2008 at 09:44 AM. Reason: More questions!
#137
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Quote:
Quote:
#138
Avid reader
Posts: 262
Karma: 132
Join Date: Mar 2005
Location: The Netherlands
Device: HTC Touch Diamond, iLiad Book Edition
Quote:
Quote:
Perhaps memory-constrained devices read these blocks into memory as a sort of cache and move to the next and/or previous one only when needed.
#139
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
There is only one chunk of compressed text, so you decompress all the records from record 1 to record n_document_records, where n_document_records is the record count stored in the header in PDB record 0. The Perl code doing the decompression is:
Code:
```perl
my $header = $recs->[0];
if( defined _parse_headerrec($header) ) {
    # a proper Doc file should be fine, but if it's not Doc
    # compression like some Mobi docs seem to be we want to
    # bail early. Otherwise we end up with a huge stream of
    # substr() errors and we _still_ don't get any content.
    eval {
        sub min { return ($_[0]<$_[1]) ? $_[0] : $_[1] }
        my $maxi = min($#$recs, $header->{'records'});
        for( my $i = 1; $i <= $maxi; $i ++ ) {
            $body .= _decompress_record( $header->{'version'},
                                         $recs->[$i]->{'data'} );
        }
    };
    return undef if $@;
}

# algorithm taken from makedoc7.cpp with reference to
# http://patb.dyndns.org/Programming/PilotDoc.htm and
# http://www.pyrite.org/doc_format.html
sub _decompress_record($$) {
    my ($version,$in) = @_;
    return $in if $version == DOC_UNCOMPRESSED;

    my $out = '';
    my $lin = length $in;
    my $i = 0;
    while( $i < $lin ) {
        my $ch = substr( $in, $i ++, 1 );
        my $och = ord($ch);

        if( $och >= 1 and $och <= 8 ) {
            # copy this many bytes... basically a way to 'escape' data
            $out .= substr( $in, $i, $och );
            $i += $och;
        }
        elsif( $och < 0x80 ) {
            # pass through 0, 9-0x7f
            $out .= $ch;
        }
        elsif( $och >= 0xc0 ) {
            # 0xc0-0xff are 'space' plus ASCII char
            $out .= ' ';
            $out .= chr($och ^ 0x80);
        }
        else {
            # 0x80-0xbf is sequence from already decompressed buffer
            my $nch = substr( $in, $i ++, 1 );
            $och = ($och << 8) + ord($nch);
            my $m = ($och & 0x3fff) >> 3;
            my $n = ($och & 0x7) + 3;

            # This isn't very perl-like, but a simple
            # substr($out,$lo-$m,$n) doesn't work.
            my $lo = length $out;
            for( my $j = 0; $j < $n; $j ++, $lo ++ ) {
                die "bad Doc compression" unless ($lo-$m) >= 0;
                $out .= substr( $out, $lo-$m, 1 );
            }
        }
    }
    return $out;
}
```
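For anyone following along in another language, the record-selection step can be looked at on its own. This is only a rough Python sketch, not MobiPerl's code. It assumes the usual PalmDoc layout at the start of PDB record 0 (2-byte compression type, 2 unused bytes, 4-byte uncompressed text length, 2-byte text record count, 2-byte maximum record size, all big-endian). The names `PalmDocHeader`, `parse_palmdoc_header` and `text_record_range` are made up for illustration.

```python
import struct
from typing import NamedTuple


class PalmDocHeader(NamedTuple):
    compression: int   # 1 = none, 2 = PalmDoc compression
    text_length: int   # uncompressed length of the whole text
    record_count: int  # number of text records (n_document_records)
    record_size: int   # maximum uncompressed size of one text record


def parse_palmdoc_header(record0: bytes) -> PalmDocHeader:
    """Read the first 12 bytes of PDB record 0 (assumed layout, big-endian)."""
    if len(record0) < 12:
        raise ValueError("record 0 is too short for a PalmDoc header")
    compression, _unused, text_length, record_count, record_size = struct.unpack(
        ">HHIHH", record0[:12]
    )
    return PalmDocHeader(compression, text_length, record_count, record_size)


def text_record_range(header: PalmDocHeader, total_records: int) -> range:
    """Indices of the records holding text: 1..n, clamped to what the file has."""
    last = min(header.record_count, total_records - 1)
    return range(1, last + 1)


if __name__ == "__main__":
    # Synthetic record 0: compressed, 10000 bytes of text in 3 records of 4096.
    record0 = struct.pack(">HHIHH", 2, 0, 10000, 3, 4096)
    header = parse_palmdoc_header(record0)
    print(header)
    # A file with 8 records in total: 0 is the header, 1 to 3 are text.
    print(list(text_record_range(header, 8)))
```

The clamp in `text_record_range` mirrors the `min($#$recs, $header->{'records'})` in the Perl above: whatever comes after the last text record (images and the small trailing records) is simply never passed to the decompressor.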
#140
Avid reader
Posts: 262
Karma: 132
Join Date: Mar 2005
Location: The Netherlands
Device: HTC Touch Diamond, iLiad Book Edition
Thanks to you, as well as a few Palm documentation files from 2001 (and lots of use of a hex editor, UltraEdit), I managed to make some prototype C# code that reads out all the raw HTML content. To share some information with you, if you like, the layout is actually like this:
- Palm header
- Palm record index
- MOBI header (kind of obvious)
- EXTH header (a dictionary-style set of information about the book)
- Content (compressed)
- Images (uncompressed)
- FLIS header (license information)
- FCIS header (images information)

I remain unsure how to determine where the images start and how long they are. Nor do I know what the 2-byte record between content and images is for, nor the 4-byte one at the end. Maybe some sort of checksum.
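On the question of where a record starts and how long it is: in a Palm database that comes from the record index rather than from the record itself. Below is a minimal Python sketch, assuming the standard PDB layout (78-byte header with the record count as a big-endian 16-bit value at offset 76, followed by 8-byte index entries whose first 4 bytes are the absolute file offset of the record). `read_record_spans` and `build_demo_pdb` are hypothetical helper names, and the demo file is synthetic, not a real book.

```python
import struct
from typing import List, Tuple

PDB_HEADER_SIZE = 78      # fixed-size Palm database header
RECORD_ENTRY_SIZE = 8     # 4-byte offset, 1-byte attributes, 3-byte unique id


def read_record_spans(pdb: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets for every record in a PDB image."""
    if len(pdb) < PDB_HEADER_SIZE:
        raise ValueError("too short to be a Palm database")
    (count,) = struct.unpack_from(">H", pdb, 76)
    starts = []
    for i in range(count):
        entry = PDB_HEADER_SIZE + i * RECORD_ENTRY_SIZE
        (offset,) = struct.unpack_from(">I", pdb, entry)
        starts.append(offset)
    # A record ends where the next one starts; the last one ends at EOF.
    ends = starts[1:] + [len(pdb)]
    return list(zip(starts, ends))


def build_demo_pdb(records: List[bytes]) -> bytes:
    """Assemble a tiny synthetic PDB so the reader can be exercised."""
    header = bytearray(PDB_HEADER_SIZE)
    header[0:4] = b"demo"                    # database name (zero padded)
    header[60:68] = b"BOOKMOBI"              # type and creator
    struct.pack_into(">H", header, 76, len(records))
    index = bytearray()
    offset = PDB_HEADER_SIZE + RECORD_ENTRY_SIZE * len(records)
    for i, rec in enumerate(records):
        index += struct.pack(">I", offset) + bytes([0]) + (2 * i).to_bytes(3, "big")
        offset += len(rec)
    return bytes(header) + bytes(index) + b"".join(records)


if __name__ == "__main__":
    pdb = build_demo_pdb([b"header-record", b"text", b"\x00\x00", b"image-bytes"])
    for number, (start, end) in enumerate(read_record_spans(pdb)):
        print(number, end - start, pdb[start:end])
```

With the spans in hand, the length of an image record is just `end - start`, and the tiny 2-byte and 4-byte records show up as ordinary entries in the same list.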
#141
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Are you sure that FLIS and FCIS are license information? I think the record format contains information about how long the record is. I think that the first image record index is in MOBI+0x5B. When I decode a file I just check each record to see if it is an image.
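The "check each record to see if it is an image" approach can be sketched roughly as follows. This is Python for illustration only, not the MobiPerl code: it looks at the leading magic bytes of each record, and the signature table plus the names `sniff_image` and `find_image_records` are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Leading "magic" bytes of common image formats.
IMAGE_SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"BM", "bmp"),  # only two bytes, so false positives are possible
)


def sniff_image(record: bytes) -> Optional[str]:
    """Return the image type suggested by the first bytes, or None."""
    for magic, kind in IMAGE_SIGNATURES:
        if record.startswith(magic):
            return kind
    return None


def find_image_records(records: List[bytes], first: int = 1) -> List[Tuple[int, str]]:
    """List (record number, image type) for records from `first` onwards."""
    found = []
    for number in range(first, len(records)):
        kind = sniff_image(records[number])
        if kind is not None:
            found.append((number, kind))
    return found


if __name__ == "__main__":
    records = [
        b"record 0 header",
        b"compressed text",
        b"\xff\xd8\xff\xe0\x00\x10JFIF",
        b"GIF89a\x01\x00\x01\x00",
        b"FLIS\x00\x00\x00\x08",
    ]
    # Start after the text records so compressed text is not misread as an image.
    print(find_image_records(records, first=2))
```

Starting the scan after the last text record is a deliberate choice here: compressed text can begin with almost any byte pair, and a short signature such as `BM` would otherwise match by accident.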
#142
Avid reader
Posts: 262
Karma: 132
Join Date: Mar 2005
Location: The Netherlands
Device: HTC Touch Diamond, iLiad Book Edition
Quote:
Actually, the length of the content can simply be read from the header, after which the images start. There's a 2-byte header in between. As for the images, do you also assume each image is 2 records long? Or do you get the length from elsewhere? No, FCIS isn't about license information. It contains information about the images and the content size at least.
#143
Guru
Posts: 780
Karma: 1416
Join Date: Jan 2008
Device: Kobo Clara 2E/HD, Kindle PW
Code:
```
lit2mobi lembert_02_-_Stranglers_Moon.lit
Unpack file lembert_02_-_Stranglers_Moon.lit in dir ctmp
+---[ ConvertLIT (Version 1.8) ]---------------[ Copyright (c) 2002,2003 ]---
ConvertLIT comes with ABSOLUTELY NO WARRANTY; for details
see the COPYING file or visit "http://www.gnu.org/license/gpl.html".
This is free software, and you are welcome to redistribute it under
certain conditions. See the GPL license for details.
LIT INFORMATION.........
DRM = 1
Timestamp = 6ac89519
Creator = 00000000
Language = 00000409
Writing out "d'Alembert_2_-_Stranglers_Moon" as "d'Alembert 2 - Stranglers Moon.htm" ...
Successfully written to "ctmp/d'Alembert 2 - Stranglers Moon.htm".
Writing out "RW_~Cover01" as "~Cover01.jpg" ...
Successfully written to "ctmp/~Cover01.jpg".
Writing out "RW_~Cover02" as "~Cover02.jpg" ...
Successfully written to "ctmp/~Cover02.jpg".
Writing out "RW_~Cover03" as "~Cover03.jpg" ...
Successfully written to "ctmp/~Cover03.jpg".
Writing out "RW_~Cover04" as "~Cover04.jpg" ...
Successfully written to "ctmp/~Cover04.jpg".
Writing out "RW_~Cover05" as "~Cover05.jpg" ...
Successfully written to "ctmp/~Cover05.jpg".
Exploded "lembert_02_-_Stranglers_Moon.lit" into "ctmp/".
Read in HTML tree from opf
Opf: Initialize from file: lembert_02_-_Stranglers_Moon.opf
CONTENT:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Package//EN" "http://openebook.org/dtds/oeb-1.0.1/oebpkg101.dtd">
<package unique-identifier="OverDriveGUID">
<metadata>
<dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
<dc:Title>d'Alembert 2 - Stranglers Moon</dc:Title>
<dc:Identifier id="OverDriveGUID" scheme="GUID">{6ECBF068-47B6-49B7-838E-CB056DA516B7}</dc:Identifier>
</dc-metadata>
<x-metadata>
<meta name="rwver-ReaderWorks-SDK-Control" content="2, 0, 2, 0215 (02/15/2002)" />
<meta name="rwver-HTML-Input-Filter" content="2, 0, 2, 0215 (02/15/2002)" />
<meta name="rwver-Image-Input-Filter" content="2, 0, 2, 0215 (02/15/2002)" />
<meta name="rwver-Text-Input-Filter" content="2, 0, 2, 0215 (02/15/2002)" />
<meta name="rwver-Word-Doc-Input-Filter" content="2.0.2.0215 (02/15/2002)" />
<meta name="rwver-LIT-file-generator" content="1.5.1.0280 (10/05/2000)" />
<meta name="rw-License-Key" content="RWPTL" />
</x-metadata>
</metadata>
<manifest>
<item id="d'Alembert_2_-_Stranglers_Moon" href="d'Alembert 2 - Stranglers Moon.htm" media-type="text/html" />
<item id="RW_~Cover01" href="~Cover01.jpg" media-type="image/jpeg" />
<item id="RW_~Cover02" href="~Cover02.jpg" media-type="image/jpeg" />
<item id="RW_~Cover03" href="~Cover03.jpg" media-type="image/jpeg" />
<item id="RW_~Cover04" href="~Cover04.jpg" media-type="image/jpeg" />
<item id="RW_~Cover05" href="~Cover05.jpg" media-type="image/jpeg" />
</manifest>
<spine>
<itemref idref="d'Alembert_2_-_Stranglers_Moon" />
</spine>
<guide>
<reference type="other.ms-thumbimage-standard" href="~Cover01.jpg" />
<reference type="other.ms-coverimage-standard" href="~Cover02.jpg" />
<reference type="other.ms-titleimage-standard" href="~Cover03.jpg" />
<reference type="other.ms-thumbimage" href="~Cover04.jpg" />
<reference type="other.ms-coverimage" href="~Cover05.jpg" />
</guide>
</package>
OPF: TITLE: d'Alembert 2 - Stranglers Moon
OPF: CREATOR:
Init from manifest
d'Alembert_2_-_Stranglers_Moon - d'Alembert 2 - Stranglers Moon.htm - text/html
RW_~Cover01 - ~Cover01.jpg - image/jpeg
Could not read image file: ~Cover01.jpg
RW_~Cover02 - ~Cover02.jpg - image/jpeg
Could not read image file: ~Cover02.jpg
RW_~Cover03 - ~Cover03.jpg - image/jpeg
Could not read image file: ~Cover03.jpg
RW_~Cover04 - ~Cover04.jpg - image/jpeg
Could not read image file: ~Cover04.jpg
RW_~Cover05 - ~Cover05.jpg - image/jpeg
Could not read image file: ~Cover05.jpg
Warning, RW_~Cover01 missing from spine, adding
Warning, RW_~Cover02 missing from spine, adding
Warning, RW_~Cover03 missing from spine, adding
Warning, RW_~Cover04 missing from spine, adding
Warning, RW_~Cover05 missing from spine, adding
Init from guide
OPFTITLE: d'Alembert 2 - Stranglers Moon
OPFAUTHOR:
Coverimage: ~Cover02.jpg
SPINE: adding d'Alembert_2_-_Stranglers_Moon - d'Alembert 2 - Stranglers Moon.htm - text/html
Adding: d'Alembert 2 - Stranglers Moon.htm - d'Alembert_2_-_Stranglers_Moon
+++.+SPINE: adding RW_~Cover01 - ~Cover01.jpg - image/jpeg
SPINE: adding RW_~Cover02 - ~Cover02.jpg - image/jpeg
SPINE: adding RW_~Cover03 - ~Cover03.jpg - image/jpeg
SPINE: adding RW_~Cover04 - ~Cover04.jpg - image/jpeg
SPINE: adding RW_~Cover05 - ~Cover05.jpg - image/jpeg
All spine elements have been added
Have Read in HTML tree from opf
Saving mobi file (version 4): lembert_02_-_Stranglers_Moon.mobi
COVEROFFSET: 0
THUMBOFFSET: 1
EXTH setting data: author - 100 - - 0x
EXTH add: author - 100 -
EXTH setting data: coveroffset - 201 - 0 - 0x30
EXTH add: coveroffset - 201 - 0 - 0x30
EXTH setting data: thumboffset - 202 - 1 - 0x31
EXTH add: thumboffset - 202 - 1 - 0x31
MOBIHDR: imgrecpointer: 112
EXTH setting data: author - 100 - - 0x
EXTH add: author - 100 -
EXTH setting data: coveroffset - 201 - 0 - 0x30
EXTH add: coveroffset - 201 - 0 - 0x30
EXTH setting data: thumboffset - 202 - 1 - 0x31
EXTH add: thumboffset - 202 - 1 - 0x31
New record for image 112: ~Cover02.jpg
Reading data from file: ~Cover02.jpg
[Image::BMP] ERROR: Not a bitmap: [~Cover02.jpg] at /usr/local/bin/MobiPerl/Util.pm line 486
```
Edit: This was with version .25
Last edited by JeffElkins; 01-13-2008 at 02:47 PM. Reason: add version
#144
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Quote:
Quote:
#145
Avid reader
Posts: 262
Karma: 132
Join Date: Mar 2005
Location: The Netherlands
Device: HTC Touch Diamond, iLiad Book Edition
Quote:
I really need more Mobipocket files before I can be sure about FLIS, because I only have a batch of DRM-free files to test with. However, it is a fixed-size record with remarkably many 0x00 bytes and 0xFFFFFFFF values in it, suggesting unused fields in the record. And since all the other relevant information is elsewhere, it seems very probable that it is for the DRM. As mentioned, I need a DRM'ed file to be sure.
Last edited by Jaapjan; 01-13-2008 at 01:40 PM.
#146
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Which record is the cover image and which is the thumbnail is given by data in EXTH.
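To make the EXTH lookup concrete, here is a hedged Python sketch. It assumes the commonly described EXTH layout (the ASCII tag `EXTH`, a 4-byte header length, a 4-byte record count, then records made of a 4-byte type, a 4-byte length that includes those 8 bytes, and the payload). The type numbers 201 (coveroffset) and 202 (thumboffset) and the image record pointer 112 are taken from the log in post #143. How the payload is encoded is an assumption: that log shows single ASCII bytes (0x30, 0x31), while other files may use 4-byte big-endian integers, so `payload_to_int` accepts both. All function names are hypothetical.

```python
import struct
from typing import Dict, List, Sequence, Tuple

EXTH_COVER_OFFSET = 201   # "coveroffset" in the MobiPerl log
EXTH_THUMB_OFFSET = 202   # "thumboffset" in the MobiPerl log


def parse_exth(block: bytes) -> Dict[int, List[bytes]]:
    """Map EXTH record type -> list of raw payloads (assumed layout)."""
    if block[:4] != b"EXTH":
        raise ValueError("not an EXTH block")
    _header_len, count = struct.unpack_from(">II", block, 4)
    records: Dict[int, List[bytes]] = {}
    pos = 12
    for _ in range(count):
        rec_type, rec_len = struct.unpack_from(">II", block, pos)
        if rec_len < 8:
            raise ValueError("corrupt EXTH record length")
        records.setdefault(rec_type, []).append(block[pos + 8 : pos + rec_len])
        pos += rec_len
    return records


def payload_to_int(payload: bytes) -> int:
    """Assumption: 4-byte payloads are big-endian ints, others ASCII digits."""
    if len(payload) == 4:
        return int.from_bytes(payload, "big")
    return int(payload.decode("ascii"))


def build_exth(items: Sequence[Tuple[int, bytes]]) -> bytes:
    """Build a synthetic EXTH block for the demo below."""
    body = b"".join(struct.pack(">II", t, 8 + len(d)) + d for t, d in items)
    return b"EXTH" + struct.pack(">II", 12 + len(body), len(items)) + body


if __name__ == "__main__":
    # Same values as the log: empty author, coveroffset "0", thumboffset "1".
    exth = build_exth([(100, b""), (201, b"0"), (202, b"1")])
    fields = parse_exth(exth)
    first_image_record = 112  # "MOBIHDR: imgrecpointer: 112" in the log
    cover = first_image_record + payload_to_int(fields[EXTH_COVER_OFFSET][0])
    thumb = first_image_record + payload_to_int(fields[EXTH_THUMB_OFFSET][0])
    print("cover record:", cover, "thumbnail record:", thumb)
```

Treating the two values as offsets from the first image record matches the log, where coveroffset 0 lines up with "New record for image 112", but that reading is inferred from one run and not from a specification.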
#147
Avid reader
Posts: 262
Karma: 132
Join Date: Mar 2005
Location: The Netherlands
Device: HTC Touch Diamond, iLiad Book Edition
Quote:
But it is interesting that you say they're optional. That means there must be an indication somewhere of whether they're included or not.
#148
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
I would be more than happy to generate these records if I manage to find out their format and what they are for.
#149
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
Have you ever thought about implementing a conversion to the Sony LRF format in your suite? I'd really like to write it myself, but that format is not well documented, and/or relies on some Windows DLL...
Alessandro
#150
Avid reader
Posts: 262
Karma: 132
Join Date: Mar 2005
Location: The Netherlands
Device: HTC Touch Diamond, iLiad Book Edition
Quote:
Currently he is more often right than I am. But he might want to add some Sony code to his program. Though aren't there any LRF projects that convert to HTML? From HTML it is easy to get to Mobipocket. Speaking of which, what kind of test batch of files do you use, Tompe? Can you provide me with URLs to them? I am afraid you were right about the images too. Maybe, anyway. I discovered that there are many more images in the file than I thought (also JPGs, not only GIFs). Back to the Mansion!