Old 11-26-2007, 05:49 AM   #16
jbenny
Addict
Posts: 323
Karma: 358
Join Date: May 2007
Device: Tablet PC and Nokia N800
Something that may help you is the source of a program called "pdbshred". It can extract the HTML and image files from Mobipocket and Peanut ebooks. You can find the program and its source (in C) by Googling for "pdbshred source". I would post a direct link, but because of some additional functionality in the program, some here may not like a direct link.

A similar program is called "makedoc", but it doesn't extract the images.
Old 11-26-2007, 07:45 AM   #17
cstross
Cynic
Posts: 83
Karma: 514
Join Date: Jul 2007
Location: Edinburgh, Scotland
Device: iPhone 4, Kindle 3
Looks extremely cute ...

Are you planning on packaging this and sticking it on CPAN when it's stable?
Old 11-26-2007, 08:48 AM   #18
schmidt349
Member
Posts: 20
Karma: 65
Join Date: Nov 2007
Device: Amazon Kindle
Well done Tompe! Looks like you beat me to the punch.

Would you mind if I bolted an XSL backend onto your code, effectively making it "xml2html2mobi"? It's a mouthful, but would be quite useful.

God I love Perl.
Old 11-26-2007, 09:25 AM   #19
tompe
Grand Sorcerer
Posts: 7,029
Karma: 3973186
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Nexus 7, Nexus 5, iPad 2, Kindle PW
Quote:
Originally Posted by cstross View Post
Looks extremely cute ...

Are you planning on packaging this and sticking it on CPAN when it's stable?
Might be a good idea. I have never done it before, but why not, if I can find out how to put a script on CPAN.

I will release the html2mobi script here in a day or two so I can get some feedback. I have to fix one serious bug and write some documentation.
Old 11-26-2007, 09:32 AM   #20
tompe
Grand Sorcerer
Quote:
Originally Posted by schmidt349 View Post
Would you mind if I bolted an XSL backend onto your code, effectively making it "xml2html2mobi?" It's a mouthful, but would be quite useful.

God I love Perl.
Yes, it is a pleasure to program Perl :-)

I should probably write some packages to make it easier to do an xml2html2mobi. I wanted to have just one file to make it easier to use, but maybe I should just split it up and submit it to CPAN. That can be the next step after it works and is tested more.

I used XML::Parser::Lite::Tree to parse the opf file but I am not sure this was a good idea. Do you know of any better library for opf files or for XML? I really liked HTML::Element and HTML::TreeBuilder so something similar for XML would be nice. Or a specific opf file library.
Old 11-26-2007, 01:42 PM   #21
tompe
Grand Sorcerer
I have a problem. My converter generates a mobi file that is not entirely correct. It works perfectly in FBReader. On my Gen3 it works, but the number of pages is 650 when it should be around 25. There are a lot of empty pages at the end. My Palm T5 refuses to load the file and says corrupt database 0x0209 (2).

Is this a problem with the Palmdoc things, or is it a problem with the HTML that I packed in the Palmdoc format?

I may have forgotten to set some parameter in the Palm::PDB package, but I tested loading a working mobi file and then replacing the text, and it did not work.

Ideas?
Old 11-26-2007, 04:01 PM   #22
tompe
Grand Sorcerer
I realised that I had not written any Mobipocket header in record 0 at all, and I was fooled by it working so well with FBReader. Is there a specification anywhere of the data that should be in record 0? I have googled for it but cannot find it.
Old 11-26-2007, 05:07 PM   #23
igorsk
Wizard
Posts: 3,443
Karma: 52235
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
No spec. A few fields are documented in pdbshred but they're probably not what you need. I'm working on a more or less complete doc but here's what you should be able to get away with:

Quote:
0 DWord dwSignature //'MOBI'
4 DWord dwSize //including first two fields (put 0x18 here)
8 DWord dwType //pub type: 2=book,3=palmdoc,4=audio,news=257,feed=258,magazine=259 etc
C DWord dwCodepage //1252=western, 65001 = UTF8. Better not use anything else
10 DWord dwUniqueId //? filled from rand() calls
14 DWord dwFileFormatVer //seems to correspond to Mobipocket reader ver. put 3 here
This is in addition to the palmdoc header, naturally.
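
For readers following along, the field list above can be packed in a few lines. This is only a sketch in Python rather than the Perl used in the thread, and it assumes big-endian integers, as in the rest of a Palm database file. The function name `build_mobi_header` and its parameters are made up for illustration. The `header_size` parameter defaults to the 0x18 given above; a later post in this thread revises that value to 0x74, so any extra bytes are simply zero-filled here.

```python
import random
import struct


def build_mobi_header(header_size=0x18, pub_type=2, codepage=1252,
                      format_version=3, unique_id=None):
    """Pack the six listed fields, zero-padded to header_size bytes."""
    if unique_id is None:
        # The post says this field is filled from rand() calls.
        unique_id = random.getrandbits(32)
    fixed = struct.pack(
        ">4sIIIII",      # assumption: big-endian, like other Palm database fields
        b"MOBI",         # 0x00 dwSignature
        header_size,     # 0x04 dwSize, counted from the signature
        pub_type,        # 0x08 dwType (2 = book)
        codepage,        # 0x0C dwCodepage (1252 or 65001)
        unique_id,       # 0x10 dwUniqueId
        format_version,  # 0x14 dwFileFormatVer
    )
    if header_size < len(fixed):
        raise ValueError("header_size is smaller than the six fixed fields")
    # Everything past the six known fields is left as zero bytes.
    return fixed + b"\x00" * (header_size - len(fixed))
```

The result would be appended to the palmdoc header at the start of record 0.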
Old 11-26-2007, 07:09 PM   #24
tompe
Grand Sorcerer
Quote:
Originally Posted by igorsk View Post
No spec. A few fields are documented in pdbshred but they're probably not what you need. I'm working on a more or less complete doc but here's what you should be able to get away with:



This is in addition to the palmdoc header, naturally.
Thanks. I have now managed to write record 0 so now I can add the MOBI header also.

When I unpacked a mobi file I saw three records after the last image, with sizes 36, 52 and 4. What are these? One contained the string FLIS and one the string FCIS. Maybe the end of the document is not detected because I have not written these records.
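
One way to inspect such trailing records is to list every record's size and first four bytes straight from the file. The sketch below is in Python and assumes the usual Palm database layout: a 78-byte header with a big-endian record count at offset 76, followed by 8-byte record entries that each start with a 4-byte file offset. `list_pdb_records` is a hypothetical helper name, not something from Palm::PDB or pdbshred.

```python
import struct


def list_pdb_records(data):
    """Return (index, size, first four bytes) for each record in a PDB image."""
    # Assumption: the record count is the last field of the 78-byte header.
    (count,) = struct.unpack_from(">H", data, 76)
    offsets = []
    for i in range(count):
        # Each entry is 8 bytes: offset (4), attributes (1), unique id (3).
        (offset,) = struct.unpack_from(">I", data, 78 + 8 * i)
        offsets.append(offset)
    # The last record runs to the end of the file.
    offsets.append(len(data))
    return [
        (i, offsets[i + 1] - offsets[i], data[offsets[i]:offsets[i] + 4])
        for i in range(count)
    ]
```

Reading a .mobi file in binary mode and passing its bytes to this function should show the FLIS and FCIS records by their leading four bytes, along with the short final record.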
Old 11-26-2007, 07:14 PM   #25
tompe
Grand Sorcerer
How long must the MOBI header be?

At position 0xF4 I see the string EXTH, and after that follow some strings that indicate that the author and title are stored there. Does this belong to the header?
Old 11-26-2007, 08:24 PM   #26
igorsk
Wizard
I believe FCIS and FLIS have something to do with dictionary indices. Do you set the unpacked size and number of records in Palmdoc header correctly?
Old 11-26-2007, 08:36 PM   #27
tompe
Grand Sorcerer
Quote:
Originally Posted by igorsk View Post
I believe FCIS and FLIS have something to do with dictionary indices. Do you set the unpacked size and number of records in Palmdoc header correctly?
The last record, which was 4 bytes, contains E9 8E 0D 0A. I wonder if this is important...

The number of records is correct, because I tried to include the image records in that number but then FBReader started to display garbage after the end of the text. I will double check the unpacked size. I have not set the pointer to the first image either.

Now I have got the strange phenomenon that the images in FBReader are correct but on my Gen3 they seem to be shifted. The "library" image seems to work. I just put it in the last record and it was displayed correctly on the Gen3. The change I made was that I set the record "id" to an increasing number for the text content instead of using 0.

Well, it moves forward. Hopefully I will fix the problem with the size and the image order soon so I have a first alpha version of the scripts.
Old 11-26-2007, 08:53 PM   #28
igorsk
Wizard
The "number of records" in the palmdoc header (Word at 0x8) needs to be set to the number of records containing only text (no pictures). E.g. if you have compressed text in records 1, 2 and 3, then set it to 3. The uncompressed size (dword at 4) has to be the full uncompressed size of all text.
By the way, I was wrong. Mobi format 3 needs the MOBI header to be 0x74 bytes long, not 0x18. The fields are mostly irrelevant except for the number of the first record with images I mentioned above (at 0x5C).
There are also DATP records that contain a mapping from uncompressed offsets to record numbers, but I have not figured out their format yet and am not sure if they're mandatory...
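
Put together, the two counts and the image pointer described above might be written like this. It is a rough Python sketch, not the thread's Perl code, and it rests on several assumptions: the commonly described 16-byte PalmDoc header layout, text split into 4096-byte records, big-endian fields, and 0x5C read as an offset from the start of the MOBI header like the offsets in the earlier field list. All function names are hypothetical.

```python
import struct

RECORD_SIZE = 4096  # assumption: maximum uncompressed text bytes per record


def count_text_records(text):
    """Number of records needed for the uncompressed text, images excluded."""
    return (len(text) + RECORD_SIZE - 1) // RECORD_SIZE


def build_palmdoc_header(text, compression=1):
    """Build a 16-byte PalmDoc header for the uncompressed text (bytes)."""
    return struct.pack(
        ">HHIHHI",
        compression,               # 0x0: 1 = none, 2 = PalmDoc compression
        0,                         # 0x2: unused
        len(text),                 # 0x4: full uncompressed size of all text
        count_text_records(text),  # 0x8: text records only, no pictures
        RECORD_SIZE,               # 0xA: record size
        0,                         # 0xC: left as zero in this sketch
    )


def build_mobi_header_v3(first_image_record, unique_id=0):
    """Build a zero-filled 0x74-byte MOBI header with the image pointer set."""
    header = bytearray(0x74)
    struct.pack_into(">4sIIIII", header, 0,
                     b"MOBI", 0x74, 2, 1252, unique_id, 3)
    # Number of the first record that holds an image.
    struct.pack_into(">I", header, 0x5C, first_image_record)
    return bytes(header)
```

With text in records 1 to 3 and images starting right after, record 0 would be `build_palmdoc_header(text) + build_mobi_header_v3(4)`; that images directly follow the text records is itself an assumption.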
Old 11-26-2007, 09:23 PM   #29
tompe
Grand Sorcerer
Got it nearly working on my Gen3 when I extended the MOBI header. The only problem now is that the title says "libc-2.3.6" and the header information is wrong...

Strangely enough the library image works without me including it. Maybe it takes the first record with an image and uses this.

# 4 DWord dwSize //including first two fields (put 0x18 here)

If I put 0x18 here it does not work. If I put 0xE4 here as in my example document then it works but the title did not work. So what does this number mean?

Last edited by tompe; 11-26-2007 at 09:25 PM.
Old 11-26-2007, 09:57 PM   #30
tompe
Grand Sorcerer
Quote:
Originally Posted by tompe View Post
# 4 DWord dwSize //including first two fields (put 0x18 here)

If I put 0x18 here it does not work. If I put 0xE4 here as in my example document then it works but the title did not work. So what does this number mean?
I just realized what this field is. It is a pointer to the block that starts with EXTH. The first number after that seems to be the size of this block. But I have not managed to see how it is coded.

Maybe I should try to set this pointer to 0 and see if that means that this block does not exist.
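
That reading can be checked mechanically. The Python sketch below assumes the palmdoc header occupies the first 0x10 bytes of record 0, that dwSize counts from the MOBI signature, and that the number after EXTH is a big-endian dword. Under those assumptions 0x10 + 0xE4 = 0xF4, which matches where the EXTH string was seen earlier in the thread. `find_exth` is a made-up name for illustration.

```python
import struct

PALMDOC_HEADER_SIZE = 0x10  # assumption: MOBI signature starts right after it


def find_exth(record0):
    """Return (offset, size) of the EXTH block in record 0, or None if absent."""
    start = PALMDOC_HEADER_SIZE
    if record0[start:start + 4] != b"MOBI":
        raise ValueError("no MOBI signature after the palmdoc header")
    # dwSize is treated as the distance from the signature to the EXTH block.
    (mobi_size,) = struct.unpack_from(">I", record0, start + 4)
    exth_offset = start + mobi_size
    if record0[exth_offset:exth_offset + 4] != b"EXTH":
        return None
    # The first number after the EXTH tag, taken here as the block size.
    (exth_size,) = struct.unpack_from(">I", record0, exth_offset + 4)
    return exth_offset, exth_size
```

If the function returns None for a file whose dwSize points elsewhere, that would suggest the reader finds no EXTH block there, which is one way to test the idea of leaving the block out.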