MobileRead Forums > E-Book Formats > Kindle Formats

11-25-2007, 06:08 PM   #1
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
html2mobi (a mobigen replacement written in Perl)

When I realized that Perl has support for reading and writing mobi files, I was inspired to start writing a mobigen replacement today, since Perl is my favourite language.

Now, if the script is given a set of HTML files, a table of contents is generated automatically. The script also accepts an OPF file as input, and it now generates a working mobi file for the Alice in Wonderland test example, with working images (at least in FBReader). The table of contents does not work properly yet, but I will look into that. Does anybody know the data structure for it? I can always just add it at the beginning, but if it is possible to do it correctly, I will.
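A minimal sketch of one way to merge several HTML files into a single document with a generated table of contents. Assumptions: the input is a list of `(filename, html)` pairs in reading order, each entry's title comes from its `<title>` or first `<h1>`, and `merge_with_toc` and the `partN` anchor names are hypothetical. It uses MobiPocket's `<mbp:pagebreak/>` tag between parts and puts the TOC at the end.

```python
import re

TITLE_RE = re.compile(r"<(title|h1)\b[^>]*>(.*?)</\1\s*>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")

def merge_with_toc(parts):
    """Merge (filename, html) pairs, in reading order, into one HTML document with a TOC at the end."""
    toc_items, bodies = [], []
    for n, (filename, html) in enumerate(parts, start=1):
        anchor = f"part{n}"  # hypothetical anchor naming scheme
        t = TITLE_RE.search(html)
        # Strip inner tags from the title; fall back to the file name if there is none.
        title = TAG_RE.sub("", t.group(2)).strip() if t else filename
        b = BODY_RE.search(html)
        body = b.group(1) if b else html  # fragments without <body> are used as-is
        toc_items.append(f'<li><a href="#{anchor}">{title}</a></li>')
        bodies.append(f'<a name="{anchor}"></a>\n{body}\n<mbp:pagebreak/>')
    toc = "<h2>Table of Contents</h2>\n<ul>\n" + "\n".join(toc_items) + "\n</ul>"
    return "<html><head></head><body>\n" + "\n".join(bodies) + "\n" + toc + "\n</body></html>"
```

The links are still plain `href="#partN"` anchors; a MOBI writer would still have to turn them into whatever link form the reader expects.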

Right now I just store each image in a new record. Will this work? I seem to remember some limitation being mentioned on the size of a record.
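A minimal sketch of the record layout described above: record 0 is reserved for the header, the text follows in 4096-byte records (the PalmDOC convention), and each image gets its own record after the text. The 63 KB per-image limit is an assumption to check against real readers, and `build_records` is a hypothetical helper.

```python
TEXT_RECORD_SIZE = 4096       # PalmDOC convention: 4096 bytes of uncompressed text per record
MAX_IMAGE_RECORD = 63 * 1024  # assumed practical limit for one image record; verify on devices

def build_records(text: bytes, images):
    """Return (records, first_image_index). Record 0 is a placeholder for the header."""
    records = [b""]  # record 0: PalmDOC/MOBI header, to be filled in later
    # Naive split: this can cut a multibyte character in half.
    for start in range(0, len(text), TEXT_RECORD_SIZE):
        records.append(text[start:start + TEXT_RECORD_SIZE])
    first_image = len(records)
    for n, img in enumerate(images, start=1):
        if len(img) > MAX_IMAGE_RECORD:
            raise ValueError(f"image {n} is {len(img)} bytes; rescale or re-encode it (e.g. as JPEG)")
        records.append(img)  # one image per record
    return records, first_image
```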

In a couple of days I can make the first alpha version available, but first I want to test the script on more examples. Does anybody have recommendations for files to test with, or know of well-known issues I should check for?
11-25-2007, 06:26 PM   #2
JSWolf
Resident Curmudgeon
Posts: 73,645
Karma: 127838196
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
How will your script handle images? Will they be the same size in the script-generated mobi book?
11-25-2007, 06:28 PM   #3
kovidgoyal
creator of calibre
Posts: 43,775
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Are you actually parsing the HTML and recreating it or just packaging it into a mobi?
11-25-2007, 06:49 PM   #4
tompe
Grand Sorcerer
Quote:
Originally Posted by kovidgoyal View Post
Are you actually parsing the HTML and recreating it or just packaging it into a mobi?
I am parsing the HTML and recreating it after some patching. With many complete HTML files as input, you have to do this to end up with a single HTML file. You also need to change the img tag. And I suppose I need to patch bad HTML code; for example, some old LIT files seem to give bad HTML and wrong entities after conversion with clit.
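A minimal sketch of the img-tag change, modeled on the `recindex="00001"` attribute in the Ring of Fire preamble quoted later in this thread. It assumes `recindex` is a 1-based, five-digit index counted from the first image record and that a repeated image can reuse its index; `rewrite_img_tags` is a hypothetical helper and only handles double-quoted `src` values.

```python
import re

# <img ... src="file" ...>: group 1 is everything between "<img" and "src" (ends in whitespace).
IMG_SRC_RE = re.compile(r'<img\b([^>]*?\s)src\s*=\s*"([^"]+)"', re.IGNORECASE)

def rewrite_img_tags(html: str):
    """Replace src="file" with recindex="NNNNN"; return (new_html, image files in record order)."""
    order = {}  # src -> 1-based image index, in order of first appearance

    def repl(m):
        before, src = m.group(1), m.group(2)
        index = order.setdefault(src, len(order) + 1)  # repeated images reuse their index
        return f'<img{before}recindex="{index:05d}"'

    return IMG_SRC_RE.sub(repl, html), list(order)
```

The returned file list gives the order in which the images would be written as records.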

But I have not actually found a specification of the allowed HTML. I was going to take the approach that whatever works on my Gen3 is allowed...
11-25-2007, 06:53 PM   #5
tompe
Grand Sorcerer
Quote:
Originally Posted by JSWolf View Post
How will your script handle images? Will they be the same size in the script-generated mobi book?
I have to check this. I suppose I will use the method with alternative files, or just scale an image down if it is too big. And of course I can scale an image up if only low-resolution images are available. I will also convert to JPG if the record becomes too big in another format.

Do you always want to maximize the image size according to the reading device? Or should you add some size specification in the img tag?
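A minimal sketch of fitting an image to the reading device while keeping its aspect ratio, with upscaling optional. The 600x800 screen used in the example is an assumption (a Cybook Gen3-sized display), and `fit_within` is a hypothetical helper; it only computes the target size, not the resampling.

```python
def fit_within(width: int, height: int, max_w: int, max_h: int, allow_upscale: bool = False):
    """Scale (width, height) to fit inside max_w x max_h, keeping the aspect ratio."""
    scale = min(max_w / width, max_h / height)
    if scale > 1 and not allow_upscale:
        scale = 1.0  # small images keep their size unless upscaling is requested
    return max(1, round(width * scale)), max(1, round(height * scale))

# Example with an assumed 600x800 screen:
# fit_within(1200, 1600, 600, 800) -> (600, 800)
# fit_within(300, 200, 600, 800, allow_upscale=True) -> (600, 400)
```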
11-25-2007, 07:40 PM   #6
wallcraft
reader
Posts: 6,975
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
MobiPocket does have PRCGEN Documentation, which provides some information about the supported HTML.

You have probably already seen MobiPocket TOC using mobigen and Images in MobiPocket. In particular, a toc.html appears to be required for mobigen to create a TOC and it is inserted at the end of the .mobi file. An automatic TOC would be a useful addition, and yet another reason to prefer html2mobi over mobigen.

I have never seen the hisrc attribute (Image support and display) used for an image in an actual MOBI file, but it might be one way to add a larger image to a MOBI file while maintaining backward compatibility. It might be enough, though, to have a default image size, or have html2mobi honor width & height larger than the image by rescaling the image (note that the reader ignores width & height larger than the image).
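A minimal sketch of the second suggestion: honoring `width`/`height` attributes that ask for more pixels than the image has, by rescaling the image itself. It assumes plain pixel values (no percentages) and fills a missing dimension from the aspect ratio; `requested_size` and `upscale_target` are hypothetical helpers.

```python
import re

def requested_size(img_tag: str):
    """Return (width, height) in pixels from an <img> tag's attributes; None where absent."""
    def attr(name):
        m = re.search(r"\b" + name + r"\s*=\s*[\"']?(\d+)", img_tag, re.IGNORECASE)
        return int(m.group(1)) if m else None
    return attr("width"), attr("height")

def upscale_target(img_tag: str, actual_w: int, actual_h: int):
    """Size to rescale the image to if the tag asks for more than it has; otherwise None."""
    want_w, want_h = requested_size(img_tag)
    if want_w is None and want_h is None:
        return None
    if want_w is None:  # derive the missing dimension from the aspect ratio
        want_w = round(actual_w * want_h / actual_h)
    if want_h is None:
        want_h = round(actual_h * want_w / actual_w)
    if want_w <= actual_w and want_h <= actual_h:
        return None  # only larger requests need the image file itself rescaled
    return want_w, want_h
```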
11-25-2007, 07:51 PM   #7
wallcraft
reader
A mobi2html that explodes MOBI to HTML would also be useful. It would obviously only work on DRM-free PRC and MOBI files. The easiest option would just be to extract the single HTML file and the images, with the images correctly referenced in the HTML. Better would be to extract the .opf file from the HTML preamble. Note that mobi2epub would then be a simple addition, or just use the existing oeb2epub.py in combination with mobi2html.
11-25-2007, 08:04 PM   #8
tompe
Grand Sorcerer
Quote:
Originally Posted by wallcraft View Post
A mobi2html that explodes MOBI to HTML would also be useful. It would obviously only work on DRM-free PRC and MOBI files. The easiest option would just be to extract the single HTML file and the images, with the images correctly referenced in the HTML. Better would be to extract the .opf file from the HTML preamble. Note that mobi2epub would then be a simple addition, or just use the existing oeb2epub.py in combination with mobi2html.
I wrote that first. There are some issues with the images since I do not know how to find out which record they start in. But I do not think that the opf is saved in the preamble. At least it did not seem to be there for the Alice files.
11-25-2007, 08:15 PM   #9
tompe
Grand Sorcerer
Quote:
Originally Posted by wallcraft View Post
MobiPocket does have PRCGEN Documentation, which provides some information about the supported HTML.

You have probably already seen MobiPocket TOC using mobigen and Images in MobiPocket. In particular, a toc.html appears to be required for mobigen to create a TOC and it is inserted at the end of the .mobi file. An automatic TOC would be a useful addition, and yet another reason to prefer html2mobi over mobigen.

I have never seen the hisrc attribute (Image support and display) used for an image in an actual MOBI file, but it might be one way to add a larger image to a MOBI file while maintaining backward compatibility. It might be enough, though, to have a default image size, or have html2mobi honor width & height larger than the image by rescaling the image (note that the reader ignores width & height larger than the image).
I will look at the PRCGEN Documentation. I must have done something wrong, since on my Gen3 the Alice book became 600 pages long...

For the Alice in Wonderland OPF, the TOC is inserted at the end because it is in the spine specification. And if it were not there, it would have to be inserted anyway, because it is in the manifest specification. What I do not get is how to code things so that you get a button in FBReader for the TOC. I assume the guide tag has something to do with this.

The gif cover that was 600x800 caused my Gen3 to hang so I had to reboot it. I rescaled it a bit and saved as jpg instead and that worked better.
11-25-2007, 08:37 PM   #10
wallcraft
reader
An OPF preamble does seem to be optional. This is from the MobiPocket version of Ring of Fire from the Baen Free Library.

Code:
<HTML><HEAD><metadata>
<dc-metadata xmlns:dc="http://purl.org/metadata/dublin_core" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
<dc:Title>Ring of Fire</dc:Title>
<dc:Type>Novel</dc:Type>
<dc:Identifier id="ISBN-074347175X" scheme="ISBN-Hardcover">0-7434-7175-X</dc:Identifier>
<dc:Identifier id="ISBN13-9780743471756" scheme="ISBN13-Hardcover">978-0-7434-7175-6</dc:Identifier>
<dc:Identifier id="ISBN-1416509089" scheme="ISBN-Paperback">1-4165-0908-9</dc:Identifier>
<dc:Identifier id="ISBN13-9781416509080" scheme="ISBN13-Paperback">978-1-4165-0908-0</dc:Identifier>
<dc:Identifier id="DOI-074347175X" scheme="DOI">10.1125/Baen.074347175X</dc:Identifier>
<dc:Publisher>Baen Books</dc:Publisher>
<dc:Creator role="aut" file-as="Flint, Eric">Eric Flint</dc:Creator>
<dc:Contributor role="art" file-as="Blair, Dru">Dru Blair</dc:Contributor>
<dc:Subject>Science Fiction</dc:Subject>
<dc:Rights>2004 by Eric Flint</dc:Rights>
<dc:Date>2004-01-01</dc:Date>
<dc:Language>US English (en-us)</dc:Language>
</dc-metadata>
</metadata>
<GUIDE>
<REFERENCE TYPE="toc" TITLE="Table of Contents" HREF="074347175X_top.htm"  filepos="0001692887">
<REFERENCE TYPE="cover" TITLE="Cover" HREF="074347175X__i_.htm"  filepos="0000001553">
<REFERENCE TYPE="copyright-page" TITLE="Copyright" HREF="074347175X__p_.htm"  filepos="0000001785">
<REFERENCE TYPE="firstpage" TITLE="First Page" HREF="074347175X__p_.htm#Chap_0"  filepos="0000004946">
</GUIDE>
<METADATA HREF="xyz_metadata.htm"  filepos="0001694500"><hr></HEAD><BODY>
<h1 align="center"><img src="BMP"            recindex="00001"><br />
Ring of Fire<br />
by<br />Eric Flint</H1>
<p align="center"><A HREF="074347175X_top.htm"  filepos="0001692887">Table of Contents</A></P>
If the e-book has anything more than the first page under the Reader's contents icon, then I think it has to have a non-empty <GUIDE> section.
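A minimal sketch of reading the `<GUIDE>` references out of a preamble like the one above, for example in mobi2html. It assumes attribute values are double-quoted, as in the Ring of Fire sample, and treats the zero-padded `filepos` values as integers (they look like byte offsets into the decompressed text, which is an assumption here). `guide_references` is a hypothetical helper.

```python
import re

REF_RE = re.compile(r"<reference\b([^>]*)>", re.IGNORECASE)
ATTR_RE = re.compile(r'(\w[\w-]*)\s*=\s*"([^"]*)"')

def guide_references(preamble: str):
    """Return one dict per <REFERENCE> in the <GUIDE> block (keys lowercased)."""
    m = re.search(r"<guide>(.*?)</guide>", preamble, re.IGNORECASE | re.DOTALL)
    if not m:
        return []
    refs = []
    for ref in REF_RE.finditer(m.group(1)):
        attrs = {k.lower(): v for k, v in ATTR_RE.findall(ref.group(1))}
        if "filepos" in attrs:
            attrs["filepos"] = int(attrs["filepos"])  # "0001692887" -> 1692887
        refs.append(attrs)
    return refs
```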

Another way to generate "typical" MOBI books would be to run mobigen.exe on an exploded LIT file, and compare the result to using html2mobi. In the case of Baen books, you can use the LIT version and compare the result to their MOBI version.

Last edited by wallcraft; 11-25-2007 at 08:39 PM.
11-25-2007, 09:11 PM   #11
igorsk
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
Quote:
Originally Posted by tompe View Post
I wrote that first. There are some issues with the images since I do not know how to find out which record they start in. But I do not think that the opf is saved in the preamble. At least it did not seem to be there for the Alice files.
Number of the first record with images is the dword at offset 0x5C in the 'MOBI' header (which starts at offset 0x10 in record 0).
I'm going to do a post on internals of mobi format "soon"...
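A minimal sketch of that lookup. The PalmDB container header is 78 bytes with the record count as a big-endian 16-bit value at offset 76, followed by 8-byte record entries whose first 4 bytes are the record's file offset. The 0x10 and 0x5C offsets are taken from the post above; `record_offsets` and `first_image_record` are hypothetical helpers.

```python
import struct

def record_offsets(db: bytes):
    """File offsets of every record in a PalmDB (PDB/PRC/MOBI) file."""
    (count,) = struct.unpack_from(">H", db, 76)  # record count at byte 76 of the 78-byte header
    # Each 8-byte entry: 4-byte offset, 1-byte attributes, 3-byte unique ID.
    return [struct.unpack_from(">I", db, 78 + 8 * i)[0] for i in range(count)]

def first_image_record(db: bytes) -> int:
    """Index of the first image record, read from the MOBI header in record 0."""
    mobi = record_offsets(db)[0] + 0x10  # 'MOBI' header starts 0x10 bytes into record 0
    if db[mobi:mobi + 4] != b"MOBI":
        raise ValueError("record 0 has no MOBI header at offset 0x10")
    (index,) = struct.unpack_from(">I", db, mobi + 0x5C)  # dword at 0x5C in the MOBI header
    return index
```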
11-25-2007, 09:20 PM   #12
tompe
Grand Sorcerer
Quote:
Originally Posted by wallcraft View Post
If the e-book has anything more than the first page under the Reader's contents icon, then I think it has to have a non-empty <GUIDE> section.

Another way to generate "typical" MOBI books would be to run mobigen.exe on an exploded LIT file, and compare the result to using html2mobi. In the case of Baen books, you can use the LIT version and compare the result to their MOBI version.
I tried to use the guide tag according to the documentation, but I did not get it to work. What I did not understand was how the target of the href should be specified. I tried putting a name attribute on an a tag, but it did not seem to work.
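One possible approach, sketched under assumptions: if `filepos` is a byte offset into the final text (as the zero-padded values in the Ring of Fire preamble suggest), in-book links can be rewritten in two passes. First, every `href="#name"` is replaced with a fixed-width 10-digit placeholder so later offsets stop moving; then each `<a name="...">` offset is measured and written into the placeholder. Pointing at the start of the `<a` tag, and dropping the href entirely (the sample above keeps both), are assumptions; `links_to_filepos` is hypothetical.

```python
import re

HREF_RE = re.compile(rb'href="#([^"]+)"')
ANCHOR_RE = re.compile(rb'<a\s+name="([^"]+)"', re.IGNORECASE)
PLACEHOLDER = b'filepos="0000000000"'

def links_to_filepos(text: bytes) -> bytes:
    """Turn href="#x" links into filepos="NNNNNNNNNN" byte offsets of the matching <a name="x">."""
    targets = []

    def to_placeholder(m):
        targets.append(m.group(1))
        return PLACEHOLDER  # fixed width, so offsets computed below stay valid

    text = HREF_RE.sub(to_placeholder, text)
    anchors = {m.group(1): m.start() for m in ANCHOR_RE.finditer(text)}
    slots = [m.start() for m in re.finditer(re.escape(PLACEHOLDER), text)]
    out = bytearray(text)
    for name, start in zip(targets, slots):
        if name not in anchors:
            raise KeyError(f"no <a name> for link target {name!r}")
        out[start + 9:start + 19] = b"%010d" % anchors[name]  # 9 == len(b'filepos="')
    return bytes(out)
```

The text must already be merged into one file and encoded to bytes, since the offsets are byte positions.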

I actually got mobi2html to work. Use it as:

perl mobi2html Alice_In_Wonderland.mobi > Alice.html

The images should work, but there are some problems with the rendering of the "wave text". I have attached the script in case anybody is interested in playing around with it. How do I attach a file called mobi2html?
Attached Files
File Type: pl mobi2html.pl (1.6 KB, 965 views)
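For anyone writing their own mobi2html: text records are often PalmDOC-compressed (compression value 2 in the first two bytes of record 0). A minimal sketch of decompressing one such record follows. It assumes any trailing extra bytes that some MOBI files append to text records have already been stripped; `palmdoc_decompress` is a hypothetical name.

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decompress one PalmDOC (compression type 2) text record."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if 1 <= c <= 8:            # copy the next c bytes verbatim
            out += data[i:i + c]
            i += c
        elif c < 0x80:             # 0x00 and 0x09..0x7F are literal bytes
            out.append(c)
        elif c >= 0xC0:            # a space followed by the character c ^ 0x80
            out += b" " + bytes([c ^ 0x80])
        else:                      # 0x80..0xBF: two-byte LZ77 back-reference
            pair = ((c << 8) | data[i]) & 0x3FFF
            i += 1
            distance, length = pair >> 3, (pair & 7) + 3
            for _ in range(length):  # byte by byte, so overlapping copies work
                out.append(out[-distance])
    return bytes(out)
```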
11-25-2007, 09:31 PM   #13
tompe
Grand Sorcerer
Quote:
Originally Posted by igorsk View Post
Number of the first record with images is the dword at offset 0x5C in the 'MOBI' header (which starts at offset 0x10 in record 0).
I'm going to do a post on internals of mobi format "soon"...
Do you know how the "library" image is specified? I noticed that when I had 7 images in the document, the library image was the record directly after the 7th image record.
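A minimal sketch for checking that observation: starting at the first image record, guess each record's format from its leading magic bytes, so an extra image after the content images shows up in the list. The magic-byte table is standard, but treating every match as an image is an assumption ("BM" in particular is weak), and `image_kind`/`list_image_records` are hypothetical helpers.

```python
IMAGE_MAGIC = {
    b"\xff\xd8\xff": "jpg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\x89PNG\r\n\x1a\n": "png",
    b"BM": "bmp",  # weak signature: a text record could also start with "BM"
}

def image_kind(record: bytes):
    """Guess an image format from a record's first bytes; None if it doesn't look like an image."""
    for magic, ext in IMAGE_MAGIC.items():
        if record.startswith(magic):
            return ext
    return None

def list_image_records(records, first_image):
    """(index, format) for each record from first_image onward that looks like an image."""
    found = []
    for i in range(first_image, len(records)):
        kind = image_kind(records[i])
        if kind is not None:
            found.append((i, kind))
    return found
```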
11-26-2007, 03:24 AM   #14
andym
Groupie
Posts: 189
Karma: 793
Join Date: Oct 2006

Last edited by andym; 11-26-2007 at 03:26 AM.
11-26-2007, 05:06 AM   #15
igorsk
Wizard
Quote:
Originally Posted by tompe View Post
Do you know how the "library" image is specified? I noticed that when I had 7 images in the document, the library image was the record directly after the 7th image record.
Nope, didn't see how this one is stored.
I did notice that one of my books had a cover image that does not actually appear in the .mobi file... so it seems it's downloaded from the server and is stored separately in the Covers folder.
All times are GMT -4.