Mobiperl - Perl tools for handling MobiPocket files

tompe
12-20-2007, 02:53 PM
Mobiperl is a collection of tools I am writing in Perl for handling MobiPocket files. Since it works OK and I am interested in suggestions for more functionality, I am starting a new thread for it.

The latest version is 0.0.43 and it can be found here: http://www.ida.liu.se/~tompe/mobiperl/downloads/ (updated July 2013)

*original text*

I have not yet tested Mobiperl on Windows, but if nobody else beats me to it I will try to test it and write down how to use the programs under Windows. As I understand it, ActivePerl is very common and it contains PPM, which can be used to install packages.

Currently Mobiperl consists of:

MobiHeader.pm: A class file for generating MOBI headers.
html2mobi: Program to convert HTML files or OPF structures to MobiPocket.
mobi2html: Can be used to decode a MobiPocket file to an HTML file. Now mostly useful for debugging.
lit2mobi: Convert a lit file to a MobiPocket file.
mobi2mobi: Manipulate a Mobipocket file. Can for example be used to change the title.


I am also planning to write a mobi2opf program that explodes a MobiPocket file and tries to create an OPF structure.

tompe
12-20-2007, 02:59 PM
Since a lot of people have mentioned that it would be useful to be able to manipulate meta data and to add cover image to a prc or mobi file I wrote mobi2mobi. Here is the description of the program:

A program to manipulate MobiPocket files. Author and title can be set and a cover image (thumb nail image for Gen3) can be added.

There are two kinds of prc files used for electronic books. One is a PalmDOC file, which does not have a MOBI header but can contain HTML code marked up with MobiPocket-specific markup, and it can be read by a MobiPocket reader. For this format you cannot store meta information in the header. The other format is MobiPocket, which has a MOBI header and some additional data where you can store meta information and an extended title.
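For anyone who wants to check which of the two kinds a given file is, here is a minimal Python sketch. It is not how mobi2mobi does it; it assumes the usual Palm database layout (a 78-byte header with the record count at offset 76, followed by 8-byte record entries) and that a MOBI header, when present, starts with the bytes "MOBI" right after the 16-byte PalmDOC header in record 0. The function names are made up for the example.

```python
import struct

PDB_HEADER_SIZE = 78    # fixed part of the Palm database header
RECORD_ENTRY_SIZE = 8   # 4-byte offset, 1-byte attributes, 3-byte unique id


def record_zero(pdb: bytes) -> bytes:
    """Return the raw bytes of record 0 of a Palm database."""
    (num_records,) = struct.unpack_from(">H", pdb, 76)
    if num_records == 0:
        raise ValueError("database has no records")
    (start,) = struct.unpack_from(">I", pdb, PDB_HEADER_SIZE)
    if num_records > 1:
        # Record 0 ends where record 1 begins.
        (end,) = struct.unpack_from(">I", pdb, PDB_HEADER_SIZE + RECORD_ENTRY_SIZE)
    else:
        end = len(pdb)
    return pdb[start:end]


def classify(pdb: bytes) -> str:
    """Return 'MobiPocket' if record 0 carries a MOBI header, else 'PalmDOC'."""
    rec0 = record_zero(pdb)
    # Assumption: the MOBI header follows the 16-byte PalmDOC header.
    return "MobiPocket" if rec0[16:20] == b"MOBI" else "PalmDOC"
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `classify` is enough to try it out.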

This program can change the extended title for a MobiPocket file. It can also automatically convert a PalmDOC file to a MobiPocket file and set the title.

It can also add author information to a PalmDOC file by converting it to a MobiPocket file and setting the author meta information.

You can also add a prefix to a title in a MobiPocket file. This does not work for PalmDOC files.

It is possible to add an image to the file. If there are no other images in the file then the added image will be used as cover image and thumb nail image for Gen3.

Just running the program on a mobifile without any flags will print some information about the file.

Since there is no specification available for the MOBI header this program might generate books that are not entirely correct. So keep the original file...

diabloNL
12-20-2007, 03:44 PM
I would love to try it but I have no clue how to run it on Windows XP. I downloaded ActivePerl but don't know what to do next.

tompe
12-20-2007, 04:41 PM
I would love to try it but I have no clue how to run it on Windows XP. I downloaded ActivePerl but don't know what to do next.

I will test it myself in a couple of days and can give more information then.

For mobi2mobi you will need to install the Perl packages:

Palm::PDB
Palm::Doc
Date::Format
Getopt::Mixed
Image::Size


There should be something called PPM to install packages.

To check the documentation do "perldoc mobi2mobi". To change a title do 'perl mobi2mobi --title "New Title" --outfile newfile.mobi originalfile.mobi'. I assume you have to use a DOS box or cygwin or some shell with ActivePerl to do this.

But I am really not a Windows user so I might be mistaken.

delphidb96
12-20-2007, 04:52 PM
Or... You could create a standalone Windows executable that does the same thing. This would be of great benefit to the millions of WinXP users who can't get PERL to work right and don't own Linux or OSX machines. After all, we're, by far, the greatest number of mobi readers.

PLEASE!!!:2thumbsup

Derek

tompe
12-20-2007, 05:51 PM
If it is possible to create an executable then this might be an option. The disadvantage is that I have to boot to Windows to create a new version.

I have uploaded version 0.0.7. The bug I fixed was that the title change did not work for DRM:ed books. Now it works for my dictionary at least.

tompe
12-21-2007, 04:41 PM
This is what you need to do to run the Mobiperl program on Windows.

Install ActivePerl 5.8.8 (easily found with Google). Maybe 5.10 also works, but I did not test that version.

Start the "Perl Package Manager" (from the program menu) and install packages. A package can also be installed from a cmd prompt with "ppm install packagename".

For mobi2mobi install the packages:

p5-Palm
TimeDate
Getopt-Mixed
Image-Size


For html2mobi you have to add the repository http://theoryx5.uwinnipeg.ca/ppms/package.lst to the package manager (in preferences). Then install the packages:

XML-Parser-Lite-Tree
GD


Documentation for mobi2mobi and html2mobi can be found here:

http://www.ida.liu.se/~tompe/mobiperl/mobi2mobi.html
http://www.ida.liu.se/~tompe/mobiperl/html2mobi.html

tompe
12-21-2007, 04:47 PM
For it to work on Windows with ActivePerl you need to get version 0.0.8 of Mobiperl.

I tested html2mobi on Windows with an HTML file and it worked. There was some problem with an opf file but I will look at that. I will also fix lit2mobi so that it works properly on Windows.

It should be possible to package a Perl program in one Windows binary using PAR::Packer but I could not get it to work in Windows with ActivePerl (it did work in Linux). If I get this to work I will distribute the programs that way also. But the solution to the problem might be to wait for a better binary distribution of PAR::Packer and then it can take some time...

DaleDe
12-21-2007, 05:51 PM
When I added uwinnipeg to my preference it found 0 pkgs.

Dale

tompe
12-21-2007, 06:34 PM
When I added uwinnipeg to my preference it found 0 pkgs.


The same happened for me and it took some time to realize I had put the URL where the name should be and the name where the URL should be. Maybe you have done the same thing?

tompe
12-21-2007, 06:37 PM
The uwinnipeg repository is needed for the GD package. I also had to include the Palm::Doc module in the mobiperl distribution since the package containing this module had an old version that did not work for MOBI files.

delphidb96
12-21-2007, 09:42 PM
The uwinnipeg repository is needed for the GD package. I also had to include the Palm::Doc module in the mobiperl distribution since the package containing this module had an old version that did not work for MOBI files.

I guess the one big problem I have with these tools is that I've got to put Yet Another Stupid Scripting Tool onto my system just to get it to run. And ActivePerl costs *money*! What?!? I've got to pay money to a *THIRD* party in order to run your apps? Hunh-unh! If I'm going to pay money, it should go to you. I done paid enough money when I paid HP and Microsoft for my computer system. Been There, Done That, Ain't Gonna Do It Again!

So, as nice as your mobitools appear to be, I won't be using them until they are turned into true executable apps.

Derek

delphidb96
12-21-2007, 09:44 PM
P.S.

If scripting tools are so wonderful, why do I need so many of them? Ruby, Python, Perl, Javascript, VBscript... Then there's the MS .Net - GAWD!

:angry:

tompe
12-22-2007, 05:24 AM
ActivePerl does not cost money. There are versions that cost money but the basic version is free. Unfortunately the website is a bit confusing and I understand why you did not find the free version. Here is a link

http://www.activestate.com/store/productdetail.aspx?prdGuid=81fbce82-6bd5-49bc-a915-08d58c2648ca

to a page where you can choose ActivePerl which is free.

But I totally understand that having executables is much better and will try to fix that. But if you want to be able to do extensions yourself or help to find bugs you need the development environment.

On the other hand, you install or have a lot of programs on your computer that work on non-executable code, so it should not be an issue. Why do you have a program to read books? Why is not each book an executable?

diabloNL
12-22-2007, 07:03 AM
I successfully changed the title of my book with mobi2mobi on my Windows XP. Thanks for the program Tommy! A small question though: when I imported it into MobiPocket Reader and then tried to send it to the Cybook, it asked me for the login data of Fictionwise, where I bought the book. Could it be that mobi2mobi does something with the PID inside, or something like that?


EDIT: If the book already has a cover image, should it show the new cover image set by mobi2mobi? I tried it and it keeps showing the original.

tompe
12-22-2007, 10:53 AM
I successfully changed the title of my book with mobi2mobi on my Windows XP. Thanks for the program Tommy! A small question though, when I imported it in MobiPocket Reader and then trying to send it to the Cybook, he asked me for the Login data of Fictionwise where I have bought the book. Could it be that mobi2mobi does something with the PID inside or sort?


It should not do anything but the data concerning the DRM are manipulated so there might be some bug. But it worked just moving the file to the Cybook?


EDIT: If the book already has a cover image, should it show the new cover image set by mobi2mobi? I tried it and it keeps showing the original.

No, not currently but I can do it the way that it always replaces the cover image with what you specify. Maybe that is the logical way for it to work.

Thanks for the feedback.

diabloNL
12-22-2007, 11:43 AM
It should not do anything but the data concerning the DRM are manipulated so there might be some bug. But it worked just moving the file to the Cybook?



The way I solved it is to import the book in MobiPocket Reader. Then I try to open it and it comes with a pop-up with the login screen for Fictionwise. After I log in I can open the book.

Now I send the book to the Cybook and again I need to log in to activate it. After this I make a copy of the book on the Cybook and keep it as backup, because this file will work on the PC and the Cybook.

The same will happen if I download a purchased book and want to install it on a reader that has not been authorized for that book; in this case it will also come up with the login screen for Fictionwise. So it seems that something in the file has changed, so that MobiPocket needs to check whether it is authorized.

tompe
12-22-2007, 12:22 PM
If you just move the book to the Cybook using the mounted USB-disk does it work then?

I am fixing it so that meta data such as author can be changed. I will test this with my dictionary, and if this works I will do a new release.

diabloNL
12-22-2007, 12:39 PM
If you just move the book to the Cybook using the mounted USB-disk does it work then?

I am fixing so that Meta data such as author can be changed and will test this with my dictionary and if this work I will do a new release.


No, when I just copy it via USB-disk and try to open it it says: "File format not supported"

Let me know if you want me to test more. ;)

tompe
12-22-2007, 07:08 PM
No, when I just copy it via USB-disk and try to open it it says: "File format not supported"

Let me know if you want me to test more. ;)

I just copied version 0.0.9 to the usual place. I added a flag --coverimage that will replace existing cover images. I fixed so that you can add an author to a MobiPocket file (and to DRM:ed files). I tested all this on my DRM:ed dictionary and it worked to just copy the result to the Cybook.

If it does not work for your file, could you send me the file so I can check that I have not made any assumptions that are too strong? I assumed for example that the version of the file was 5 for a DRM:ed file, but maybe this is wrong. It is a bit limiting only having one DRM:ed file to test with :-)

heyjohn
12-22-2007, 07:30 PM
For any Windows users who just want a Linux platform to try scripts on, you can always burn a Linux Live CD. This is a CD that you boot off of, and it provides a full Linux desktop -- without touching your Windows setup unless you tell it to. You can use devices such as USB memory sticks for permanent storage.

Here is one mirror that provides ISO images of Fedora 8:
http://linux.nssl.noaa.gov/fedora/linux/releases/8/Live/i686/

The KDE version might be more comfortable for long-time Windows users. Be warned that it's a very full CD, nearly 700MB, and things can run a bit slow when running from CD, especially if you don't have much RAM. (I have tried the KDE LiveCD on a machine with 512MB, worked fine.) Perl, Ruby, and Python are all provided by default; however, not all Perl modules are provided -- so you'd wind up using the package manager to fix that. (su - ; yum install perl-Image-Size and so on.)

You can find more information about Fedora Linux here:
http://fedoraproject.org/

Admittedly, this isn't the sort of thing you want to do every day -- but if you want a taste of how Linux works without having to configure extra hardware or whatever, it might be worth playing with.

diabloNL
12-22-2007, 07:43 PM
Here (http://www.fictionwise.com/eBooks/eBook1764.htm?cache) you can download a book in secure Mobipocket for free. I don't mind sending you a copy of mine, but it will not work on your Cybook anyway because of the DRM. ;)

tompe
12-22-2007, 07:59 PM
Here (http://www.fictionwise.com/eBooks/eBook1764.htm?cache) you can download a book in secure Mobipocket for free. I don't mind sending you a copy of mine, but it will not work on your Cybook anyway because of the DRM. ;)

I will check it out. Can you run "mobi2mobi" on your book and see which version it says it has? ("MOBIHEADER ver: 5" is what I get for my dictionary.)

tompe
12-22-2007, 08:28 PM
I downloaded the book from Fictionwise and saw that the version was 4 and not 5. I have fixed that bug and made mobiperl-0.0.10.tar available.

I could add the author "Jules Verne" to the book (Around the world in 80 days) and the author shows up in my Cybook.
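The version number that mobi2mobi prints as "MOBIHEADER ver:" can also be read directly from record 0. The sketch below is only an illustration: the offset (36 bytes into record 0, i.e. 20 bytes into the MOBI header) comes from community notes on the format, not from this thread, and `mobi_version` is a made-up name.

```python
import struct

VERSION_OFFSET = 36  # assumed position of the version field within record 0


def mobi_version(rec0: bytes):
    """Return the MOBI header version from record 0, or None without a MOBI header."""
    if len(rec0) < VERSION_OFFSET + 4 or rec0[16:20] != b"MOBI":
        return None
    (version,) = struct.unpack_from(">I", rec0, VERSION_OFFSET)
    return version
```

With the files discussed here this would be expected to give 5 for the dictionary and 4 for the Fictionwise book, if the assumed offset is right.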

tompe
12-22-2007, 09:01 PM
I just realised that version 3 of MOBI header does not support meta information such as author name so the files generated from html or from a PalmDOC file will not have working author information. mobi2mobi will work with adding author information for MobiPocket files.

The annoying thing is that my code for generating version 4 seems to work on all devices except the Cybook. What I get is a message "Multi-part book not supported". So does anybody know what a multi-part book is, and whether I can find such books to download anywhere or create them using MobiCreator? I have stared at the hex dumps but I really do not see any good candidates for where this information is encoded.

heyjohn
12-22-2007, 09:51 PM
I googled around and found a Nokia article that mentioned "Problematic XHTML-MP support in mobile browsers"... maybe that has something to do with it?

Is it only DRM'd files causing the problem, or all? If the latter, perhaps comparing mobigen's output could help. (I suspect you've already done that plenty of times, though.)

heyjohn
12-22-2007, 09:55 PM
Silly me, googling for "mobipocket multipart -mime" gives much more sane results. Nothing helpful yet, though.

ppxnouse
12-22-2007, 11:28 PM
Hello,

I hope it is OK to post this question in this thread, but it is somehow mobiperl related.

I am pretty new to eBooks and just ordered my Gen3 two days ago.
I tried Mobipocket Reader's latest beta and found that I am unable to add cover art to my prc files. So I stumbled over this thread. Thank you for the mobi2mobi script. Although a Perl script might be a great way to have this running on a wide range of operating systems, I also want a convenient Win32 app to manage/change or even fetch my cover art.

That is why I started looking into the PRC format tonight. After 2 hours in front of my hex editor with some downloaded and some created prc files with and w/o cover art and coding a small parser, I still cannot figure out how I (or the Mobipocket Reader :-)) can find out which PDB record actually contains the cover art. The attributes of the record header are always NULL. Sure, I can look at the record data and see if it is a picture, but then, as you know, I also get the pictures in the book.

I suspect this to be an EXTH item. Can someone tell me what ID 201 and 300 contain ?


Do you have any hint what I am missing ?

Kiari
12-23-2007, 01:20 AM
Okay, that sounds absolutely amazing... I would love to have a tool like that...

*stares in confusion at the rest of the thread*

If only I had a bloody clue how to do it....

HarryT
12-23-2007, 04:06 AM
The annoying thing is that my code for generating version 4 seems to work on all devices except the Cybook. What I get is a message "Multi-part book not supported". So do anybody know what a multi-part book is and if I can find such books to download anywhere or if I can create them using MobiCreator? I have stared at the hex dumps but I really do not see any good candidates for where this information is encoded.

Hi Tommy,

The "MobiPocket Creator" tool creates v4 files which work just fine on the Gen3. I've just used it to create a new version of Dickens' "Bleak House" (available from the Mobi book forum). Might it be worth comparing the output from a simple book from "MobiPocket Creator" and your tool, and see if there are any differences?

diabloNL
12-23-2007, 04:45 AM
I will check it out. Can you run "mobi2mobi" on your book and see which version it says it has ("MOBIHEADER ver: 5" is what I get for my dictionary).


It is ver: 4.

I tried mobi2mobi 0.0.10, but when I try to open it with the MobiPocket desktop reader it says: "Please download the latest reader on www.mobipocket.com to read this book". On my Cybook it gives the not supported message.

I tried with only changing the title and another time where I tried to change title and the coverimage(--coverimage). Both didn't work. :(

tompe
12-23-2007, 05:29 AM
That is why I started looking into the PRC format tonight. After 2 hours in front of my hex editor with some downloaded and some created prc files with and w/o cover art and coding a small parser, I still can not figure out how I (or the Mobipocket Reader :-)) can find out which PDB record actually contains the cover art. The attributes if the record header are always NULL. Sure I can look at the record data and see if it is a picture, but then, as you know, I get also the pictures in the book.


For the Cybook I think it is the first record after the HTML file records that is assumed to be a cover image. There is also a pointer to the first image record (the pointer is the record number or maybe the number of records used for header and the text in the book) for a MOBI header in 0x5c (4 bytes). What I did was to search for the first record with an image in it and assumed that this was the cover if it existed.
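Reading the pointer mentioned above could look like the following Python sketch. It assumes that 0x5c is counted from the start of the MOBI header, which itself is assumed to begin 16 bytes into record 0 (after the PalmDOC header), and that the value is a big-endian 32-bit record number. The function name is made up.

```python
import struct

MOBI_HEADER_START = 16    # assumed: MOBI header follows the 16-byte PalmDOC header
FIRST_IMAGE_FIELD = 0x5C  # offset quoted in the post, assumed relative to the MOBI header


def first_image_record(rec0: bytes):
    """Return the record number stored in the first-image field, or None."""
    offset = MOBI_HEADER_START + FIRST_IMAGE_FIELD
    if len(rec0) < offset + 4 or rec0[16:20] != b"MOBI":
        return None
    (index,) = struct.unpack_from(">I", rec0, offset)
    return index
```

The result can then be compared with the first record that actually holds an image, which is the search described in the post.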

tompe
12-23-2007, 05:43 AM
I tried mobi2mobi 0.0.10, but when I try to open it with the MobiPocket desktop reader it says: "Please download the latest reader on www.mobipocket.com to read this book". On my Cybook it give the not supported message.

I tried with only changing the title and another time where I tried to change title and the coverimage(--coverimage). Both didn't work. :(

Strange. You could try just changing the author and see if this works, since that was what I explicitly tested. Since it worked for me with the Jules Verne book I cannot say anything more without seeing the header data in your file before and after.

diabloNL
12-23-2007, 06:10 AM
Strange. You could try just changing author and see if this work since that was what I explicitly tested. Since it worked for me with the Jules Verne book I cannot say anything more without seeing the header data in your file before and after.



I will send you the file before and after with the title changed. Can you PM your email to me?

ppxnouse
12-23-2007, 07:43 AM
-->What I did was to search for the first record with an image in it and assumed that this was the cover if it exsisted.

Hello tompe,

thank you for the hint. When I look at a MobiPocket Creator generated PRC file created from an HTML page containing pictures (I took this thread as an example), the first records after the HTML part contain pictures used within the HTML. The cover art I specified in Creator can actually be found in the last record. Hmm, maybe I have to actually wait for the reader to see what it picks up. Hope it does not take too long till it is available again.

tompe
12-23-2007, 07:52 AM
I tried different things and I managed to get version 4 to work on the Cybook. So now version 4 is the default and the author information should be visible if it is there. The new release is 0.0.11.

So now I can take a PalmDOC copy of a book from Mobileread and add the author. On my Cybook my copy of Blue Hand now has the author Edgar Wallace shown under the title in the 5 books per page library view. I think I will fix all my books so that the author is shown since it looks so much nicer.

pruss
12-23-2007, 10:15 AM
For any Windows users who just want a Linux platform to try scripts on, you can always burn a Linux Live CD.

One can also install cygwin (www.cygwin.com) which will let one run most Linux-type stuff. I use cygwin's shell all the time for all kinds of things. I hardly ever use the Windows dos box any more.

tompe
12-23-2007, 10:34 AM
I did a version 0.0.12 where I fixed a bug with setting titles and authors that were shorter than the original ones.

I got an example MobiPocket version 4 file where the cover image was not the first one, and I have not figured out how this image is pointed out, so for this kind of file replacing the cover image will not work. In the MOBI header I think the data in B0-CF are related to this since they seem to contain pointers to records. B0 seems to point to the record with a small thumbnail image, but changing just that image will not change the library image on my Cybook.

tompe
12-23-2007, 10:56 AM
The "MobiPocket Creator" tool creates v4 files which work just fine on the Gen3. I've just used it to create a new version of Dickens' "Bleak House" (available from the Mobi book forum). Might it be worth comparing the output from a simple book from "MobiPocket Creator" and your tool, and see if there are any differences?

I am looking at this book now. The cover image was at the end for this book also. The book looked really fine on my Cybook and I really ought to read it since Bleak House has been on my to-read list for many years.

tompe
12-23-2007, 11:03 AM
Gen3. I've just used it to create a new version of Dickens' "Bleak House"

What language did you specify for Bleak House? Was it "en-uk"?

HarryT
12-23-2007, 12:33 PM
What language did you specify for Bleak House? Was it "en-uk"?

It was "en-gb". That's what "MobiPocket Creator" uses if you specify the language as "English (United Kingdom)".

I've attached the opf file in case it's of any use to you.

tompe
12-23-2007, 01:21 PM
I managed to identify how the cover image is pointed out and version 0.0.13 can now replace the cover image in Bleak House.

The information was in the EXTH. Type 201 is the offset for the coverimage (from the first image record). And 202 is the offset for the thumbnail image but this image does not seem to be used on the Cybook.
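To make the 201/202 lookup concrete, here is a small Python sketch. It assumes the commonly described EXTH layout (the tag "EXTH", a 4-byte header length, a 4-byte record count, then records made of a 4-byte type, a 4-byte total length and the data, all big-endian). Finding the block with a plain search for "EXTH" is a shortcut for the example, and both function names are made up.

```python
import struct


def parse_exth(rec0: bytes) -> dict:
    """Return {record_type: [data, ...]} for the EXTH block in record 0."""
    start = rec0.find(b"EXTH")  # shortcut: search instead of following the MOBI header length
    if start < 0 or len(rec0) < start + 12:
        return {}
    _header_len, count = struct.unpack_from(">II", rec0, start + 4)
    pos = start + 12
    records = {}
    for _ in range(count):
        if pos + 8 > len(rec0):
            break  # truncated block
        rec_type, rec_len = struct.unpack_from(">II", rec0, pos)
        if rec_len < 8:
            break  # length includes the 8-byte record header
        records.setdefault(rec_type, []).append(rec0[pos + 8:pos + rec_len])
        pos += rec_len
    return records


def image_record(first_image: int, exth: dict, rec_type: int = 201):
    """Record number of the cover (201) or thumbnail (202), or None if not set."""
    values = exth.get(rec_type)
    if not values or len(values[0]) != 4:
        return None
    (offset,) = struct.unpack(">I", values[0])
    # The EXTH value is an offset counted from the first image record.
    return first_image + offset
```

For example, if the first image record is number 10 and EXTH 201 holds the value 3, the cover would be record 13.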

ppxnouse
12-23-2007, 03:20 PM
Right, I just logged in to post the same about EXTH 201 :-).
So you were faster ;-).

BTW: Do you know what image types are allowed in a .prc file ?
For what images do you check ?
__________________________________________
Frank Huberty

ppxnouse
12-23-2007, 04:15 PM
An additional question:

I have a PRC generated by Mobi Reader 6.x beta (from a PDF). The prc does actually contain the cover art as the first image resource. So I patched 201, 202 and 203 into the EXTH, set their data to NULL, and expected the GIF to show up as cover art in Mobireader. Well, that is not the case. I wonder if MobiReader expects the thumbnail to be a certain maximum size, and I cannot test on a Cybook :-(. Is there anything else that might need to be set?

ppxnouse
12-23-2007, 04:46 PM
A too large image does not seem to be the problem. Even when I set 202 to an index that contains a small Image I do not get a thumb in MobiReader on my PC.

tompe
12-23-2007, 05:45 PM
BTW: Do you know what image types are allowed in a .prc file ? For what images do you check ?


I have assumed that GIF and JPEG are OK. In my html2mobi and lit2mobi programs I convert everything to JPEG. There is a maximum size for the image (64K I think).
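A check along those lines can be sketched in Python as follows. It recognises only the two formats assumed above, by their well-known file signatures, and it treats the 64K figure as exactly 64 * 1024 bytes, which is a guess on top of a guess. The names are made up for the example.

```python
MAX_IMAGE_BYTES = 64 * 1024  # the "64K I think" limit, taken as an assumption


def image_type(data: bytes):
    """Return 'jpeg' or 'gif' based on the file signature, else None."""
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None


def usable_as_image_record(data: bytes) -> bool:
    """True if the data looks like a GIF or JPEG and fits the assumed size limit."""
    return image_type(data) is not None and len(data) <= MAX_IMAGE_BYTES
```

The same `image_type` test can be used to find the first record that holds an image when scanning the records of a file.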

tompe
12-23-2007, 05:50 PM
An additional question:

I have a PRC generated by Mobi Reader 6.x beta (from aPDF). The prc does actually contain the cover art as the first image resource. So I patched 201, 202 and 203 in to the EXTH. Set its data to NULL, and expected the gif to show up as cover art in Mobireader. Well. That is not the case. I wonder if MobiReader expects the thmb to be a certain size max and I can not test on a Cybook :-(. Is there anything else that might need to get set ?

On the Cybook only 201 seems to be used, and if it does not exist the first image is automatically used as cover image. I do not know how the Mobireader works.

ppxnouse
12-23-2007, 06:13 PM
Great info, thank you for your help.

I have assumed that GIF and JPEG are OK. In my html2mobi and lit2mobi programs I convert everything to JPEG. There is a maximum size for the image (64K I think).

I want to show the current artwork, the Cybook would show. So getting the image with offset n can only work if I identify all image resources. Well. For now I just assume there is only gif, png and jpeg, until proven wrong.

Now only my Cybook needs to arrive, so I can see if all works well.

tompe
12-23-2007, 06:43 PM
I think I managed to pack html2mobi and mobi2mobi into Windows binaries (using ActivePerl 5.8.8 build 820, PAR-Packer-588.ppd, -M flags to pp). They are available here (version 0.0.13):

http://www.ida.liu.se/~tompe/mobiperl/mobiperl.rar

html2mobi should work with html, opf and lit files. For lit files you need to have clit.exe in your path.

Documentation:

http://www.ida.liu.se/~tompe/mobiperl/mobi2mobi.html

http://www.ida.liu.se/~tompe/mobiperl/html2mobi.html

I did not test so much with lit files on Windows so there might be a problem with images but I assume that people who want to use the programs under Windows give me good bug reports...

All feedback and suggestions for enhancements are welcome.

heyjohn
12-24-2007, 09:25 AM
Can Mobiperl display and/or change the internal palmDB name of the ebook?

This is for use with oeb:redirect, as documented at http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen&File=URLRef.htm

I would imagine that creating an html file with clever use of oeb:redirect should result in the ability to create custom library orders -- books in a series, sorted by author or publication date, or whatever. (One nifty thought would be organizing the entire Baen Free Library on a Cybook... :-)

heyjohn
12-24-2007, 09:42 AM
Aha, the 'Name:' is actually what I'm looking for, it's the Palm::PDB database name, as opposed to the title or filename. Should be a simple matter to add an option to allow setting a different DB name there (probably wise to keep it to 31 characters plus a NULL).

And thanks for both the program, and for using a scripting language. Were it a binary, I wouldn't have been able to answer my own question... :rolleyes:

tompe
12-24-2007, 04:07 PM
Aha, the 'Name:' is actually what I'm looking for, it's the Palm::PDB database name, as opposed to the title or filename. Should be a simple matter to add an option to allow setting a different DB name there (probably wise to keep it to 31 characters plus a NULL).


I added a flag --databasename to change this name (available in 0.0.15 that can be downloaded now).
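For readers curious what changing that name involves at the byte level, here is a Python sketch. It is not the code behind --databasename; it only assumes that the name sits in the first 32 bytes of the file as a NUL-terminated string (31 characters plus the terminator, as suggested above) and, as a further assumption, that Latin-1 is an acceptable encoding. The function names are made up.

```python
NAME_FIELD_SIZE = 32  # 31 characters plus a terminating NUL


def get_database_name(pdb: bytes) -> str:
    """Return the database name stored at the start of a Palm database."""
    raw = pdb[:NAME_FIELD_SIZE]
    return raw.split(b"\x00", 1)[0].decode("latin-1")


def set_database_name(pdb: bytes, name: str) -> bytes:
    """Return a copy of the database with the name field replaced."""
    encoded = name.encode("latin-1")
    if len(encoded) > NAME_FIELD_SIZE - 1:
        raise ValueError("name must fit in 31 bytes")
    if len(pdb) < NAME_FIELD_SIZE:
        raise ValueError("data is too short to be a Palm database")
    # Pad with NULs so the field keeps its fixed size and nothing else moves.
    return encoded.ljust(NAME_FIELD_SIZE, b"\x00") + pdb[NAME_FIELD_SIZE:]
```

Because the field has a fixed size, the rest of the file (record offsets included) is left untouched.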

heyjohn
12-24-2007, 05:37 PM
Thanks!

As it turns out, I couldn't get oeb:redirect or oeb:library references to work in HTML on the Gen3 -- so it's not immediately useful for this device. It's possible that other devices (or trying it in .opf files) might have better luck. I'll probably revisit it when I have more time, as it would be very nice to construct customized multi-document guides.

HarryT
12-25-2007, 03:21 AM
Aha, the 'Name:' is actually what I'm looking for, it's the Palm::PDB database name, as opposed to the title or filename. Should be a simple matter to add an option to allow setting a different DB name there (probably wise to keep it to 31 characters plus a NULL).

And thanks for both the program, and for using a scripting language. Were it a binary, I wouldn't have been able to answer my own question... :rolleyes:

What's the reason for wanting to change the dbname, as a matter of interest?

DMcCunney
12-25-2007, 12:05 PM
P.S.
If scripting tools are so wonderful, why do I need so many of them? Ruby, Python, Perl, Javascript, VBscript... Then there's the MS .Net - GAWD!
:angry:

There is no "one size fits all" scripting language, and there arguably can't be. Each language was developed by different people to address a different set of problems.

Perl was originally written by Larry Wall to perform sophisticated manipulations on text files in Unix. It combines the functionality of several existing tools, including awk, sed, and tr, and quickly became popular with system administrators and others that did a lot of text manipulation. Unix and things like it (Linux, *BSD) use lots of text files to specify the configuration, and utilities that can modify these files with a script rather than manually are a boon.

Javascript, also known as ECMAScript, was originally written by Brendan Eich for Netscape Navigator 2 and was originally called LiveScript. Calling it Javascript was an unfortunate Netscape marketing decision which has caused no end of confusion ever since, because people don't realize that Java and Javascript are two completely different things whose only similarity is having Java in the name. Every browser implements Javascript these days, so if you use a browser, you have it.

Python is a script language created by Guido van Rossum, intended for general programming tasks. While you can manipulate text with it, that's not all it does. One example of stuff written in Python which is relevant here is the parser for Plucker Desktop, that converts HTML files to the form used by Plucker on PDAs.

Ruby is a script language created by the Japanese programmer Yukihiro Matsumoto, designed to address what he saw as shortcomings in Python.

There are more: if you use Unix/Linux/BSD, there are the script languages implemented by the Bourne, Korn, C, and Bash shells. (The Korn shell language is an upward compatible superset of the Bourne shell, the C shell implements a script language with a C-like syntax, and the Bash shell attempts to implement the best features of both.) There is also John Ousterhout's Tcl/Tk, and from the IBM mainframe environment but ported to other platforms, Mike Cowlishaw's REXX language.

MS .NET isn't a script language at all -- it's a programming framework implemented as a set of libraries, à la Visual Basic back when.

For that matter, PCs have had a scripting language since the MS-DOS days: COMMAND.COM implemented a rudimentary script facility used in "batch" files, which has been extended under CMD.EXE in recent Windows versions.

You don't need all of them -- only the ones required by things you do.
______
Dennis

delphidb96
12-25-2007, 01:16 PM
Dennis,

Fine. Just go ahead and let *EVERYONE* know you've had your funny bone removed. :) :) :)

(Me, I would have kept that factoid to myself.)

IOW, I was whining about the unfairness of life, not demanding a precise, thorough and detailed explanation of the (relative and existing solely in the minds of the developers) 'merits' of the various scripting languages and development frameworks.

There, I've stated my motivations clearly so you won't feel the need to expound further. :D

Derek

tompe
12-25-2007, 09:37 PM
I did a version 0.0.16 with a mobi2html program which explodes a DRM-free MobiPocket file into a subdirectory and the resulting HTML file has working links and working images.

For example:
mobi2html "Bleak House.prc" unpack

heyjohn
12-26-2007, 06:00 PM
What's the reason for wanting to change the dbname, as a matter of interest?

I got a Cybook for my mom for Christmas, and pre-loaded a bunch of Baen Free Library and Baen purchases on it. So I wanted to create a document with blurbs about the various books, and direct links to start reading any given book, using the Mobipocket oeb:redirect and/or oeb:library tags that use the database name of a book.

Unfortunately, I didn't have any luck. Either the Gen3's software doesn't support that feature, or I didn't use it correctly in the original HTML that I wrote that had the blurbs about the books. On the bright side, my mom (who's not a tech enthusiast) seems to really like the Gen3, and is getting some use out of the blurbs even without the direct links.

I'd love to be able to fix the direct links if there's any way to do it -- really nifty feature, and (if there are any Bookeen folks reading) one that can drive gift sales :-)

tompe
12-28-2007, 10:23 PM
I packed Windows binaries for version 0.0.17 which can be found here:

http://www.ida.liu.se/~tompe/mobiperl/mobiperl-0.0.17-win.rar

I have restructured the code a bit so there are now more binaries:


html2mobi - Convert HTML file to a MobiPocket file.
opf2mobi - Convert an opf file structure to a MobiPocket file.
lit2mobi - Convert a lit file to a MobiPocket file
mobi2html - Explode a DRM-free MobiPocket file.
mobi2mobi - Manipulate metadata for a MobiPocket file.


The documentation is here:


http://www.ida.liu.se/~tompe/mobiperl/html2mobi.html
http://www.ida.liu.se/~tompe/mobiperl/opf2mobi.html
http://www.ida.liu.se/~tompe/mobiperl/lit2mobi.html
http://www.ida.liu.se/~tompe/mobiperl/mobi2html.html
http://www.ida.liu.se/~tompe/mobiperl/mobi2mobi.html

wallcraft
12-29-2007, 01:14 PM
I used Twain, Mark: Joan of Arc. v1 25 Dec 2007 (http://www.mobileread.com/forums/showthread.php?t=18028) as a test case. Like all of the PRC files created with BookDesigner, it is a TEXtREAd document.

Under windows using 0.0.17, I was able to add a title and author using mobi2mobi.exe to create a MOBI version. I was also able to replace the small initial BMP image with a larger JPG version using --coverimage. However, the image was not registered as the cover image (e.g. Windows MobiPocket Reader does not have a cover page entry under the contents icon and does not show a thumbnail of the cover image in its eBooks library view).

On the other hand, mobi2html.exe created corrupted images, and running html2mobi.exe on the result of mobi2html crashed in "getBounds" (probably on one of the corrupted images). Also, mobi2html did not seem to support all the options in its documentation. For example, --coverimage with no argument did not work and neither did --mobifile MOBIFILE and --htmlfile HTMLFILE.

For Windows at least, it would be useful if mobi2html could add a filename extension to the images (e.g. record-7307489.bmp instead of just record-7307489). This requires detecting the image type, but even the wrong image type filename extension would probably be better than no extension.

For the particular case of BookDesigner .prc files, it would be a nice addition to mobi2mobi if the Table of Contents near the start could be used as the basis for a formal TOC. Then, mobi2mobi would become the easiest way to enhance all the existing .prc files in the Mobi/PRC Books Forum to fully functioning version 4 MOBI files.

I attach the corrupted image from mobi2html and a 3x magnified JPEG version of the correct BMP file (to use as a test cover). In many cases, the creators of the .prc version will have better large images to use as the cover than this.

Stephanos
12-29-2007, 02:43 PM
I have the same problem using mobi2html. The attached image is from a converted .prc (made using "proper" Mobi tools) uploaded by Harry here (http://www.mobileread.com/forums/showpost.php?p=86798&postcount=1). I added the .jpg extension.

tompe
12-29-2007, 03:18 PM
Did you check to see if the table of contents worked for the exploded Joan of Arc? It did not work in Linux. It seems that the coded data contains Ctrl-M and these are not counted in the file position. I can easily solve it by removing them, but I wondered whether a line-ending convention is defined for a MobiPocket file?

I think I have fixed the image bug and I have tried to fix the cover image issues. I will just try to get the TOC to work properly and will do a new version since the corrupted image bug was so serious (forgot to do binmode on a filehandle...).
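
For readers wondering why a missing binmode corrupts images only on Windows: a text-mode filehandle there rewrites every LF byte as CR LF, which is harmless for text but changes binary data. The sketch below simulates that translation in Python on a made-up byte string and then shows the binary-mode write that avoids it. It is an illustration of the general problem, not the actual Mobiperl code.

```python
import os
import tempfile


def text_mode_write_on_windows(data: bytes) -> bytes:
    """Simulate a Windows text-mode write: each LF byte becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


# Made-up bytes standing in for image data; the value 0x0A occurs twice.
image = bytes([0xFF, 0xD8, 0x0A, 0x10, 0x0A, 0xFF, 0xD9])
damaged = text_mode_write_on_windows(image)
print(len(image), len(damaged))  # 7 9

# Binary mode ("wb" in Python, binmode in Perl) writes the bytes unchanged
# on every platform.
path = os.path.join(tempfile.mkdtemp(), "record.jpg")
with open(path, "wb") as fh:
    fh.write(image)
with open(path, "rb") as fh:
    round_trip = fh.read()
print(round_trip == image)
```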

tompe
12-29-2007, 03:47 PM
However, the image was not registered as the cover image (e.g. Windows MobiPocket Reader does not have a cover page entry under the contents icon and does not show a thumbnail of the cover image in its eBooks library view).


I have tried to fix this and other issues in 0.0.18 (which is available as a tar file). I will later today test it under Windows and build new Windows binaries.


On the other hand, mobi2html.exe created corrupted images, and running html2mobi.exe on the result of mobi2html crashed in "getBounds" (probably on one of the corrupted images).


I think I have fixed this and will test it soon.


Also, mobi2html did not seem to support all the options in its documentation. For example, --coverimage with no argument did not work and neither did --mobifile MOBIFILE and --htmlfile HTMLFILE.


Did you really mean mobi2html? I checked html2mobi and these options seem to work.


For Windows at least, it would be useful if mobi2html could add a filename extension to the images (e.g. record-7307489.bmp instead of just record-7307489). This requires detecting the image type, but even the wrong image type filename extension would probably be better than no extension.


Fixed.
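
One common way to pick a filename extension for an extracted image record is to look at the first few bytes of the data. This is a minimal sketch of that idea; the function name is made up and nothing here is claimed to match how mobi2html does it.

```python
def guess_image_extension(data: bytes) -> str:
    """Guess a filename extension from well-known image file signatures."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return ".gif"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith(b"BM"):
        return ".bmp"
    return ""  # unknown type: leave the name without an extension


print("record-7307489" + guess_image_extension(b"BM" + b"\x00" * 20))
```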


For the particular case of BookDesigner .prc files, it would be a nice addition to mobi2mobi if the Table of Contents near the start could be used as the basis for a formal TOC. Then, mobi2mobi would become the easiest way to enhance all the existing .prc files in the Mobi/PRC Books Forum to fully functioning version 4 MOBI files.


The idea was that mobi2mobi should only touch header data. I will think about this.

tompe
12-29-2007, 03:49 PM
Did you check to see if the table of contents worked for the exploded Joan of Arc? It did not work in Linux. It seems that the coded data contains Ctrl-M and these are not counted in the file position. I can easily solve it by removing them, but I wondered whether a line-ending convention is defined for a MobiPocket file?

I was confused here. The real problem was that some files write 'filepos=121313' and some write 'filepos="33123321"'. And some files use "<a" and some files use "<A".
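
The two variations described above (quoted or unquoted filepos value, upper- or lower-case tag name) can both be handled by one tolerant pattern. A rough sketch, assuming a regular expression is good enough for this markup; the pattern and function name are illustrative only.

```python
import re

# Optional double quotes around the digits; IGNORECASE covers "<a" and "<A".
FILEPOS_RE = re.compile(r'<a\b[^>]*?\bfilepos\s*=\s*"?(\d+)"?', re.IGNORECASE)


def find_filepos(html: str) -> list:
    """Return every filepos value found in anchor tags, as integers."""
    return [int(value) for value in FILEPOS_RE.findall(html)]


sample = '<a filepos=121313>One</a> <A filepos="33123321">Two</A>'
print(find_filepos(sample))  # [121313, 33123321]
```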

tompe
12-29-2007, 04:50 PM
New windows binaries:

http://www.ida.liu.se/~tompe/mobiperl/mobiperl-0.0.18-win.rar


mobi2mobi Twain_Joan\ of\ Arc.prc --outfile new.mobi --author "Mark Twain" --title "Joan of Arc" --coverimage cover.jpg --addthumbnail cover.jpg


works. The thumbnail will be seen in the MobiPocket reader.

wallcraft
12-30-2007, 11:21 AM
Thanks - version 0.0.18 works well.

I have started another thread on PRC to MOBI (http://www.mobileread.com/forums/showthread.php?t=18224), which uses mobi2mobi and includes some Windows command line advice.

wallcraft
12-30-2007, 11:49 AM
Did you really mean mobi2html? I checked html2mobi and these options seem to work.

I did mean html2mobi. I think my problem was that I put file.html after the command arguments.

The idea was that mobi2mobi should only touch header data.

That makes sense. How about in html2mobi instead?

For the particular case of BookDesigner .html files, it would be a nice addition to html2mobi if the Table of Contents near the start could be used as the basis for a formal TOC.

The HTML file could come directly from BD, or from mobi2html on a BD .prc file.

wallcraft
12-30-2007, 11:58 AM
On the other hand, mobi2html.exe created corrupted images, and running html2mobi.exe on the result of mobi2html crashed in "getBounds" (probably on one of the corrupted images).

I think I have fixed this and will test it soon.

The corrupt image problem has been fixed, but the html2mobi problem is still there. I attach a screenshot of the command window with the error message.

tompe
12-30-2007, 01:19 PM
The corrupt image problem has been fixed, but the html2mobi problem is still there. I attach a screenshot of the command window with the error message.

Aha, I got that error in Linux also. I am looking at this now.

Making a real TOC and put it in the guide automatically from an html file using html2mobi should be relatively easy. I will put it on the list of things to fix.

tompe
12-30-2007, 01:43 PM
The problem is that the GD library seems not to be able to read BMP files. How common are BMP files in books? If you convert the files to jpg or gif or any other format that GD supports, html2mobi will work. But I saw that the links were not working properly. I will look at that also.

Thanks for testing and reporting the problems.

Jiiri
12-30-2007, 03:10 PM
I just wanted to say thanks to tompe for spending your time and talents on this program. I think there is a real need for this program - I am new to ebooks (Kindle owner) and immediately went to PG to get some books. Also, I found this site and have been enjoying the books that all the great uploaders put here.

The problem became obvious immediately: The metadata wasn't showing up for most .prc's that are freely available from public domain sources. On my Kindle's browser, books would show up as MILL_UTILITARIANISM, or just BLEAK HOUSE, etc. Nothing was showing up for most books under the author line.

My plan has been to buy a 4GB memory card and every book that I buy/download to just leave it on my Kindle, so I don't have to worry about moving books on and off the reader. When the metadata is inconsistent or just missing, it's going to make it extremely hard to find a book I need once a few hundred books get on there.

I haven't yet figured out how to use Mobiperl, but I'm going to work on it in the next few days. Thanks for the program - I just wanted to let you know that your work is appreciated, this tool is much needed and will help people make their books look correct on our readers.

Jiiri

Jiiri
12-30-2007, 03:28 PM
EDIT: Originally this thread was me whining about not being able to figure out how to explode a prc file. I got it now - I was making it WAY too hard. Thanks for all the tips in this thread; if I can figure this out, perhaps we could train a monkey or two to do our books for us.

Jiiri

ppxnouse
12-30-2007, 07:33 PM
Hi tombe,

I have a mobiperl request.
I posted it accidentally into the older thread:

http://www.mobileread.com/forums/showthread.php?t=16551&highlight=html2mobi&page=5

tompe
12-30-2007, 08:17 PM
Version 0.0.19 is available:

http://www.ida.liu.se/~tompe/mobiperl/mobiperl-0.0.19.tar

http://www.ida.liu.se/~tompe/mobiperl/mobiperl-0.0.19-win.rar


Changes in 0.0.19:


Fixed resizing of image data and rescaling of image for mobi2mobi. The rescaling is necessary because the Gen3 does not work on files with images wider than 480 or something around that size. This seems to be a bug in the Gen3.

Added Image::BMP to Util.pm. Used to convert a file from BMP to GD if needed.

Fixed TOC bug caused by filepos attribute not removed in mobi2html.

Changed max file size for images to 60000. From MobiPocket forum: "- the size of each JPEG image is limited to about 63K for a number of technical reasons "

Fixed filepos bug in html2mobi so now TOC in the beginning works for Twain example.

Fixed mobi2html so it works for the Alice example where the filepos points to "<h" instead of pointing to a "<a".


It now seems to work to explode the Mark Twain file and then use html2mobi on the exploded html file.

Jiiri
12-30-2007, 08:25 PM
Just thought I'd pass this along for whatever it's worth - I've been playing with your (wonderful) mobiperl tools all day, getting my books to look right in the Kindle Home page. I had bought Bertrand Russell's "History of Western Philosophy" in Amazon's proprietary .azw format, and I guess whenever the publishers did the conversion, they typed in 'Bertrand Russell' under author instead of 'Russell, Bertrand'.

The result of this was when I sorted by author, every author showed up correctly by last name except Russell, who showed up under 'B' for Bertrand instead of under 'R' for Russell. I'm anal, and it annoyed me greatly. At any rate, I took the .azw file off of my Kindle, changed the extension to .prc, and ran mobi2mobi to change the metadata. After the file spit out, I changed it back to .azw, and sure enough, it worked like a charm.

So, if anyone else has metadata they don't like in a protected .azw file, it works.

Enjoy, and thanks tompe!

Jiiri

wallcraft
12-30-2007, 08:45 PM
It now seems to work to explode the Mark Twain file and then use html2mobi on the exploded html file.

This did not work with the Windows .exe versions (I did not try the Perl scripts). The html2mobi stage seems to work (e.g. the BMP images look ok), but the images in the final MOBI file are corrupted. They are mostly white. When I re-extract from the MOBI using mobi2html the JPG images are ~4KB instead of ~55KB for the original BMP files.

tompe
12-30-2007, 10:02 PM
This did not work with the Windows .exe versions (I did not try the Perl scripts). The html2mobi stage seems to work (e.g. the BMP images look ok), but the images in the final MOBI file are corrupted. They are mostly white. When I re-extract from the MOBI using mobi2html the JPG images are ~4KB instead of ~55KB for the original BMP files.

Actually, I tested it in Windows and it worked for me I think. Or maybe I forgot to check the BMP images... Does it work with a file without BMP images?

I suspect there is a binmode issue again. This is such an irritating difference between Unix and Windows.

I assume I have to boot to Windows again...

popophobia
12-30-2007, 10:38 PM
Hmm, I tried the Windows program to convert a prc to mobi to change the title, but the result does not seem to work with Mobipocket Reader from mobipocket.com. It claims the file is corrupted.

When I try to open it with FBReader however, it seems ok but no "end of line" was carried over.

Also, I'm trying to run mobiperl in Linux, what package do I need and how do I get them? Thanks.

tompe
12-30-2007, 10:48 PM
Hmm, I tried the Windows program to convert a prc to mobi to change the title, but the result does not seem to work with Mobipocket Reader from mobipocket.com. It claims the file is corrupted.


Could you tell me exactly what command you used and on what file? If you can give me the file even better.


When I try to open it with FBReader however, it seems ok but no "end of line" was carried over.


What do you mean? mobi2mobi does not touch the html-code so FBReader ought to show the original prc file in the same way.


Also, I'm trying to run mobiperl in Linux, what package do I need and how do I get them? Thanks.

Well, there probably is some method in your distribution for installing packages. In Debian I usually search for missing files with "apt-file search filename" and then install the indicated package. Perl module files end with .pm.

If you do not find out which packages to install you can always do "perl -MCPAN -e shell" and then try to search for missing packages and install them. In principle if a file says "use Palm::Doc" you probably write "install Palm::Doc" in CPAN mode.

wallcraft
12-30-2007, 11:50 PM
Actually, I tested it in Windows and it worked for me I think. Or maybe I forgot to check the BMP images.

Since the original BMP images were all <60KB, there was no need to convert them to JPG at all. In fact, if BMP or GIF images are small enough then leaving them "as is" is the best approach because a JPG always loses information. This is particularly true of line graphics. If an image is too big, then converting to JPG is probably the best approach in general.

tompe
12-31-2007, 06:47 AM
Since the original BMP images were all <60KB, there was no need to convert them to JPG at all. In fact, if BMP or GIF images are small enough then leaving them "as is" is the best approach because a JPG always loses information. This is particularly true of line graphics. If an image is too big, then converting to JPG is probably the best approach in general.

Yes, you are right. I did the conversion because the code became simpler. I will test this change and do a new release if it works.

HarryT
12-31-2007, 06:54 AM
Hi Tommy,

Might be worthwhile to see how the latest book I've uploaded, "Hard Times", works with your tools. This one gave 5 warnings about records >64k from Mobigen, but seems to work fine on the Gen3.

tompe
12-31-2007, 09:04 AM
Version 0.0.20 is available which will be the last version this year :)

http://www.ida.liu.se/~tompe/mobiperl/mobiperl-0.0.20-win.rar

http://www.ida.liu.se/~tompe/mobiperl/mobiperl-0.0.20.tar

Changes in 0.0.20:


<pre> tags are now replaced with something that displays better in a MobiPocket reader.

Changed max file size for images to 61000 to avoid conversion of a BMP image.

Images are now not converted to jpg if they fit the file size and image size requirements.


For some reason I do not understand, the BMP reading does not work in Windows, so if you have a BMP file greater than 61000 bytes or with a width greater than 480 then it will not work under Windows. It will work in Linux.
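
The rule described for 0.0.20 (leave an image alone unless it breaks the file size or width limit) boils down to a small check. The limits 61000 and 480 are the ones quoted in this thread; the function and constant names are invented for this sketch.

```python
MAX_IMAGE_BYTES = 61000  # file size limit quoted for 0.0.20
MAX_IMAGE_WIDTH = 480    # width limit quoted for the Gen3


def needs_conversion(size_bytes: int, width_px: int) -> bool:
    """True if an image exceeds either limit and so cannot be kept as is."""
    return size_bytes > MAX_IMAGE_BYTES or width_px > MAX_IMAGE_WIDTH


# A 55 KB BMP that is 400 pixels wide can stay untouched.
print(needs_conversion(55 * 1024, 400))
# A small file that is too wide still has to be rescaled.
print(needs_conversion(20000, 600))
```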

tompe
12-31-2007, 09:35 AM
Might be worthwhile to see how the latest book I've uploaded, "Hard Times", works with your tools. This one gave 5 warnings about records >64k from Mobigen, but seems to work fine on the Gen3.

I tried to download the book to my Palm T5 but it failed. If I regenerated the book using mobi2html and html2mobi it worked because I resize the image data. Strange that mobigen does not do that properly.

I will add resizing image functionality to mobi2mobi (a flag --fiximagesizes).

HarryT
12-31-2007, 12:19 PM
I tried to download the book to my Palm T5 but it failed. If I regenerated the book using mobi2html and html2mobi it worked because I resize the image data. Strange that mobigen does not do that properly.


Perhaps Mobi feel that these days the number of people running on Palm architecture is rather small, and that it's OK to allow better quality pictures on devices which do support it. It is, I suspect, the 64k segmented memory architecture of the Palm which is the issue. That restriction doesn't apply to any other platform - Gen3, Windows, Pocket PC, etc; all those devices have "flat" memory.

tompe
12-31-2007, 06:29 PM
Perhaps Mobi feel that these days the number of people running on Palm architecture is rather small, and that it's OK to allow better quality pictures on devices which do support it. It is, I suspect, the 64k segmented memory architecture of the Palm which is the issue. That restriction doesn't apply to any other platform - Gen3, Windows, Pocket PC, etc; all those devices have "flat" memory.

Maybe, but then I think they should have defined a new format. And their web page says that images should have a certain size. I think it is just a restriction in mobigen and if you want to have files that are real MobiPocket files you should fix the problem.

HarryT
01-01-2008, 07:25 AM
Trouble is, there's no easy way to find out which specific pictures are the problem. When you have 40+ pictures in a book it would be a nightmare to reduce the size of each one in turn and see if the warning goes away.

It's NOT the size of the original source image that's the problem - in all my Dickens books which have pictures, the "original" of the picture JPEG is typically 100-130kb in size. In the vast majority of cases MobiGen obviously does something to the pictures to make them work OK, but in a small minority of cases it gives you a "record >64k" warning, and tells you that it won't work on a Palm device as a consequence.

tompe
01-01-2008, 09:17 AM
In the next version of mobi2mobi it will print out warnings for the images. For your Dickens book it looks like:


FIRST IMG Record 168
WARNING: Record 174 - Image data size might be to large: 63708
ERROR: Record 176 - Image data size definitely to large: 71488
WARNING: Record 177 - Image data size might be to large: 65324
ERROR: Record 178 - Image data size definitely to large: 72376
WARNING: Record 179 - Image data size might be to large: 64440
ERROR: Record 183 - Image data size definitely to large: 72632
ERROR: Record 184 - Image data size definitely to large: 68676
WARNING: Record 188 - Image data size might be to large: 63136


And you can now use mobi2html to unpack your file. You will get a directory with images and you can just inspect the file size.
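
The two-level report above can be pictured as a pair of thresholds. In the sketch below the cut-off values 61000 and 65535 are guesses that happen to agree with the eight records listed; the values mobi2mobi actually uses are not stated in this thread, and the function name is made up.

```python
WARN_ABOVE = 61000   # assumed: the image size limit mentioned for 0.0.20
ERROR_ABOVE = 65535  # assumed: largest size that fits a 64k record


def classify_image_record(size_bytes: int) -> str:
    """Return "OK", "WARNING" or "ERROR" for an image record of this size."""
    if size_bytes > ERROR_ABOVE:
        return "ERROR"
    if size_bytes > WARN_ABOVE:
        return "WARNING"
    return "OK"


# Record numbers and sizes taken from the listing above.
records = {174: 63708, 176: 71488, 177: 65324, 178: 72376,
           179: 64440, 183: 72632, 184: 68676, 188: 63136}
for number, size in records.items():
    print(number, classify_image_record(size), size)
```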

HarryT
01-01-2008, 09:46 AM
That's very useful, thanks. Would it be possible to add an "autofix" option to your "mobi2mobi" tool to fix the problem automatically? That would be an easy solution for people who do want to read on Palm devices. Other Mobi devices don't seem to have the 64k restriction.

Looks like a bug in MobiGen - it's obviously resizing the images but not getting the resize quite right in these particular cases.

tompe
01-01-2008, 10:00 AM
That's very useful, thanks. Would it be possible to add an "autofix" option to your "mobi2mobi" tool to fix the problem automatically? That would be an easy solution for people who do want to read on Palm devices. Other Mobi devices don't seem to have the 64k restriction.

Looks like a bug in MobiGen - it's obviously resizing the images but not getting the resize quite right in these particular cases.

Yes, I have already done it and it will be in the next version (a flag --fiximagesizes). It will just reduce the quality until it succeeds or run out of lower qualities to test. I do not resize the image.
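
The approach described here (lower the quality step by step until the data fits, give up when no lower quality is left) can be sketched without any image library by passing the encoder in as a function. Everything named below is hypothetical: `encode` stands for whatever re-encodes the image at a given quality, and the quality steps and the 61000 byte limit are example values.

```python
from typing import Callable, Iterable, Optional


def shrink_to_limit(encode: Callable[[int], bytes],
                    limit: int = 61000,
                    qualities: Iterable[int] = range(95, 0, -5)) -> Optional[bytes]:
    """Try decreasing quality settings until the encoded data fits the limit.

    Returns the first result that fits, or None if every quality is too large.
    The image dimensions are never changed, only the quality setting.
    """
    for quality in qualities:
        data = encode(quality)
        if len(data) <= limit:
            return data
    return None


def fake_encode(quality: int) -> bytes:
    """Stand-in encoder for the demo: output size grows with quality."""
    return b"x" * (quality * 800)


result = shrink_to_limit(fake_encode)
print(len(result))  # 60000
```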

HarryT
01-01-2008, 10:04 AM
That sounds perfect!

TallMomof2
01-01-2008, 10:39 AM
Just thought I'd pass this along for whatever it's worth - I've been playing with your (wonderful) mobiperl tools all day, getting my books to look right in the Kindle Home page. I had bought Bertrand Russell's "History of Western Philosophy" in Amazon's proprietary .azw format, and I guess whenever the publishers did the conversion, they typed in 'Bertrand Russell' under author instead of 'Russell, Bertrand'.

The result of this was when I sorted by author, every author showed up correctly by last name except Russell, who showed up under 'B' for Bertrand instead of under 'R' for Russell. I'm anal, and it annoyed me greatly. At any rate, I took the .azw file off of my Kindle, changed the extension to .prc, and ran mobi2mobi to change the metadata. After the file spit out, I changed it back to .azw, and sure enough, it worked like a charm.

So, if anyone else has metadata they don't like in a protected .azw file, it works.

Enjoy, and thanks tompe!

Jiiri

Thanks, Jiiri!

It worked great for me too on a file I'd used Igorsk's Kindle hack to add the Kindle PID. The only problem was that for 95% of the files I'd converted the title info was completely lost not to mention the author. Too bad I have a gazillion files to run through Mobi2Mobi.

tompe
01-01-2008, 11:04 AM
When I do something that I might have to repeat because there is a better version of a program available (like some bug fixed in mobi2mobi :) ) I usually put the commands in a make file so I can repeat them.

In Windows that seems to mean that you install nmake and then you create a file Makefile that can contain just:


all:
<TAB> mobi2mobi ...
<TAB> mobi2mobi ....


and then just run nmake in the directory where you put the Makefile.

Just wanted to mention this since I worry a bit that my programs contain some bug so people have to repeat the application of them. Also save the original file!

HarryT
01-01-2008, 11:05 AM
What's the benefit of this over simply using a batch file?

tompe
01-01-2008, 11:14 AM
What's the benefit of this over simply using a batch file?

Ah, maybe none :) I use make files since I know the syntax for them. Also it is a standardized way to do things so the file will work on different platforms and the file always has the same name. But a batch file will work as well on Windows I suppose.
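
A third option besides a make file or a batch file is a small script that keeps the book list as data and rebuilds the commands from it. This is only a sketch: the file names are invented, and the only mobi2mobi flags used are --outfile, --author and --title as shown earlier in the thread. The script prints the commands; actually running them is left commented out.

```python
import shlex

# Invented example entries; replace with your own files and metadata.
BOOKS = [
    {"infile": "Twain_Joan of Arc.prc", "outfile": "joan.mobi",
     "author": "Mark Twain", "title": "Joan of Arc"},
    {"infile": "Bleak House.prc", "outfile": "bleak.mobi",
     "author": "Charles Dickens", "title": "Bleak House"},
]


def mobi2mobi_command(book: dict) -> list:
    """Build the argument list for one mobi2mobi run (output to a new file)."""
    return ["mobi2mobi", book["infile"],
            "--outfile", book["outfile"],
            "--author", book["author"],
            "--title", book["title"]]


for book in BOOKS:
    command = mobi2mobi_command(book)
    print(shlex.join(command))
    # To run for real: import subprocess; subprocess.run(command, check=True)
```

Writing to a separate output file keeps the original untouched, so the whole list can be re-run when a fixed version of the tool comes out.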

DaleDe
01-01-2008, 02:08 PM
In the next version of mobi2mobi it will print out warnings for the images. For your Dickens book it looks like:


FIRST IMG Record 168
WARNING: Record 174 - Image data size might be to large: 63708
ERROR: Record 176 - Image data size definitely to large: 71488
WARNING: Record 177 - Image data size might be to large: 65324
ERROR: Record 178 - Image data size definitely to large: 72376
WARNING: Record 179 - Image data size might be to large: 64440
ERROR: Record 183 - Image data size definitely to large: 72632
ERROR: Record 184 - Image data size definitely to large: 68676
WARNING: Record 188 - Image data size might be to large: 63136


And you can now use mobi2html to unpack your file. You will get a directory with images and you can just inspect the file size.

It would be better if you said: Image data size might be too large

adelheid
01-04-2008, 04:43 AM
Could anyone point me to an instruction to get this to work on a Mac? I have downloaded the tar file, the necessary perl modules for mobi2html (which is the program I would like to use), but on the cpan site I am told that I need to make sure that the line endings are suited for the Mac (in all files, which seems a lot of work), that I need to compile some modules, but some perhaps not.

I am a bit confused about what I have to do exactly. Any help would be much appreciated.

tompe
01-04-2008, 07:36 AM
Could anyone point me to an instruction to get this to work on a Mac? I have downloaded the tar file, the necessary perl modules for mobi2html (which is the program I would like to use), but on the cpan site I am told that I need to make sure that the line endings are suited for the Mac (in all files, which seems a lot of work), that I need to compile some modules, but some perhaps not.

I am a bit confused about what I have to do exactly. Any help would be much appreciated.

Which modules did you download? Which Perl are you using?

On a Mac I would try to install modules with "perl -MCPAN -e shell" which will give you a shell and "help" or "?" works there.

If you google on: installing perl modules using CPAN tutorial

you will get some tutorials and explanations.

Later today when I have some time I will try to list all "install" commands that should be enough on a Unix system running an ordinary Perl.

adelheid
01-04-2008, 08:32 AM
The point is, that I have to go through all the files to change the line endings and cpan is not clear about how I can determine if a module is compiled or not.

tompe
01-04-2008, 09:48 AM
The point is, that I have to go through all the files to change the line endings and cpan is not clear about how I can determine if a module is compiled or not.

Could you point me to where you read about the line endings?

I would test to just write "perl -MCPAN -e shell" and then do:

install Palm::PDB
install Palm::Doc
install XML::Parser::Lite::Tree
install GD
install Image::BMP
install Image::Size
install HTML::TreeBuilder
install Getopt::Mixed
install Date::Parse
install Date::Format

and see if it works. I think this is the complete list of modules that you need to install.
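
Before or after running the install commands it can help to see which of the listed modules Perl can already load. A common check is `perl -MModule::Name -e 1`, which exits with an error if the module is missing. The sketch below wraps that check; it assumes a `perl` executable on the PATH and simply reports when there is none. The helper names are made up.

```python
import shutil
import subprocess

# Module list as given above.
MODULES = ["Palm::PDB", "Palm::Doc", "XML::Parser::Lite::Tree", "GD",
           "Image::BMP", "Image::Size", "HTML::TreeBuilder",
           "Getopt::Mixed", "Date::Parse", "Date::Format"]


def check_command(module: str) -> list:
    """Command that succeeds only if Perl can load the given module."""
    return ["perl", "-M" + module, "-e", "1"]


def missing_modules(modules: list) -> list:
    """Return the modules that Perl fails to load."""
    return [m for m in modules
            if subprocess.run(check_command(m), capture_output=True).returncode != 0]


if shutil.which("perl"):
    print("Missing:", missing_modules(MODULES))
else:
    print("perl not found on PATH")
```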

I am writing a web page to document how to install on different platforms but I do not have a Mac available so I cannot test on that platform.

adelheid
01-04-2008, 10:40 AM
OK, I will try it on the Mac and let you know how it works out.

adelheid
01-04-2008, 12:01 PM
Installing Palm::PDB following your instructions gives:

"make had returned bad status, install seems impossible"

Info about install: http://www.cpan.org/modules/INSTALL.html

"C. BUILD
Does the module require compilation?

1. If it does,"

No info on how to determine if it does :-(

"D. INSTALL
Make sure the newlines for the modules are in Mac format, not Unix format. Move the files manually into the correct folders."

I don't know what "the correct folders" are :-(

So, a lot of questions on my part.

edit: I don't seem to have the make program on my machine. So I probably need to install that. Do people on Windows also have to go through installing dependencies like this? Otherwise I had perhaps better fire up my Windows virtual machine and try there.

tompe
01-04-2008, 12:24 PM
What version of the operating system are you using and what version of Perl are you using?

You need to install make and have a compiler. I am surprised you did not have them but I do not know anything about Macs so maybe you have to install some developers package?

There are Windows binaries but it is better to use the non-binary version since then you can get updates faster. The release of the binary version depends on my willingness to boot my laptop to Windows...

tompe
01-04-2008, 12:30 PM
"D. INSTALL
Make sure the newlines for the modules are in Mac format, not Unix format. Move the files manually into the correct folders."


I read this now and am a bit surprised that this is the status of Perl on a Mac. I will check around a bit. Maybe ActiveState's Perl version is a better alternative for Mac if they have pre-built binaries for modules as the Windows version had.

kovidgoyal
01-04-2008, 01:52 PM
@tompe: From my experience with libprs500, OSX is a royal pain in the ass. It's far and away the worst of the three to support. By default OS X has no compiler installed, your users would have to download Xcode. If you're serious about supporting OSX, I would recommend looking at some solution that allows you to distribute an embedded Perl interpreter.

tompe
01-04-2008, 03:00 PM
@tompe: From my experience with libprs500, OSX is a royal pain in the ass. It's far and away the worst of the three to support. By default OS X has no compiler installed, your users would have to download Xcode. If you're serious about supporting OSX, I would recommend looking at some solution that allows you to distribute an embedded Perl interpreter.

Aha. The embedded Perl interpreter approach is the one used for the Windows binaries. I will not do OSX binaries since I do not have an OSX machine easily available but if any OSX user would like to contribute binaries they are welcome to do that.

tompe
01-04-2008, 09:53 PM
I put a web page at http://www.ida.liu.se/~tompe/mobiperl/ describing MobiPerl and I tried to document how to install Perl and required modules.

There is also a version 0.0.21 available (no Windows binaries for this version).

Changes in 0.0.21:


Added detection of images that are too large in mobi2mobi

Added flag --fiximagesizes to mobi2mobi so it will fix the incorrect image sizes.

--htmlfile flag for lit2mobi placed file in wrong dir, fixed now.

Added flag --coveroffset to mobi2mobi and fixed a bug relating to this. Now it is possible to change which image is the cover image by just specifying another offset.

The distribution is now a packed directory.

Added flags --exthtype and --exthdata to mobi2mobi to be able to set any type. Values can be set for unknown types 204, 205, 206, 207, 401, 403. This can help to figure out what these items mean.

JeffElkins
01-06-2008, 03:46 PM
How exactly does this work?

From the console:

lit2mobi --addcoverlink myfile.lit

I get no cover image in the resulting .mobi file. Did I miss something?

tompe
01-06-2008, 06:45 PM
How exactly does this work?

From the console:

lit2mobi --addcoverlink myfile.lit

I get no cover image in the resulting .mobi file. Did I miss something?

--addcoverlink will add a cover image in the html file, but there has to be a cover image in the lit file for it to find. If your lit file is missing a cover image, you can add one with:

lit2mobi --coverimage cover.jpg myfile.lit

If the thumbnail then does not work you can use the --addthumbnail flag to mobi2mobi. But for Gen3 it should be enough to add the cover image. I will test this functionality a bit more and fix any problems I find in the next release.

tompe
01-06-2008, 06:56 PM
Found a bug. --coverimage will not work for lit2mobi. But you can use mobi2mobi to add a cover image. Will fix this in the next release.

JeffElkins
01-06-2008, 07:10 PM
Found a bug. --coverimage will not work for lit2mobi. But you can use mobi2mobi to add a cover image. Will fix this in the next release.

Thanks! I'm new to the ebook world and so far your apps have been extremely helpful.

tompe
01-06-2008, 10:42 PM
Version 0.0.22 with Windows binaries is now available via:

http://www.ida.liu.se/~tompe/mobiperl/

Changes in 0.0.22:

Fixed so that the flags --coverimage, --addthumbnail and --addcoverlink work for html2mobi and lit2mobi.


lit2mobi will try to figure out if there is a cover image available. If none is found or if you want to override the choice use the --coverimage flag. If a cover image is found in some way it will be used as cover image and as thumb nail image (used in MobiPocket desktop version for example). --addcoverlink will add the coverimage first in the document so it is visible in for example FBReader. In Gen3 you will get double cover images with this flag.

Jiiri
01-07-2008, 08:00 AM
Tompe, quick question. I purchased a Harvard Classic ebook from Amazon (before I found them on this site, unfortunately), and the title is not what I'd like it to be. With other .azw files I've been able to change the extension to prc, run it through the mobi2mobi script, and then rename the .mobi file that results with .azw, and everything has been working.

The problem I'm having now is that I can't seem to change the title on this file, it still shows up in my Kindle wrong. In the Command Prompt it is showing the title as correct, and the author as correct. The name that I don't want but that is showing is listed in mobi2mobi under

EXTH item: 503-503-80- Harvard Classics, Vol. 23: Two years before the mast and twenty-four years after

and;

LONGTITLE: Harvard Classics, Vol. 23: Two years before the mast and twenty-four years after

Do you have any idea which of these strings is the one I need to change? Also, does mobi2mobi allow me to change it? Thanks!

Jiiri

tompe
01-07-2008, 10:10 AM
The problem I'm having now is that I can't seem to change the title on this file, it still shows up in my Kindle wrong. In the Command Prompt it is showing the title as correct, and the author as correct. The name that I don't want but that is showing is listed in mobi2mobi under

EXTH item: 503-503-80- Harvard Classics, Vol. 23: Two years before the mast and twenty-four years after

and;

LONGTITLE: Harvard Classics, Vol. 23: Two years before the mast and twenty-four years after


The LONGTITLE can be changed by mobi2mobi using --title. You can run mobi2mobi without any flags to check the result. If this does not help, it must be item 503 that is the problem. With the latest mobi2mobi you might be able to change it with:

mobi2mobi --exthtype 503 --exthdata "New title" --outfile new.azw in.azw

So I wonder what the intended meaning of item 503 is?

Jiiri
01-07-2008, 10:48 AM
I downloaded the latest version, and tried the string you suggested, with no luck. It's impossible for me to troubleshoot it since I have no idea what the strings even mean or do, not being a programmer. If the problem is the drm of the .azw file, I'd just as soon not spend a lot of time just to find out that I can't do it anyway. If you care at all, I can send you the azw file and let you look at it.

tompe
01-07-2008, 03:32 PM
I got a test file for this problem and I think I have fixed it. The problem occurred if item 401 was in the EXTH header and had length 1. I had mistakenly assumed it should have length 4. These things happen when the format is not documented.

It is probably good to make these assumptions. They will cause bugs, but I will get feedback and more data to use to figure out what is wrong. So whenever you see something like:


ERROR: generated EXTH does not match original
MISMATCH POS:878:0x9:0xc
MISMATCH POS:883:0x93:0x0
MISMATCH POS:885:0x0:0x1
MISMATCH POS:886:0x0:0x93
MISMATCH POS:887:0x9:0x0
MISMATCH POS:890:0x0:0x9


in the output from mobi2mobi please tell me and if possible send me the file that caused this.

There is a version 0.0.23 available as a tar file, but the only change is the one described here, so there is no need to update if you do not have this problem.
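
The item 401 fix comes down to trusting each EXTH record's own length field instead of a fixed size per type. A minimal Perl sketch of that idea follows. It assumes the commonly described EXTH layout, which is not spelled out in this thread: the 4-byte magic "EXTH", a 4-byte header length, a 4-byte record count, then records made of a 4-byte type, a 4-byte length that includes those 8 bytes, and the data, all big-endian. parse_exth is a hypothetical helper, not code from MobiPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: walk an EXTH block and return a list of
# [type, data] pairs. The layout is an assumption (see the text above).
sub parse_exth {
    my ($buf) = @_;
    die "not an EXTH block\n"
        unless length($buf) >= 12 && substr($buf, 0, 4) eq 'EXTH';
    my ($hdr_len, $count) = unpack('x4 N N', $buf);

    my @items;
    my $pos = 12;                       # first record follows the 12-byte header
    for (1 .. $count) {
        die "truncated EXTH\n" if $pos + 8 > length($buf);
        my ($type, $len) = unpack('N N', substr($buf, $pos, 8));
        die "bad EXTH record length\n"
            if $len < 8 || $pos + $len > length($buf);
        # The data size comes from the record itself, so a 1-byte item 401
        # and a 4-byte item 401 are both handled.
        push @items, [ $type, substr($buf, $pos + 8, $len - 8) ];
        $pos += $len;
    }
    return @items;
}

# Build a tiny EXTH block by hand: item 503 (a string) and a 1-byte item 401.
my $title = 'New title';
my $recs  = pack('N N a*', 503, 8 + length($title), $title)
          . pack('N N C', 401, 9, 1);
my $exth  = pack('a4 N N', 'EXTH', 12 + length($recs), 2) . $recs;

for my $item (parse_exth($exth)) {
    my ($type, $data) = @$item;
    printf "EXTH item %d, %d byte(s)\n", $type, length($data);
}
```

Reading the length per record also means unknown types such as 204-207 can be skipped over safely without knowing what they mean.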

JeffElkins
01-07-2008, 04:13 PM
Treasure_Box - Treasure Box.htm - text/html
RW_~Cover01 - ~Cover01.jpg - image/jpeg
Could not read image file: ~Cover01.jpg
Can't call method "getBounds" on an undefined value at /usr/local/bin/MobiPerl/Util.pm line 25, <OPF> chunk 1.

0.22 fixed my image problem, thanks! I do get the occasional failure as posted above. Not a showstopper for me at all, but thought I'd report it. The ~Cover01.jpg file seems to be fine.

tompe
01-07-2008, 04:32 PM
Treasure_Box - Treasure Box.htm - text/html
RW_~Cover01 - ~Cover01.jpg - image/jpeg
Could not read image file: ~Cover01.jpg
Can't call method "getBounds" on an undefined value at /usr/local/bin/MobiPerl/Util.pm line 25, <OPF> chunk 1.

0.22 fixed my image problem, thanks! I do get the occasional failure as posted above. Not a showstopper for me at all, but thought I'd report it. The ~Cover01.jpg file seems to be fine.

It is the GD module that fails to read the file. Strange, since it should read jpeg without problems. I eliminated the message about the undefined value. If I have the ~Cover01.jpg file I can try to see why it fails. But as you say, it is not a show stopper.

Jiiri
01-07-2008, 04:40 PM
Downloading the latest version is causing me some issues. The link at http://www.ida.liu.se/~tompe/mobiperl/ sends me to http://www.ida.liu.se/~tompe/mobiperl/downloads/, which is the same page. When I click on the link there, it takes me to
http://www.ida.liu.se/~tompe/mobiperl/downloads/downloads
Which is broken.

As soon as I can get v0.23, I'll test the new work you've done and see if it works on the Kindle. Thanks, Tompe.

tompe
01-07-2008, 05:10 PM
Downloading the latest version is causing me some issues. The link at http://www.ida.liu.se/~tompe/mobiperl/ sends me to http://www.ida.liu.se/~tompe/mobiperl/downloads/, which is the same page. When I click on the link there, it takes me to
http://www.ida.liu.se/~tompe/mobiperl/downloads/downloads
Which is broken.

As soon as I can get v0.23, I'll test the new work you've done and see if it works on the Kindle. Thanks, Tompe.

Fixed! I had copied the HTML files to the wrong directory.

JeffElkins
01-07-2008, 05:17 PM
It is the GD module that fails to read the file. Strange, since it should read jpeg without problems. I eliminated the message about the undefined value. If I have the ~Cover01.jpg file I can try to see why it fails. But as you say, it is not a show stopper.

Here's an example attached. It's strange, gimp opens the file w/o problems.

Edit: I had to rename the attachment from ~Cover.jpg to Cover.gif before it would attach...

Jiiri
01-07-2008, 05:19 PM
thanks, it's working - I have to wait for the windows version - not sure what to do with the .tar.

tompe
01-07-2008, 05:43 PM
Here's an example attached. It's strange, gimp opens the file w/o problems.

Edit: I had to rename the attachment from ~Cover.jpg to Cover.gif before it would attach...

I do not think the attachment worked. Which version of MobiPerl are you using? I seem to remember having problems with images looking like this, but recently they seem to have worked. I assume that you do not want that image as cover anyway, so it does not matter.

tompe
01-07-2008, 05:45 PM
thanks, it's working - I have to wait for the windows version - not sure what to do with the .tar.

You have to install ActivePerl according to the MobiPerl web page... I will probably do a new Windows version in a couple of days.

JeffElkins
01-08-2008, 07:25 AM
I do not think the attachment worked. Which version of MobiPerl are you using? I seem to remember having problems with images looking like this, but recently they seem to have worked. I assume that you do not want that image as cover anyway, so it does not matter.

I can access the attachment in the post above fine. It's just renamed Cover.gif from ~Cover.jpg. - The problem is that when Mobiperl hits one of these images it crashes and refuses to convert the book. I'm batch converting about 1000 .lit books and this problem hits just every so often. It does not occur when using convertlit to create an .oebzip file.

tompe
01-08-2008, 07:40 AM
I can access the attachment in the post above fine. It's just renamed Cover.gif from ~Cover.jpg. - The problem is that when Mobiperl hits one of these images it crashes and refuses to convert the book. I'm batch converting about 1000 .lit books and this problem hits just every so often. It does not occur when using convertlit to create an .oebzip file.

I think I have fixed it so it does not crash, and this fix is included in the latest tar version. It is not included in the latest Windows binary version.

When I clicked on the attachment nothing happened, and when I looked at the icon it was a gif file, so I assume that the icon is not the file. But I will try again...

Jiiri
01-08-2008, 07:44 AM
tompe, the site is doing the same thing I mentioned earlier - looping me into the downloads folder. Thanks!

Jiiri

tompe
01-08-2008, 07:51 AM
tompe, the site is doing the same thing I mentioned earlier - looping me into the downloads folder. Thanks!


Sorry, fixed now. Obviously I made a mistake when editing the Makefile to fix this. Next time I do a new version I will check that it works...

JeffElkins
01-08-2008, 09:16 AM
I think I have fixed it so it does not crash, and this fix is included in the latest tar version. It is not included in the latest Windows binary version.

When I clicked on the attachment nothing happened, and when I looked at the icon it was a gif file, so I assume that the icon is not the file. But I will try again...

I'll test the new version (.23?), and if you right click on the attachment you should be able to save it.

Thanks for your hard work!

tompe
01-08-2008, 09:22 AM
I'll test the new version (.23?), and if you right click on the attachment you should be able to save it.


There is also a 0.0.24 that will give some better information about files using mobi2mobi. I will do some more enhancements and probably do a 0.0.25 with a Windows binary, maybe tomorrow.

I right clicked on the attachment and did save image but that gave me a gif file.

JeffElkins
01-08-2008, 09:51 AM
There is also a 0.0.24 that will give some better information about files using mobi2mobi. I will do some more enhancements and probably do a 0.0.25 with a Windows binary, maybe tomorrow.

I right clicked on the attachment and did save image but that gave me a gif file.

Yes. As I said, I was forced to rename the file from ~Cover01.jpg to Cover01.gif in order to upload it. Rename the downloaded file to ~Cover01.jpg.

tompe
01-08-2008, 10:07 AM
Yes. As I said, I was forced to rename the file from ~Cover01.jpg to Cover01.gif in order to upload it. Rename the downloaded file to ~Cover01.jpg.

I meant that it is really a Gif file:

fossum:~> file Cover.jpg
Cover.jpg: GIF image data, version 89a, 99 x 132


So I think I am saving the "icon" and not the uploaded image.

JeffElkins
01-08-2008, 03:12 PM
I meant that it is really a Gif file:

fossum:~> file Cover.jpg
Cover.jpg: GIF image data, version 89a, 99 x 132


So I think I am saving the "icon" and not the uploaded image.

Nope! I just checked the original "jpg" file and it reports as a GIF. Could that be the problem? A routine looking for a jpg then crashing when it encounters a gif?

tompe
01-08-2008, 04:07 PM
Nope! I just checked the original "jpg" file and it reports as a GIF. Could that be the problem? A routine looking for a jpg then crashing when it encounters a gif?

The filename should not matter. I just tested on a Linux machine and it did not seem to matter. The GD library should read both GIF and JPEG. Maybe the problem is that the ~ gets stripped or something. On what platform did you have the problem?

And you did not run conversions in parallel?
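
Since the name said .jpg but the file command reported GIF, one way to avoid relying on the extension is to look at the first bytes of the data. The sketch below uses the well-known signatures for GIF, JPEG, PNG and BMP. sniff_image_type and sniff_image_file are hypothetical helpers; this is not how MobiPerl or GD decides what to do.

```perl
use strict;
use warnings;

# Hypothetical helper: guess an image format from its leading bytes,
# ignoring the file name. Returns 'gif', 'jpeg', 'png', 'bmp' or undef.
sub sniff_image_type {
    my ($data) = @_;
    return 'gif'  if $data =~ /\AGIF8[79]a/;
    return 'jpeg' if $data =~ /\A\xFF\xD8\xFF/;
    return 'png'  if $data =~ /\A\x89PNG\r\n\x1A\n/;
    return 'bmp'  if $data =~ /\ABM/;
    return;
}

# Read only the first bytes of a file and classify them.
sub sniff_image_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $head = '';
    read $fh, $head, 16;
    close $fh;
    return sniff_image_type($head);
}

# A GIF header like the one reported above (version 89a, 99 x 132).
my $gif_head = 'GIF89a' . pack('v v', 99, 132);
print "looks like: ", sniff_image_type($gif_head), "\n";
```

A check like this before handing the data to an image library would at least make the log say which format was actually found.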

tompe
01-09-2008, 07:56 PM
I did a version 0.0.25 with Windows binaries. I learned that zip files are easier to use than rar files, so the Windows binaries are now packed in a zip file.

http://www.ida.liu.se/~tompe/mobiperl/


Changes in 0.0.25:

Added all language codes and added info about language in mobi2mobi

Changes in 0.0.24:


Added flag --boktype to mobi2mobi to change the booktype value. 2 = BOOK (default), 257 = NEWS (Wall Street Journal has this value)

Added flag --delexthtype that can be used to remove a type, for example: mobi2mobi file.azw --outfile t.mobi --delexthtype cdetype

Fixed so that lit2mobi does not crash when failing to open an image file.

Changes in 0.0.23:

Item 401 in EXTH can be 1 or 4 bytes according to examples (azw files). Removed the assumption that it was 4 bytes to get mobi2mobi to work for an example file.

Jaapjan
01-10-2008, 07:23 AM
I am curious actually if you wrote the decompression / decryption for the records after the EXTH / PDB record 0 yourself or if you let Perl do that for you? Or did Mobipocket make something special out of it?

Maybe you'll indulge me and also tell me how you decide the actual number of PDB records that need to be decompressed for the content, since the last three(?) PDB records clearly aren't part of the content itself. They're way too small for that.

And.. why would they split the content in such small blocks of 2000 characters or less? Easier handling for small mobile devices?

tompe
01-10-2008, 12:11 PM
I am curious actually if you wrote the decompression / decryption for the records after the EXTH / PDB record 0 yourself or if you let Perl do that for you? Or did Mobipocket make something special out of it?


The Perl modules Palm::PDB and Palm::Doc take care of the compression and the decompression, but this will not work for the highest compression level because that compression is a secret MobiPocket scheme. I had to override some code in one of these modules for it to work on some DRM:ed files (also decompress if version was 5).


Maybe you'll indulge me and also tell me how you decide the actual number of PDB records that need to be decompressed for the content, since the last three(?) PDB records clearly aren't part of the content itself. They're way too small for that.


The Palm Doc header tells how many records there are in the document. I think that the images and the two small records that are sometimes present at the end are just add-ons to the original format, and these add-ons are not compressed.


And.. why would they split the content in such small blocks of 2000 characters or less? Easier handling for small mobile devices?

I have kind of wondered about that also. The number of records for my Oxford Concise Dictionary was ridiculous. Maybe it speeds up the searching or something like that. If you have one word per record for example...

Jaapjan
01-11-2008, 02:55 AM
The Perl modules Palm::PDB and Palm::Doc take care of the compression and the decompression, but this will not work for the highest compression level because that compression is a secret MobiPocket scheme. I had to override some code in one of these modules for it to work on some DRM:ed files (also decompress if version was 5).

I found a document from 2002 or something akin to that which described the regular compression scheme used and I implemented those now, more or less. (Since I hardly have the Perl modules available. Non perl code.) I have yet to run into any odd Mobi-format.. but then, I have not been looking. Maybe when the code can do a little more.

The Palm Doc header tells how many records there are in the document. I think that the images and the two small records that are sometimes present at the end are just add-ons to the original format, and these add-ons are not compressed.

True, but when you start decompressing the data at PDB record 1 (0 being the one holding the PDB 0 header, Mobi Header and EXTH header), how do you know when to end that file? For that matter, to what file does the HTML you're decoding belong anyway? The index?

I have kind of wondered about that also. The number of records for my Oxford Concise Dictionary was ridiculous. Maybe it speeds up the searching or something like that. If you have one word per record for example...

Perhaps memory constrained devices read these blocks in memory as sort of cache and move to the next and / or previous one only when needed.

tompe
01-11-2008, 08:08 AM
True, but when you start decompressing the data at PDB record 1 (0 being the one holding the PDB 0 header, Mobi Header and EXTH header), how do you know when to end that file? For that matter, to what file does the HTML you're decoding belong anyway? The index?


It also holds some other data like a long title and DRM stuff.

There is only one chunk of compressed text, so you decompress all the records from record 1 to record n_document_records, where n_document_records is the record count stored in the Palm Doc header in record 0.

The Perl code doing the decompression is:

my $header = $recs->[0];
if( defined _parse_headerrec($header) ) {
    # a proper Doc file should be fine, but if it's not Doc
    # compression like some Mobi docs seem to be we want to
    # bail early. Otherwise we end up with a huge stream of
    # substr() errors and we _still_ don't get any content.
    eval {
        sub min { return ($_[0]<$_[1]) ? $_[0] : $_[1] }
        my $maxi = min($#$recs, $header->{'records'});
        for( my $i = 1; $i <= $maxi; $i ++ ) {
            $body .= _decompress_record( $header->{'version'},
                                         $recs->[$i]->{'data'} );
        }
    };
    return undef if $@;
}


# algorithm taken from makedoc7.cpp with reference to
# http://patb.dyndns.org/Programming/PilotDoc.htm and
# http://www.pyrite.org/doc_format.html
sub _decompress_record($$) {
    my ($version,$in) = @_;
    return $in if $version == DOC_UNCOMPRESSED;

    my $out = '';

    my $lin = length $in;
    my $i = 0;
    while( $i < $lin ) {
        my $ch = substr( $in, $i ++, 1 );
        my $och = ord($ch);

        if( $och >= 1 and $och <= 8 ) {
            # copy this many bytes... basically a way to 'escape' data
            $out .= substr( $in, $i, $och );
            $i += $och;
        } elsif( $och < 0x80 ) {
            # pass through 0, 9-0x7f
            $out .= $ch;
        } elsif( $och >= 0xc0 ) {
            # 0xc0-0xff are 'space' plus ASCII char
            $out .= ' ';
            $out .= chr($och ^ 0x80);
        } else {
            # 0x80-0xbf is sequence from already decompressed buffer
            my $nch = substr( $in, $i ++, 1 );
            $och = ($och << 8) + ord($nch);
            my $m = ($och & 0x3fff) >> 3;
            my $n = ($och & 0x7) + 3;

            # This isn't very perl-like, but a simple
            # substr($out,$lo-$m,$n) doesn't work.
            my $lo = length $out;
            for( my $j = 0; $j < $n; $j ++, $lo ++ ) {
                die "bad Doc compression" unless ($lo-$m) >= 0;
                $out .= substr( $out, $lo-$m, 1 );
            }
        }
    }

    return $out;
}
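
To make the last branch of the loop above more concrete, here is a small worked example of how one two-byte back-reference is split into a distance and a length. It uses the same masks and shifts as the code above; decode_backref is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;

# Split a two-byte back-reference (first byte 0x80-0xbf) into the distance
# to copy from and the number of bytes to copy, using the same arithmetic
# as the 0x80-0xbf branch of _decompress_record.
sub decode_backref {
    my ($b1, $b2) = @_;
    my $pair     = ($b1 << 8) + $b2;
    my $distance = ($pair & 0x3fff) >> 3;   # upper 11 of the low 14 bits
    my $length   = ($pair & 0x7) + 3;       # low 3 bits, so 3 to 10 bytes
    return ($distance, $length);
}

# 0x80 0x2b: low 14 bits are 0x2b = 43, so distance 43 >> 3 and length (43 & 7) + 3.
my ($distance, $length) = decode_backref(0x80, 0x2b);
print "copy $length bytes starting $distance bytes back\n";
```

Because the length can exceed the distance, the copy may overlap the bytes it is producing, which is why the original code copies one byte at a time.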

Jaapjan
01-13-2008, 12:43 PM
Thanks to you as well as a few Palm documentation files from 2001 (and lots of use of a hex editor, UltraEdit) I managed to make some prototype C# code that reads out all the raw HTML content. To share some information with you, if you like, the layout is actually like this:

Palm header
Palm record index
MOBI header (Kind of obvious)
EXTH header (A dictionary format set of information about the book)
Content (Compressed)
Images (Uncompressed)
FLIS header (license information)
FCIS header (images information)

I remain unsure on how to determine where the images start and how long they are. Nor do I know the 2 byte record between content & images nor the 4 byte one at the end. Maybe some sort of checksum.

tompe
01-13-2008, 01:01 PM
Thanks to you as well as a few Palm documentation files from 2001 (and lots of use of a hex editor, UltraEdit) I managed to make some prototype C# code that reads out all the raw HTML content. To share some information with you, if you like, the layout is actually like this:

Palm header
Palm record index
MOBI header (Kind of obvious)
EXTH header (A dictionary format set of information about the book)
Content (Compressed)
Images (Uncompressed)
FLIS header (license information)
FCIS header (images information)

I remain unsure on how to determine where the images start and how long they are. Nor do I know the 2 byte record between content & images nor the 4 byte one at the end. Maybe some sort of checksum.

In my EXTH.pm I have tried to document as much as I have learned about the possible information in the EXTH. You can also have DRM information before or after the EXTH.

Are you sure that FLIS and FCIS are license information?

I think the record format contains information about how long the record is.

I think that the first image record index is in MOBI+0x5B. When I decode a file I just check each record to see if it is an image.
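
On the question of record lengths, a minimal sketch under the standard Palm database layout may help. That layout is an assumption here, since the thread does not spell it out: a 78-byte header with the record count at offset 76, then 8 bytes per record (a 4-byte offset, 1 attribute byte, a 3-byte unique id), all big-endian. In that layout a record's length is not stored directly; it is the distance to the next record's offset, or to the end of the file for the last record. pdb_records is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: split a whole PDB file held in memory into records.
sub pdb_records {
    my ($pdb) = @_;
    die "too short for a PDB header\n" if length($pdb) < 78;
    my $count = unpack('n', substr($pdb, 76, 2));
    die "truncated record list\n" if length($pdb) < 78 + 8 * $count;

    my @offsets = map { unpack('N', substr($pdb, 78 + 8 * $_, 4)) }
                  0 .. $count - 1;
    push @offsets, length($pdb);        # sentinel: last record runs to end of file

    my @records;
    for my $i (0 .. $count - 1) {
        my $len = $offsets[$i + 1] - $offsets[$i];
        die "bad record offset\n"
            if $len < 0 || $offsets[$i + 1] > length($pdb);
        push @records, substr($pdb, $offsets[$i], $len);
    }
    return @records;
}

# Build a tiny two-record database by hand and split it again.
my @data   = ('record zero', 'record one!');
my $header = pack('a32 x44 n', 'demo', scalar @data);    # 32 + 44 + 2 = 78 bytes
my ($index, $pos) = ('', 78 + 8 * scalar(@data));
for my $d (@data) {
    $index .= pack('N C x3', $pos, 0);
    $pos   += length($d);
}
my @split = pdb_records($header . $index . join('', @data));
print "record $_: ", length($split[$_]), " bytes\n" for 0 .. $#split;
```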

Jaapjan
01-13-2008, 01:09 PM
In my EXTH.pm I have tried to document as much as I have learned about the possible information in the EXTH. You can also have DRM information before or after the EXTH.

Are you sure that FLIS and FCIS are license information?

I think the record format contains information about how long the record is.

I think that the first image record index is in MOBI+0x5B. When I decode a file I just check each record to see if it is an image.

MOBI +0x5B for the image start? What if you have content much shorter than 0x5B records?

Actually the length of the content can simply be read from the header after which the images start. There's a 2 byte header in between. As for the images, do you also assume each image is 2 records long? Or get the length from elsewhere?

No, FCIS isn't about license information. It contains information about the images and the content size at least.

JeffElkins
01-13-2008, 01:31 PM
lit2mobi lembert_02_-_Stranglers_Moon.lit
Unpack file lembert_02_-_Stranglers_Moon.lit in dir ctmp
+---[ ConvertLIT (Version 1.8) ]---------------[ Copyright (c) 2002,2003 ]---
ConvertLIT comes with ABSOLUTELY NO WARRANTY; for details
see the COPYING file or visit "http://www.gnu.org/license/gpl.html".
This is free software, and you are welcome to redistribute it under
certain conditions. See the GPL license for details.
LIT INFORMATION.........
DRM = 1
Timestamp = 6ac89519
Creator = 00000000
Language = 00000409
Writing out "d'Alembert_2_-_Stranglers_Moon" as "d'Alembert 2 - Stranglers Moon.htm" ...
Successfully written to "ctmp/d'Alembert 2 - Stranglers Moon.htm".

Writing out "RW_~Cover01" as "~Cover01.jpg" ...
Successfully written to "ctmp/~Cover01.jpg".

Writing out "RW_~Cover02" as "~Cover02.jpg" ...
Successfully written to "ctmp/~Cover02.jpg".

Writing out "RW_~Cover03" as "~Cover03.jpg" ...
Successfully written to "ctmp/~Cover03.jpg".

Writing out "RW_~Cover04" as "~Cover04.jpg" ...
Successfully written to "ctmp/~Cover04.jpg".

Writing out "RW_~Cover05" as "~Cover05.jpg" ...
Successfully written to "ctmp/~Cover05.jpg".

Exploded "lembert_02_-_Stranglers_Moon.lit" into "ctmp/".
Read in HTML tree from opf
Opf: Initialize from file: lembert_02_-_Stranglers_Moon.opf
CONTENT: <?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE package
PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Package//EN"
"http://openebook.org/dtds/oeb-1.0.1/oebpkg101.dtd">
<package unique-identifier="OverDriveGUID">
<metadata>
<dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
<dc:Title>d'Alembert 2 - Stranglers Moon</dc:Title>
<dc:Identifier id="OverDriveGUID" scheme="GUID">{6ECBF068-47B6-49B7-838E-CB056DA516B7}</dc:Identifier>
</dc-metadata>
<x-metadata>
<meta name="rwver-ReaderWorks-SDK-Control" content="2, 0, 2, 0215 (02/15/2002)" />
<meta name="rwver-HTML-Input-Filter" content="2, 0, 2, 0215 (02/15/2002)" />
<meta name="rwver-Image-Input-Filter" content="2, 0, 2, 0215 (02/15/2002)" />
<meta name="rwver-Text-Input-Filter" content="2, 0, 2, 0215 (02/15/2002)" />
<meta name="rwver-Word-Doc-Input-Filter" content="2.0.2.0215 (02/15/2002)" />
<meta name="rwver-LIT-file-generator" content="1.5.1.0280 (10/05/2000)" />
<meta name="rw-License-Key" content="RWPTL" />
</x-metadata>
</metadata>
<manifest>
<item id="d'Alembert_2_-_Stranglers_Moon" href="d'Alembert 2 - Stranglers Moon.htm" media-type="text/html" />
<item id="RW_~Cover01" href="~Cover01.jpg" media-type="image/jpeg" />
<item id="RW_~Cover02" href="~Cover02.jpg" media-type="image/jpeg" />
<item id="RW_~Cover03" href="~Cover03.jpg" media-type="image/jpeg" />
<item id="RW_~Cover04" href="~Cover04.jpg" media-type="image/jpeg" />
<item id="RW_~Cover05" href="~Cover05.jpg" media-type="image/jpeg" />
</manifest>
<spine>
<itemref idref="d'Alembert_2_-_Stranglers_Moon" />
</spine>
<guide>
<reference type="other.ms-thumbimage-standard" href="~Cover01.jpg" />
<reference type="other.ms-coverimage-standard" href="~Cover02.jpg" />
<reference type="other.ms-titleimage-standard" href="~Cover03.jpg" />
<reference type="other.ms-thumbimage" href="~Cover04.jpg" />
<reference type="other.ms-coverimage" href="~Cover05.jpg" />
</guide>
</package>
OPF: TITLE: d'Alembert 2 - Stranglers Moon
OPF: CREATOR:
Init from manifest
d'Alembert_2_-_Stranglers_Moon - d'Alembert 2 - Stranglers Moon.htm - text/html
RW_~Cover01 - ~Cover01.jpg - image/jpeg
Could not read image file: ~Cover01.jpg
RW_~Cover02 - ~Cover02.jpg - image/jpeg
Could not read image file: ~Cover02.jpg
RW_~Cover03 - ~Cover03.jpg - image/jpeg
Could not read image file: ~Cover03.jpg
RW_~Cover04 - ~Cover04.jpg - image/jpeg
Could not read image file: ~Cover04.jpg
RW_~Cover05 - ~Cover05.jpg - image/jpeg
Could not read image file: ~Cover05.jpg
Warning, RW_~Cover01 missing from spine, adding
Warning, RW_~Cover02 missing from spine, adding
Warning, RW_~Cover03 missing from spine, adding
Warning, RW_~Cover04 missing from spine, adding
Warning, RW_~Cover05 missing from spine, adding
Init from guide
OPFTITLE: d'Alembert 2 - Stranglers Moon
OPFAUTHOR:
Coverimage: ~Cover02.jpg
SPINE: adding d'Alembert_2_-_Stranglers_Moon - d'Alembert 2 - Stranglers Moon.htm - text/html
Adding: d'Alembert 2 - Stranglers Moon.htm - d'Alembert_2_-_Stranglers_Moon
+++.+SPINE: adding RW_~Cover01 - ~Cover01.jpg - image/jpeg
SPINE: adding RW_~Cover02 - ~Cover02.jpg - image/jpeg
SPINE: adding RW_~Cover03 - ~Cover03.jpg - image/jpeg
SPINE: adding RW_~Cover04 - ~Cover04.jpg - image/jpeg
SPINE: adding RW_~Cover05 - ~Cover05.jpg - image/jpeg
All spine elements have been added
Have Read in HTML tree from opf
Saving mobi file (version 4): lembert_02_-_Stranglers_Moon.mobi
COVEROFFSET: 0
THUMBOFFSET: 1
EXTH setting data: author - 100 - - 0x
EXTH add: author - 100 -
EXTH setting data: coveroffset - 201 - 0 - 0x30
EXTH add: coveroffset - 201 - 0 - 0x30
EXTH setting data: thumboffset - 202 - 1 - 0x31
EXTH add: thumboffset - 202 - 1 - 0x31
MOBIHDR: imgrecpointer: 112
EXTH setting data: author - 100 - - 0x
EXTH add: author - 100 -
EXTH setting data: coveroffset - 201 - 0 - 0x30
EXTH add: coveroffset - 201 - 0 - 0x30
EXTH setting data: thumboffset - 202 - 1 - 0x31
EXTH add: thumboffset - 202 - 1 - 0x31
New record for image 112: ~Cover02.jpg
Reading data from file: ~Cover02.jpg
[Image::BMP] ERROR: Not a bitmap: [~Cover02.jpg] at /usr/local/bin/MobiPerl/Util.pm line 486


I'm still finding that lit2mobi chokes on a certain percentage of books I'm trying to convert. The common denominator seems to be that they all were processed by Microsoft Word (all the jpegs are MS Word Graphics). Rather than just crash, could lit2mobi call html2mobi to try to process the html file in ctmp or something? These crashes play hob with batch processing.

Edit: This was with version .25
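
For batch runs like this, one workaround is a small wrapper that runs the converter once per file and records the failures instead of stopping. The sketch below assumes that a crashed lit2mobi exits with a non-zero status, which has not been verified here. convert_all is a hypothetical helper, and the only lit2mobi usage assumed is the plain "lit2mobi file.lit" form shown above.

```perl
use strict;
use warnings;

# Hypothetical batch driver: run one converter command per input file and
# collect the files that failed, instead of stopping at the first crash.
sub convert_all {
    my ($cmd, @files) = @_;       # $cmd is an array ref, e.g. ['lit2mobi']
    my @failed;
    for my $file (@files) {
        # The list form of system() avoids the shell, so names containing
        # quotes or spaces (like "d'Alembert 2 ...") are passed untouched.
        my $status = system(@$cmd, $file);
        push @failed, $file if $status != 0;
    }
    return @failed;
}

# Usage: perl batch.pl *.lit
my @failed = convert_all(['lit2mobi'], @ARGV);
print "FAILED: $_\n" for @failed;
```

The list of failed files can then be retried by hand, for example by running html2mobi on the HTML file left in ctmp.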

tompe
01-13-2008, 01:32 PM
MOBI +0x5B for the image start? What if you have content much shorter than 0x5B records?


At the address MOBI+0x5C (I wrote B by mistake; that is bytes 0x5C-0x5F in the MOBI header) you have four bytes that are the index of the first image record.


As for the images, do you also assume each image is 2 records long? Or get the length from elsewhere?


Each image can only be one record. That is the reason for the 64K limit on images.
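
Reading that field could look like the sketch below. It takes the offset given here at face value (four bytes at 0x5C-0x5F of the MOBI header) and adds two assumptions that are not stated in the thread: the value is big-endian, and the MOBI header starts right after a 16-byte Palm Doc header in record 0. first_image_index is a hypothetical name.

```perl
use strict;
use warnings;

use constant PALMDOC_HEADER_LEN => 16;      # assumption: MOBI header follows it
use constant FIRST_IMAGE_OFFSET => 0x5C;    # offset given in the thread

# Hypothetical helper: return the index of the first image record, or
# undef if record 0 has no MOBI header or is too short.
sub first_image_index {
    my ($rec0) = @_;
    return unless length($rec0) >= PALMDOC_HEADER_LEN + FIRST_IMAGE_OFFSET + 4;
    my $mobi = substr($rec0, PALMDOC_HEADER_LEN);
    return unless substr($mobi, 0, 4) eq 'MOBI';    # plain PalmDOC file
    return unpack('N', substr($mobi, FIRST_IMAGE_OFFSET, 4));
}

# Fake record 0: 16 zero bytes, "MOBI", padding, then index 112 at MOBI+0x5C.
my $rec0 = ("\0" x 16) . 'MOBI' . ("\0" x (0x5C - 4)) . pack('N', 112);
print "first image record: ", first_image_index($rec0), "\n";
```

The value 112 is only sample data, chosen to match the "imgrecpointer: 112" line in the log earlier in the thread.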


No, FCIS isn't about license information. It contains information about the images and the content size at least.

How do you know what FLIS and FCIS are about?

Jaapjan
01-13-2008, 01:36 PM
How do you know what FLIS and FCIS are about?

Well, the FCIS I know because it contains a field with the content length as well as a field that indicates the number of images available in the file. It also grows and shrinks depending on how many image files are present in the file so it suggests it is something of an information record where most information is for the images.

I really need more mobipocket files before I can be sure about the FLIS because I actually only have a batch of DRM'less versions I test with. However it is a static sized record with remarkably many 0x0's and 0xFFFFFFFF's in it, suggesting unused information in the record. And since all the other relevant information is elsewhere it seems very probable it is for the DRM. As mentioned, I need a DRM'ed file to be sure.

tompe
01-13-2008, 01:41 PM
Well, the FCIS I know because it contains a field with the content length as well as a field that indicates the number of images available in the file. It also grows and shrinks depending on how many image files are present in the file so it suggests it is something of an information record where most information is for the images.

OK. But this record is not obligatory. Maybe it is some kind of optimization.

Which record is cover image and which is thumb nail is given by data in EXTH.

Jaapjan
01-13-2008, 01:43 PM
OK. But this record is not obligatory. Maybe it is some kind of optimization.

Which record is cover image and which is thumb nail is given by data in EXTH.

Every file I have seen so far, including those from the Creator, has these two records as well.

But it is interesting that you say they're optional. That means there must be an indication somewhere if they're included or not.

tompe
01-13-2008, 02:01 PM
Every file I have seen so far, including ones made by the Creator, has these two records as well.

But it is interesting that you say they're optional. That means there must be an indication somewhere if they're included or not.

I have not generated them and every file I have generated works on reading devices. Why must there be an indication if they are included? You just have to check the first bytes of every record that is not content to see if they are there.

I would be more than happy to generate these records if I manage to find out their format and what they are for.
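The "check the first bytes of every record" idea could look roughly like this in Python. It assumes the records have already been read into a list of byte strings, and that the optional records start with the ASCII tags FLIS and FCIS; the function name is only for illustration.

```python
def find_marker_records(records):
    """Map each marker found (b"FLIS", b"FCIS") to the index of the first record carrying it."""
    found = {}
    for index, data in enumerate(records):
        magic = bytes(data[:4])  # the first four bytes identify the record type
        if magic in (b"FLIS", b"FCIS") and magic not in found:
            found[magic] = index
    return found
```

An empty result simply means the file has neither record, which according to the posts above is still a readable file.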

alexxxm
01-14-2008, 02:48 AM
Have you ever thought about implementing a conversion to the Sony LRF format in your suite? I'd really like to write it myself, but that format is not well documented, and/or relies on some Windows DLL...

Alessandro

Jaapjan
01-14-2008, 04:22 AM
Have you ever thought about implementing a conversion to the Sony LRF format in your suite? I'd really like to write it myself, but that format is not well documented, and/or relies on some Windows DLL...
Alessandro

Personally I am hardly at that stage. I am just doing some hobby programming left and right and my current interest lies in what tompe is already doing.

Currently he is more often right than I am. But he might want to add some Sony code to his program. Though aren't there any LRF projects that convert to HTML? From HTML it is easy to get to Mobipocket.

Speaking of which,

What kind of test batch of files do you use, Tompe? Can you provide me with URLs to them? I am afraid you were right about the images too. Maybe, anyway. I discovered that there are many more images in the file than I thought (also JPGs, not only GIFs).

Back to the Mansion! :bookworm:

tompe
01-14-2008, 06:08 AM
Have you ever thought about implementing a conversion to the Sony LRF format in your suite? I'd really like to write it myself, but that format is not well documented, and/or relies on some Windows DLL...


That is already done by kovidgoyal in his libprs500. This is documented in threads in the Content sub-section in the Sony Portable Reader forum here.
I started writing Mobiperl inspired by libprs500.

tompe
01-14-2008, 06:12 AM
What kind of test batch of files do you use, Tompe? Can you provide me with URLs to them? I am afraid you were right about the images too. Maybe, anyway. I discovered that there are many more images in the file than I thought (also JPGs, not only GIFs).


I use, for example, files found here (the Dickens books with many images are good test books). I also generated some files with mobigen, and I used the Alice in Wonderland example from Mobipocket's web page. I have also got some files from people, like azw files, but these are probably not OK to put up on a web site.

JeffElkins
01-15-2008, 08:22 AM
Tompe, once a .lit is converted to .mobi is there a way to break the .mobi into its component parts? A mobi2html or something?

DMcCunney
01-15-2008, 08:26 AM
Tompe, once a .lit is converted to .mobi is there a way to break the .mobi into its component parts? A mobi2html or something?

Yes.

http://www.ida.liu.se/~tompe/mobiperl/mobi2html.html
______
Dennis

JeffElkins
01-15-2008, 11:18 AM
Yes.

http://www.ida.liu.se/~tompe/mobiperl/mobi2html.html
______
Dennis

Duh. Missed that, thanks.

tompe
01-15-2008, 11:29 AM
And you know that lit2mobi uses clit internally, and the clit-exploded files are saved in ctmp in the current dir. When you do the next conversion the ctmp directory is reused. I did it that way because I like to have temporary files left after a conversion: if something went wrong it is easier to debug. It is especially easier to debug via email...

TallMomof2
01-15-2008, 10:11 PM
I just spent the afternoon and evening converting many, many .lit and .mobi files so that they work nicely on my Kindle. I really like the --prefixtitle for series. Very nice addition. Only problem I found was with files that had spaces in the name. All I had to do was edit the file name and remove the spaces and the files worked just fine.

Many thanks!

Now I want to learn how to add a GUI to your program.....

HarryT
01-16-2008, 02:39 AM
Filenames with spaces should work fine if you put double quotes around the entire filename.

tompe
01-16-2008, 10:49 AM
Filenames with spaces should work fine if you put double quotes around the entire filename.

Yes, that should work. Or you can escape the space with \ at least on a Unix system. Please let me know if this does not work and how it fails.
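To see why both forms work on a Unix shell, here is a small Python illustration using shlex, which follows POSIX shell word-splitting rules. The file name "My Book.mobi" is made up, and the Windows command prompt has different rules (no backslash escaping), so this only models the Unix case.

```python
import shlex

# Double quotes keep the file name together as one argument.
quoted = shlex.split('mobi2mobi "My Book.mobi"')

# A backslash before the space does the same on a POSIX shell.
escaped = shlex.split(r'mobi2mobi My\ Book.mobi')

# Without quoting or escaping, the name is split into two arguments.
unquoted = shlex.split('mobi2mobi My Book.mobi')

print(quoted)
print(escaped)
print(unquoted)
```

In the unquoted case the program receives two separate file names, neither of which exists, which would match the failures reported with spaces in file names.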

TallMomof2
01-16-2008, 09:40 PM
I only had a handful of files with spaces and already edited them to no spaces so I can't try the double quotes.

I have found many .prc and I think .mobi (not sure) files that do not accept the new titles and authors or at least they don't display on the Kindle. However, if I convert the same book from .lit I don't have that problem. In the future I will only buy .lit whenever possible.

JeffElkins
01-18-2008, 04:03 PM
Hello,

I'm trying to add a coverimage to an existing .mobi file:

mobi2mobi mymobi.mobi --coverimage image.jpg mymobi.mobi

Have I got the syntax wrong?

wallcraft
01-18-2008, 04:16 PM
Try:

mobi2mobi --coverimage image.jpg --outfile newmobi.mobi mymobi.mobi

tompe
01-18-2008, 04:19 PM
Hello,

I'm trying to add a coverimage to an existing .mobi file:

mobi2mobi mymobi.mobi --coverimage image.jpg mymobi.mobi

Have I got the syntax wrong?

Yes.

mobi2mobi mymobi.mobi --coverimage image.jpg --outfile newmymobi.mobi

is more correct. If you run non-binary versions you can do "perldoc mobi2mobi" to see the documentation. Otherwise you have to look at the web page.

JeffElkins
01-18-2008, 05:45 PM
Try:

mobi2mobi --coverimage image.jpg --outfile newmobi.mobi mymobi.mobi


Nope, didn't work.

JeffElkins
01-18-2008, 05:47 PM
Yes.

mobi2mobi mymobi.mobi --coverimage image.jpg --outfile newmymobi.mobi

is more correct. If you run non-binary versions you can do "perldoc mobi2mobi" to see the documentation. Otherwise you have to look at the web page.

Nor did this...

tompe
01-18-2008, 05:58 PM
Nope, didn't work.

If you do just mobi2mobi on the input file what is the output? When you say it did not work does it mean that you got an error message or just that you did not see the image? If you do "mobi2html file.mobi unpackdir" is the coverimage then in the unpackdir?

If you send me the file I will try adding a cover image and fix any bugs. Send it in a private message here or via email to tpe at ida.liu.se

JeffElkins
01-18-2008, 06:12 PM
If you do just mobi2mobi on the input file what is the output? When you say it did not work does it mean that you got an error message or just that you did not see the image? If you do "mobi2html file.mobi unpackdir" is the coverimage then in the unpackdir?

If you send me the file I will try adding a cover image and fix any bugs. Send it in a private message here or via email to tpe at ida.liu.se


jpg --outfile newmymobi.mobi
Database Name: 01_Master_and_Commander.html
Version: 0
Type: BOOK
Creator: MOBI
Seed: 293
Resdb:
AppInfoDirty:
ctime: 1200688185 - Fri Jan 18 15:29:45 2008
mtime: 1200688201 - Fri Jan 18 15:30:01 2008
baktime: -2082844800 - Thu Dec 31 19:00:00 1903
---------------------------------------------------
FIRST IMG Record 0
---------------------------------------------------
Image record index: 265 (477 x 800)
IMAGE INDEX: 265
PDHEADER Version: 2
PDHEADER Length: 1078340
PDHEADER NRecords: 264
PDHEADER Recsize: 4096
PDHEADER Unknown: 0
MOBIHEADER ciflg: 65535
MOBIHEADER ciptr: 65535
MOBIHEADER doctype: MOBI
MOBIHEADER length: 228
MOBIHEADER booktype: 2 - BOOK
MOBIHEADER codep: 1252
MOBIHEADER uniqid: 2968844172
MOBIHEADER ver: 4
MOBIHEADER exthflg: 80
MOBIHEADER language: 9 - 9 - 0 - ENGLISH -
COVEROFFSET:
EXTH doctype: EXTH
EXTH length: 52
EXTH n_items: 2
EXTH item: 100 - Author - 18 - Unspecified Author
EXTH item: 201 - CoverOffset - 4 - 0x0000
LONGTITLE: 01_Master_and_Commander.html
-----------------
Setting record 265 to 01_cover.jpg
Reading data from file: 01_cover.jpg
Setting extended header data: coveroffset - 0
EXTH setting data: coveroffset - 201 - 0 - 0x30
EXTH replacing data: 201 - 0 - 0x30
GETSTRING: Author - Unspecified Author
CoverOffset - not printable


PM on the way...
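For anyone wanting to check the EXTH items in a dump like the one above by hand, here is a rough Python sketch of walking an EXTH block. The layout used is the commonly documented community one and is an assumption here: the tag "EXTH", a 4-byte length, a 4-byte item count, then items of 4-byte type, 4-byte length (including those 8 bytes) and data, all big-endian. Item types 100 (author) and 201 (cover offset) are the ones shown in the dump; the sample block is built by hand, not taken from the real file.

```python
import struct


def parse_exth(blob: bytes):
    """Return a list of (item_type, data) pairs from an EXTH block."""
    if blob[:4] != b"EXTH":
        raise ValueError("not an EXTH block")
    _length, n_items = struct.unpack(">II", blob[4:12])
    items = []
    pos = 12
    for _ in range(n_items):
        item_type, item_len = struct.unpack(">II", blob[pos:pos + 8])
        # item_len is assumed to count the 8-byte type/length prefix as well.
        items.append((item_type, blob[pos + 8:pos + item_len]))
        pos += item_len
    return items


# Hand-built sample with the same two items as the dump above.
author = b"Unspecified Author"
body = (struct.pack(">II", 100, 8 + len(author)) + author
        + struct.pack(">II", 201, 12) + struct.pack(">I", 0))
sample = b"EXTH" + struct.pack(">II", 12 + len(body), 2) + body

for item_type, data in parse_exth(sample):
    print(item_type, data)
```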

tompe
01-18-2008, 08:21 PM
I think I need the file to fix this (haven't got it yet...).

JeffElkins
01-18-2008, 08:59 PM
I think I need the file to fix this (haven't got it yet...).

http://www.elkins.org/attach.zip

You can download it from the link above. The attachment wouldn't take via a PM.

tompe
01-18-2008, 09:16 PM
You can download it from the link above. The attachment wouldn't take via a PM.

Well, there is a bug in my code. Will fix it and fix some other bugs and make a new release during the weekend.

JeffElkins
01-18-2008, 09:26 PM
Well, there is a bug in my code. Will fix it and fix some other bugs and make a new release during the weekend.

Thanks for the hard work you're putting into this.

tompe
01-18-2008, 10:26 PM
Thanks for the hard work you're putting into this.

It is still fun :) I found two bugs with your test example so a good test case. It is working now adding cover image. I will fix the author/image bug for files without an EXTH header tomorrow and then make a new release.

JeffElkins
01-19-2008, 11:33 AM
It is still fun :) I found two bugs with your test example so a good test case. It is working now adding cover image. I will fix the author/image bug for files without an EXTH header tomorrow and then make a new release.

Two more bugs I've encountered:

1. lit2mobi crashes when saving .lit file: http://www.elkins.org/example1.lit

2. lit2mobi completes conversion, but book has no carriage returns or paragraph breaks: http://www.elkins.org/example2.lit

Both the original lit files seem fine using Microsoft Reader.

tompe
01-19-2008, 12:21 PM
Two more bugs I've encountered:

1. lit2mobi crashes when saving .lit file: http://www.elkins.org/example1.lit

2. lit2mobi completes conversion, but book has no carriage returns or paragraph breaks: http://www.elkins.org/example2.lit

Both the original lit files seem fine using Microsoft Reader.

1. It does not crash for me. When I release the next version you can test it again and tell me if it crashes or not.

2. The problem is that Mobipocket does not support the <pre> tag. But I added my fix for that to lit2mobi so it looks readable now.

JeffElkins
01-19-2008, 12:28 PM
1. It does not crash for me. When I release the next version you can test it again and tell me if it crashes or not.

2. The problem is that Mobipocket does not support the <pre> tag. But I added my fix for that to lit2mobi so it looks readable now.

Thanks Tommy!

tompe
01-19-2008, 03:05 PM
Version 0.0.26 with Windows binaries is available from:

http://www.ida.liu.se/~tompe/mobiperl/


Changes in 0.0.26:


--prefixtitle now works for mobi2mobi and when input is a PalmDOC file.

Fixed bug with mobi2mobi and --coverimage. It did not work if the file did not contain any images.

Fixed rescaling bug. Now also images whose height is greater than 640 are rescaled. This is done because of a bug in the Gen3.

Fixed bug in mobi2mobi --fiximagesizes. In Linux the wrong tmp file was read.

Introduced a flag --gen3imagefix to mobi2mobi that can be used if you have a book that hangs the Gen3. These kinds of hangs are often caused by a large image in the book. This is a firmware bug in the Gen3.

Added call to fix_pre_tags in lit2mobi

Now it is possible to add author information to a Mobipocket file that does not have an EXTH.
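As an illustration of the rescaling rule in the notes above, this Python sketch computes a target size for an image taller than 640 pixels. Only the 640 height threshold comes from the release notes; keeping the aspect ratio and rounding the width are assumptions, and this is not the code Mobiperl itself uses.

```python
MAX_HEIGHT = 640  # threshold mentioned in the 0.0.26 notes


def scaled_size(width: int, height: int, max_height: int = MAX_HEIGHT):
    """Return (width, height) shrunk to fit max_height, keeping the aspect ratio."""
    if height <= max_height:
        return width, height  # small enough already, leave untouched
    new_width = max(1, round(width * max_height / height))
    return new_width, max_height
```

For the 477 x 800 cover from the earlier dump this gives 382 x 640.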

JeffElkins
01-19-2008, 03:52 PM
Version 0.0.26 with Windows binaries is available from:

http://www.ida.liu.se/~tompe/mobiperl/


Changes in 0.0.26:


--prefixtitle now works for mobi2mobi and when input is a PalmDOC file.

Fixed bug with mobi2mobi and --coverimage. It did not work if the file did not contain any images.

Fixed rescaling bug. Now also images whose height is greater than 640 are rescaled. This is done because of a bug in the Gen3.

Fixed bug in mobi2mobi --fiximagesizes. In Linux the wrong tmp file was read.

Introduced a flag --gen3imagefix to mobi2mobi that can be used if you have a book that hangs the Gen3. These kinds of hangs are often caused by a large image in the book. This is a firmware bug in the Gen3.

Added call to fix_pre_tags in lit2mobi

Now it is possible to add author information to a Mobipocket file that does not have an EXTH.



lit2mobi example1.lit
Unpack file example1.lit in dir ctmp
+---[ ConvertLIT (Version 1.8) ]---------------[ Copyright (c) 2002,2003 ]---
ConvertLIT comes with ABSOLUTELY NO WARRANTY; for details
see the COPYING file or visit "http://www.gnu.org/license/gpl.html".
This is free software, and you are welcome to redistribute it under
certain conditions. See the GPL license for details.
LIT INFORMATION.........
DRM = 1
Timestamp = a6b86513
Creator = 00000058
Language = 00001009
Writing out "Farmer,_Philip_Jose_-_Riverworld_06_-_(_Shorts)_Tales_of_Riverworld" as "Farmer, Philip Jose - Riverworld 06 - ( Shorts) Tales of Riverworld.htm" ...
Successfully written to "ctmp/Farmer, Philip Jose - Riverworld 06 - ( Shorts) Tales of Riverworld.htm".

Writing out "RW_~Cover01" as "~Cover01.jpg" ...
Successfully written to "ctmp/~Cover01.jpg".

Writing out "RW_~Cover02" as "~Cover02.jpg" ...
Successfully written to "ctmp/~Cover02.jpg".

Writing out "RW_~Cover03" as "~Cover03.jpg" ...
Successfully written to "ctmp/~Cover03.jpg".

Writing out "RW_~Cover04" as "~Cover04.jpg" ...
Successfully written to "ctmp/~Cover04.jpg".

Writing out "RW_~Cover05" as "~Cover05.jpg" ...
Successfully written to "ctmp/~Cover05.jpg".

Exploded "example1.lit" into "ctmp/".
Read in HTML tree from opf
Opf: Initialize from file: example1.opf
CONTENT: <?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE package
PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Package//EN"
"http://openebook.org/dtds/oeb-1.0.1/oebpkg101.dtd">
<package unique-identifier="OverDriveGUID">
<metadata>
<dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
<dc:Title>Farmer, Philip Jose - Riverworld 06 - ( Shorts) Tales of Riverworld</dc:Title>
<dc:Creator role="aut">Philip Jose Farmer</dc:Creator>
<dc:Identifier id="OverDriveGUID" scheme="GUID">{54B56D18-7D64-42EB-AE2A-3D55AC2117DF}</dc:Identifier>
</dc-metadata>
<x-metadata>
<meta name="rwver-ReaderWorks-SDK-Control" content="1.0.1.817 (08/17/2001)" />
<meta name="rwver-HTML-Input-Filter" content="1.0.1.0817 (08/17/2001)" />
<meta name="rwver-Image-Input-Filter" content="1.0.1.817 (08/17/2001)" />
<meta name="rwver-Text-Input-Filter" content="1.0.1.817 (08/17/2001)" />
<meta name="rwver-LIT-file-generator" content="1.5.1.0280 (10/05/2000)" />
<meta name="rw-License-Key" content="RWPTL" />
</x-metadata>
</metadata>
<manifest>
<item id="Farmer,_Philip_Jose_-_Riverworld_06_-_(_Shorts)_Tales_of_Riverworld" href="Farmer, Philip Jose - Riverworld 06 - ( Shorts) Tales of Riverworld.htm" media-type="text/html" />
<item id="RW_~Cover01" href="~Cover01.jpg" media-type="image/jpeg" />
<item id="RW_~Cover02" href="~Cover02.jpg" media-type="image/jpeg" />
<item id="RW_~Cover03" href="~Cover03.jpg" media-type="image/jpeg" />
<item id="RW_~Cover04" href="~Cover04.jpg" media-type="image/jpeg" />
<item id="RW_~Cover05" href="~Cover05.jpg" media-type="image/jpeg" />
</manifest>
<spine>
<itemref idref="Farmer,_Philip_Jose_-_Riverworld_06_-_(_Shorts)_Tales_of_Riverworld" />
</spine>
<guide>
<reference type="other.ms-thumbimage-standard" href="~Cover01.jpg" />
<reference type="other.ms-coverimage-standard" href="~Cover02.jpg" />
<reference type="other.ms-titleimage-standard" href="~Cover03.jpg" />
<reference type="other.ms-thumbimage" href="~Cover04.jpg" />
<reference type="other.ms-coverimage" href="~Cover05.jpg" />
</guide>
</package>
OPF: TITLE: Farmer, Philip Jose - Riverworld 06 - ( Shorts) Tales of Riverworld
OPF: CREATOR: Philip Jose Farmer
Init from manifest
Farmer,_Philip_Jose_-_Riverworld_06_-_(_Shorts)_Tales_of_Riverworld - Farmer, Philip Jose - Riverworld 06 - ( Shorts) Tales of Riverworld.htm - text/html
RW_~Cover01 - ~Cover01.jpg - image/jpeg
Could not read image file: ~Cover01.jpg
RW_~Cover02 - ~Cover02.jpg - image/jpeg
Could not read image file: ~Cover02.jpg
RW_~Cover03 - ~Cover03.jpg - image/jpeg
Could not read image file: ~Cover03.jpg
RW_~Cover04 - ~Cover04.jpg - image/jpeg
Could not read image file: ~Cover04.jpg
RW_~Cover05 - ~Cover05.jpg - image/jpeg
Could not read image file: ~Cover05.jpg
Warning, RW_~Cover01 missing from spine, adding
Warning, RW_~Cover02 missing from spine, adding
Warning, RW_~Cover03 missing from spine, adding
Warning, RW_~Cover04 missing from spine, adding
Warning, RW_~Cover05 missing from spine, adding
Init from guide
OPFTITLE: Farmer, Philip Jose - Riverworld 06 - ( Shorts) Tales of Riverworld
OPFAUTHOR: Philip Jose Farmer
Coverimage: ~Cover02.jpg
SPINE: adding Farmer,_Philip_Jose_-_Riverworld_06_-_(_Shorts)_Tales_of_Riverworld - Farmer, Philip Jose - Riverworld 06 - ( Shorts) Tales of Riverworld.htm - text/html
Adding: Farmer, Philip Jose - Riverworld 06 - ( Shorts) Tales of Riverworld.htm - Farmer,_Philip_Jose_-_Riverworld_06_-_(_Shorts)_Tales_of_Riverworld
+++.+SPINE: adding RW_~Cover01 - ~Cover01.jpg - image/jpeg
SPINE: adding RW_~Cover02 - ~Cover02.jpg - image/jpeg
SPINE: adding RW_~Cover03 - ~Cover03.jpg - image/jpeg
SPINE: adding RW_~Cover04 - ~Cover04.jpg - image/jpeg
SPINE: adding RW_~Cover05 - ~Cover05.jpg - image/jpeg
All spine elements have been added
Have Read in HTML tree from opf
FIX PRE TAGS
Saving mobi file (version 4): example1.mobi
COVEROFFSET: 0
THUMBOFFSET: 1
EXTH setting data: author - 100 - Philip Jose Farmer - 0x5068696c6970204a6f7365204661726d6572
EXTH add: author - 100 - Philip Jose Farmer
EXTH setting data: coveroffset - 201 - 0 - 0x30
EXTH add: coveroffset - 201 - 0 - 0x30
EXTH setting data: thumboffset - 202 - 1 - 0x31
EXTH add: thumboffset - 202 - 1 - 0x31
MOBIHDR: imgrecpointer: 207
EXTH setting data: author - 100 - Philip Jose Farmer - 0x5068696c6970204a6f7365204661726d6572
EXTH add: author - 100 - Philip Jose Farmer
EXTH setting data: coveroffset - 201 - 0 - 0x30
EXTH add: coveroffset - 201 - 0 - 0x30
EXTH setting data: thumboffset - 202 - 1 - 0x31
EXTH add: thumboffset - 202 - 1 - 0x31
New record for image 207: ~Cover02.jpg
Reading data from file: ~Cover02.jpg - 510 x 680
[Image::BMP] ERROR: Not a bitmap: [~Cover02.jpg] at /usr/local/bin/MobiPerl/Util.pm line 488


Thanks for this update!

For me it still crashes on:

http://www.elkins.org/example1.lit
and
http://www.elkins.org/example3.lit

Both fail with a hard crash at /usr/local/bin/MobiPerl/Util.pm line 488

edit: http://www.elkins.org/example4.zip

Add coverimage fails:
mobi2mobi oldmobi.mobi --coverimage cover.jpg --outfile newmobi.mobi

When exploded, the coverimage .jpg has been included in the newmobi.mobi file


However, the globbed text seems to be corrected.

BTW, Think about setting up a paypal tip jar on your site. At the very least I'm sure you'll gain beer money...I know I owe you a case of your favorite :)

tompe
01-19-2008, 04:09 PM
Aha, this is a bug I have not managed to track down since I did not have a test file. It is also a bug that only occurs in Windows. So next time I boot Windows I will try to fix it using your test files.

There is one thing you could test. Could you try renaming the file to a name without ~, change the name in the opf file, and then run opf2mobi to see if it works?

Otherwise you could just remove the image files and the reference to them and run opf2mobi since the image files are not related to the book.

JeffElkins
01-19-2008, 04:16 PM
Aha, this is a bug I have not managed to track down since I did not have a test file. It is also a bug that only occurs in Windows. So next time I boot Windows I will try to fix it using your test files.

There is one thing you could test. Could you try renaming the file to a name without ~, change the name in the opf file, and then run opf2mobi to see if it works?

Otherwise you could just remove the image files and the reference to them and run opf2mobi since the image files are not related to the book.

Sure, I'll test this today. Also, I don't run Windows. All my testing is done with Kubuntu 7.10

EDIT: Just tested. Removing the ~ has no effect. lit2mobi still crashes. However, editing the .opf file and removing image file references allows opf2mobi to complete.
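The manual workaround (deleting the image references from the .opf) can also be scripted. This Python sketch assumes an OPF without a default XML namespace, like the one in the log above, and removes image items from the manifest plus any guide references pointing at them. Note that ElementTree does not keep the DOCTYPE and may rename namespace prefixes on output, so the result should be checked before feeding it to opf2mobi.

```python
import xml.etree.ElementTree as ET


def strip_images(opf_text: str) -> str:
    """Return the OPF text with image manifest items and their guide references removed."""
    root = ET.fromstring(opf_text)
    removed = set()
    # First pass: drop manifest items whose media type is an image.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "item" and child.get("media-type", "").startswith("image/"):
                removed.add(child.get("href"))
                parent.remove(child)
    # Second pass: drop guide references that point at the removed files.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "reference" and child.get("href") in removed:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```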

tompe
01-19-2008, 04:26 PM
Sure, I'll test this today. Also, I don't run Windows. All my testing is done with Kubuntu 7.10

Very good to know. It works for me using Debian unstable. Could it maybe be different versions of clit we are using? I use version 1.8.

The problem I think is that the reading of the file fails on line 486 in Util.pm. When that fails it tries to read it as a BMP file. But the file is a gif file with a jpg filename. So it should be read by "my $p = new GD::Image ("$filename");". What you could do is print out $filename at that point and confirm that it is the same as in the ctmp directory.

You could also write a simple perl script that just tries to read the problematic file and see if it works.

Since I do not manage to trigger this bug it is a bit hard for me to debug it...
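Since the suspicion is a GIF hiding behind a .jpg name, one quick check that does not depend on GD is to look at the first bytes of the file. This Python sketch recognizes a few well-known file signatures; the function name is made up, and it only identifies the format, it does not validate the whole image.

```python
def sniff_image_type(data: bytes) -> str:
    """Guess the image format from the leading bytes, ignoring the file extension."""
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"BM"):
        return "bmp"
    return "unknown"
```

Feeding it the first 16 bytes of each file in ctmp (read in binary mode) would show whether the covers really are what their names claim.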

ppxnouse
01-25-2008, 10:00 AM
Hello tompe,

I created a small tool for myself to download and parse/reformat books from the German Project Gutenberg. I write all the content into a single HTML file and create a "TOC" at the top of the page with in-page links.

When I convert this page using the Mobipocket Reader, The "TOC" works. When I use html2mobi, the links do not work. I do not want to split all into different pages if not necessary. Any other way to make a TOC with HTML2MOBI ? (I am sure it got written somewhere already - I just can not find it).

Thank you

tompe
01-25-2008, 10:54 AM
When I convert this page using the Mobipocket Reader, The "TOC" works. When I use html2mobi, the links do not work. I do not want to split all into different pages if not necessary. Any other way to make a TOC with HTML2MOBI ? (I am sure it got written somewhere already - I just can not find it).


This should work. Could you send me a test file (email: tpe@ida.liu.se).

kovidgoyal
02-07-2008, 05:22 PM
@tompe
Would you mind improving your mobi2html utility a little so I can write mobi2lrf?

1) Allow passing of absolute paths to mobi2html for the .mobi file
2) Post-process the generated HTML to remove Mobipocket-specific tags and attributes. E.g. <mbp:page-break/> -> <br style="page-break-after:always"/> and so on

tompe
02-07-2008, 06:25 PM
@tompe
Would you mind improving your mobi2html utility a little so I can write mobi2lrf?

1) Allow passing of absolute paths to mobi2html for the .mobi file
2) Post-process the generated HTML to remove Mobipocket-specific tags and attributes. E.g. <mbp:page-break/> -> <br style="page-break-after:always"/> and so on

Sounds like reasonable things to do. I have put them at the top of the todo list so they will probably be done soon. But I just got my N810 today so I am kind of busy playing with it...

kovidgoyal
02-07-2008, 06:40 PM
Thanks, 'preciate it. Enjoy your N800

darkninja
02-08-2008, 06:26 PM
Check out Igor Skochinsky's blog!

At the bottom of the comments, there's a very interesting Python program.

It lets you use the same mobipocket .PRC file on multiple devices, even if it was previously locked to a specific device.

tompe
02-09-2008, 09:40 AM
@tompe
Would you mind improving your mobi2html utility a little so I can write mobi2lrf?

1) Allow passing of absolute paths to mobi2html for the .mobi file
2) Post-process the generated HTML to remove Mobipocket-specific tags and attributes. E.g. <mbp:page-break/> -> <br style="page-break-after:always"/> and so on

There is a new version 0.0.27 (no Windows binaries for this version).

1) Fixed.

2) I removed some of the tags and the attribute you mentioned. But I did not find a list of Mobipocket-specific attributes. And can't you just ignore attributes you do not recognize?

Let me know if more changes should be made to the html file. And a test file is always good to have when fixing these kinds of things.

kovidgoyal
02-09-2008, 11:45 AM
Thanks.

Here's the list of custom tags
http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen&File=tagref_mobi.xml
Most of them seem to be pretty useless.

Yeah I can ignore unknown tags and attributes, but at least for the few that contain useful information, it's good to convert them.

tompe
02-09-2008, 01:23 PM
Thanks.

Here's the list of custom tags
http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen&File=tagref_mobi.xml
Most of them seem to be pretty useless.

Yeah I can ignore unknown tags and attributes, but at least for the few that contain useful information, it's good to convert them.

For now I have just removed them, but I will try to convert them to something useful. That will probably have to be an ongoing activity. When I have an example that gets better with conversion I will add it.

nrapallo
02-10-2008, 12:16 AM
Hi, tompe!

I have recently been converting .prc (mobipocket) files into .IMP format using your 'mobi2html' and my own perl script. Especially .prc with color illustrations and images.

I too have had to filter/change (I'm not complaining here) mobi specific tags, in particular:

1. My text editor opens the resulting .html as one-line (and word-wrapped). As a result, I like to replace '/div>' with '/div>\n' to get some line-breaks for visual purposes. Makes some things 'line-up' vertically to aid in error-correcting.

2. '<mbp:pagebreak' with '<p style="page-break-before:always' to avoid format-specific tags.

For personal reasons/preferences, I also change:

3. '<body' with '<body style="margin-left:2%; margin-right:2%; text-align:justify' to allow small margins and justified text. For those that do not want this, then it would be easy to change as this would appear in only one place, at the beginning of the file.

4. '<img align="baseline"' with '<img' as the eBook Publisher software I use strangely positions ALL images on the first page with the baseline keyword there.

Please note that to ensure the maximum conversion, I often leave out a leading '<' or trailing '>' as seen above. These replacements above work better by doing this.
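The substitutions in items 1, 2 and 4 above could be written as a small post-processing pass. This is a Python sketch rather than the Perl script actually used; the tag spelling <mbp:pagebreak/> and the exact replacement markup are assumptions, and it uses complete-tag regular expressions instead of the partial strings described above.

```python
import re


def clean_mobi_html(html: str) -> str:
    """Apply a few Mobipocket-to-plain-HTML substitutions to mobi2html output."""
    # Break the single long line after each closing div (item 1).
    html = re.sub(r"</div>(?!\n)", "</div>\n", html)
    # Replace the Mobipocket page-break tag with a CSS page break (item 2).
    html = re.sub(r"<mbp:pagebreak\s*/?>",
                  '<p style="page-break-before:always"></p>', html)
    # Drop align="baseline" from image tags (item 4).
    html = re.sub(r'(<img\b[^>]*?)\s+align="baseline"', r"\1", html)
    return html
```

The body style change in item 3 is left out on purpose, since it is a personal preference rather than a cleanup.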

By the way, would it be easy to incorporate my 'html2imp.pl' perl script within your 'Mobiperl' so that you can also produce .IMP format ebooks? You could call it 'mobi2imp'. Sorry, but I don't mean to lengthen your to-do list too much.:)

-Nick

DMcCunney
02-10-2008, 12:53 AM
1. My text editor opens the resulting .html as one-line (and word-wrapped).

-Nick

What text editor do you use? Sounds like a Windows (CRLF) vs *nix (LF) EOL char issue.
______
Dennis

tompe
02-10-2008, 06:36 AM
What text editor do you use? Sounds like a Windows (CRLF) vs *nix (LF) EOL char issue.


No, the HTML file is actually just one line. When you generate the HTML from a tree there are no natural places to put line breaks, so I had not added any newlines. But I have now adopted the suggestion to place a newline after /div>.

tompe
02-10-2008, 06:40 AM
By the way, would it be easy to incorporate my 'html2imp.pl' perl script within your 'Mobiperl' so that you can also produce .IMP format ebooks? You could call it 'mobi2imp'. Sorry, but I don't mean to lengthen your to-do list too much.:)


Probably. If you give me the script I will look at it.

tompe
02-10-2008, 08:19 AM
Version 0.0.28 is available at

http://www.ida.liu.se/~tompe/mobiperl/

No Windows binaries for this version. I will do a binary version when more important things have changed or when somebody needs the Windows binaries.

Changes in 0.0.28:


Fixed the substitution of mbppagebreak that was done wrongly

Added a newline after /div> in mobi2html

Fixed bug in MobiFile pointed out by Gary Tsang. The cover offset was set to 0 when no cover was available and that made the Kindle reader crash.

Fixed html2mobi so that file name is used as title in generated toc if no title tag is available.

Changes in 0.0.27:


mobi2html now works with the mobi file specified with a full path

MobiPocket specific html removed in mobi2html. --mobihtml to keep it.
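The cover offset fix in 0.0.28 suggests a general rule: only write a cover item when there really is a cover. Here is a hedged Python sketch of that idea, not the actual MobiFile code. It assumes the commonly documented EXTH layout (big-endian 4-byte type and length per item, length including the 8-byte prefix), uses item types 100 and 201 as shown in the earlier dump, encodes the author in codepage 1252 as in that dump, and leaves out any alignment padding.

```python
import struct


def build_exth(author: str, cover_offset=None) -> bytes:
    """Build an EXTH block; the cover offset item is omitted when there is no cover."""
    items = [(100, author.encode("cp1252"))]
    if cover_offset is not None:
        # Only advertise a cover when one exists, instead of defaulting to 0.
        items.append((201, struct.pack(">I", cover_offset)))
    body = b"".join(struct.pack(">II", t, 8 + len(d)) + d for t, d in items)
    return b"EXTH" + struct.pack(">II", 12 + len(body), len(items)) + body
```

With no cover the block carries a single item, so a reader has no offset to follow into a non-existent image record.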

JSWolf
02-10-2008, 08:24 AM
Version 0.0.28 is available at

http://www.ida.liu.se/~tompe/mobiperl/

No Windows binaries for this version. I will do a binary version when more important things have changed or when somebody needs the Windows binaries.

Changes in 0.0.28:


Fixed the substitution of mbppagebreak that was done wrongly

Added a newline after /div> in mobi2html

Fixed bug in MobiFile pointed out by Gary Tsang. The cover offset was set to 0 when no cover was available and that made the Kindle reader crash.

Fixed html2mobi so that file name is used as title in generated toc if no title tag is available.

Changes in 0.0.27:


mobi2html now works with the mobi file specified with a full path

MobiPocket specific html removed in mobi2html. --mobihtml to keep it.

Yes, I do need a new Windows binary. I do use html2mobi. So please come up with a new Windows edition for those of us using Windows. It isn't fair that Windows users get stuck with 2 versions behind.

tompe
02-10-2008, 08:32 AM
Yes, I do need a new Windows binary. I do use html2mobi. So please come up with a new Windows edition for those of us using Windows. It isn't fair that Windows users get stuck with 2 versions behind.

I will eventually, but since it is a lot of trouble for me to produce the Windows version (I have to reboot my computer and lose all the Linux setup) I am not willing to do it too often. If you are using html2mobi and always have a cover image then there are no changes in version 0.0.28 that concern you.

If somebody else would like to volunteer to produce Windows binaries please feel free to do so. My mobiperl web page gives all the information needed to set it up and the Makefile in the distribution contains targets to build things in Windows using nmake.

nrapallo
02-10-2008, 08:33 AM
DMcCunney:

My text editor 'Textpad' does recognize the difference between PC and Unix line endings. As tompe points out, it is just one (long) line.


tompe:

For ease I list the perl script that converts .html to .IMP formats, here:

#!/perl/bin/perl -w
#
# Adapted by Nick Rapallo (January 2008)
#
# Modified code taken directly from "SBPubX.doc" (installed by the eBook Publisher
# software). Given a single .html it creates .opf project file for later use as well
# as .IMP for REB 1200; can change the latter to GEB/EBW 1150 or REB 1100 by
# uncommenting the {BuildTarget} lines below.

package main;
use Win32::OLE;
use Win32::OLE qw(EVENTS);
Win32::OLE->Initialize(Win32::OLE::COINIT_APARTMENTTHREADED);

$usage='Html2imp.pl Authorname Title Category htmlfilename';
die "Usage: $usage\n" if $#ARGV != 3;

###################################################################
#
# get the interfaces, complain and quit if we cannot
#
$project = Win32::OLE->new("SBPublisher.Project") or
    die "Unable to get IProject interface\n";

$builder = Win32::OLE->new("SBPublisher.Builder") or
    die "Unable to get IBuilder interface\n";

# Setup the event handling.
#
Win32::OLE->WithEvents($builder, 'EventHandlers');

###################################################################
#
# Create a new project and add our document file with optional cover.
#
$project->ClearAll();
#$project->AddSourceFile("cover.htm");
$project->AddSourceFile($ARGV[3]);


###################################################################
#
# Set the various "metadata" items for the publication
#
$project->{AuthorFirstName} = $ARGV[0];
$project->{BookTitle} = $ARGV[1];
$project->{Category} = $ARGV[2];
#$project->{ISBN} = $project->CanonicalizeISBN("0448163004 ");
#$project->{BISAC} = "FIC004000";

###################################################################
#
# Set the build options for the output
#
$project->{OutputDirectory} = ".";
$project->{Compress} = 1; #True
$project->{Encrypt} = 0; #False
$project->{KeepAnchors} = 1; #True
$project->{Language} = "en";
$project->{RequireISBN} = 0; #False
$project->{Zoom} = 2;

###################################################################
#
# Now build the REB 1200 (FullVga) .IMP output
#$project->{BookFileName} = $ARGV[3] . "_1200";
#$project->{BookFileName} = $ARGV[0] . " - " . $ARGV[1] . "_1200";
#$project->Save($ARGV[3] . "_1200.opf");
#$project->Save($ARGV[0] . " - " . $ARGV[1] . "_1200.opf");
#
#$project->{BuildTarget} = 1;
#
# Now generate both the OEBFF and/or .IMP output
#$builder->GenerateOEBFF($project, 1);
#$builder->Build($project);
#if (Win32::OLE->LastError() != 0) {
# print "ERROR: GenerateOEBFF/Build method failed for REB 1200.\n";
#} else {
# print "REB 1200 ebook created!\n";
#}

###################################################################
#
# Now (optionally) build the EBW/GEB 1150 (gray HalfVga) .IMP output
#
#$project->{BookFileName} = $ARGV[3];
$project->{BookFileName} = $ARGV[0] . " - " . $ARGV[1];
#$project->Save($ARGV[3] . ".opf");
$project->Save($ARGV[0] . " - " . $ARGV[1] . ".opf");
#
$project->{BuildTarget} = 2;
#
# Now generate both the OEBFF and/or .IMP output
#$builder->GenerateOEBFF($project, 1);
$builder->Build($project);
if (Win32::OLE->LastError() != 0) {
    print "ERROR: GenerateOEBFF/Build method failed for EBW 1150.\n";
} else {
    print "EBW 1150 ebook created!\n";
}

###################################################################
#
# Now (optionally) build the REB 1100 (mono HalfVGA) .RB output
#
#$project->{BookFileName} = $ARGV[3];
#$project->{BookFileName} = $ARGV[0] . " - " . $ARGV[1];
#$project->Save($ARGV[3] . ".opf");
#$project->Save($ARGV[0] . " - " . $ARGV[1] . ".opf");
#
#$project->{BuildTarget} = 3;
#
# Now generate the .RB output
#$builder->Build($project);
#if (Win32::OLE->LastError() != 0) {
# print "ERROR: Build method failed for REB 1100.\n";
#} else {
# print "REB 1100 ebook created!\n";
#}

Win32::OLE->Uninitialize();

###################################################################
#
# Event Handlers
#
package EventHandlers;

sub OnBuildStart
{
    my ($project, @args) = @_;
    # print "Beginning validation...\n";
}

sub OnSourceStart
{
    my ($builder, $filename, @args) = @_;
    # print "Parsing $filename...\n";
}

sub OnError
{
    # Get the arguments
    my ($builder,
        $filename,
        $msg,
        $line,
        $col,
        $severity,
        @args) = @_;

    my @severities = ("NOTE", "FATAL ERROR", "ERROR", "WARNING");

    # Keep only the file name, dropping any directory part
    if ($filename =~ m/^.+[\\\/](.+?)$/) { $filename = $1; }

    # Print out the error message including any NOTE feedback.
    # To ignore Warnings, change below to: if ($severity < 3)
    if ($severity >= 0)
    {
        printf(" %-15s (L:%6d, C:%6d) %-7s:",
               $filename,
               $line,
               $col,
               $severities[$severity]);

        print " $msg\n";
    }
}

My perl script can be found in the Fictionwise eBookwise forum, under a posting entitled "Using perl scripts to produce .IMP ebooks and more... " (see link http://www.mobileread.com/forums/showthread.php?t=20050 ) - I hope this works.

The actual perl script 'html2imp.pl' is an attachment located at http://www.mobileread.com/forums/attachment.php?attachmentid=9932&d=1202084852 . The .zip file in the first posting there has everything you need.

You must have installed the (free) eBook Publisher software from http://www.ebooktechnologies.com/support_publisher_download.htm for the interface calls to work. I do not check for that explicitly, but I could add that to the script if you think it will be useful.

I'm just learning to program in perl. Most of the code was adapted from example code in the documentation that comes with the eBook Publisher.

One word of caution: the eBook Publisher software used to produce .IMP ebooks is 'fussy' and doesn't support all html tags. My work-around is to filter or change the troublesome tags by hand-editing. I could try to incorporate all of this within 'html2imp.pl'.
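
The hand-filtering step could be sketched roughly as below. This is only an illustration: the list of tags to drop is a made-up placeholder (the post does not say which tags eBook Publisher rejects), and `strip_tags` is a hypothetical helper, not part of 'html2imp.pl'.

```perl
use strict;
use warnings;

# Placeholder list; replace with whatever tags the converter complains about.
my @DROP_TAGS = qw(font span);

# Remove the opening and closing forms of each listed tag, keeping the
# text between them.
sub strip_tags {
    my ($html, @tags) = @_;
    for my $tag (@tags) {
        $html =~ s{</?\Q$tag\E\b[^>]*>}{}gi;
    }
    return $html;
}

my $sample = '<p><font size="2">Chapter <span class="x">One</span></font></p>';
print strip_tags($sample, @DROP_TAGS), "\n";
```

A regex pass like this is crude (it knows nothing about nesting or comments), so it is probably best kept as a pre-processing step whose output is still checked by eye.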

I would really like it if you could incorporate this in your Mobiperl!!!

-Nick

JSWolf
02-10-2008, 08:40 AM
I will eventually, but since it is a lot of trouble for me to produce the Windows version (I have to reboot my computer and lose all the Linux setup) I am not willing to do it too often. If you are using html2mobi and always have a cover image then there are no changes in version 0.0.28 that concern you.

If somebody else would like to volunteer to produce Windows binaries, please feel free to do so. My mobiperl web page gives all the information needed to set it up, and the Makefile in the distribution contains targets to build things in Windows using nmake.

Fixed the substitution of mbppagebreak that was done wrongly

That fix right there might very well concern me. I cannot say until I see the new version and how the HTML from it works with html2lrf and/or Book Designer.

nrapallo
02-10-2008, 08:50 AM
Yes, please provide Windows binaries at your earliest convenience.

I wouldn't want you to remove the <mbp:pagebreak> format-specific tag; just replace it with something more 'accepted'.

Your efforts are much appreciated!
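
One way to read "something more accepted" is a CSS page break. The sketch below makes that assumption; the replacement markup is a guess at what downstream tools might tolerate, not what mobi2html actually emits, and `replace_pagebreaks` is a hypothetical name.

```perl
use strict;
use warnings;

# Swap Mobipocket page-break tags for a CSS page break.
# The <div> used as a replacement is an assumption.
sub replace_pagebreaks {
    my ($html) = @_;
    $html =~ s{<mbp:pagebreak\s*/?>(?:\s*</mbp:pagebreak>)?}
              {<div style="page-break-before: always"></div>}gi;
    return $html;
}

print replace_pagebreaks('<p>End of one</p><mbp:pagebreak/><p>Start of two</p>'), "\n";
```

Whether html2lrf or Book Designer honour `page-break-before` would still need to be tested with real files.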

JSWolf
02-10-2008, 08:56 AM
If somebody else would like to volunteer to produce Windows binaries, please feel free to do so. My mobiperl web page gives all the information needed to set it up, and the Makefile in the distribution contains targets to build things in Windows using nmake.

If you can provide direct links to every module needed to compile the Windows version, I'll give it a go on Monday. I'm not about to go searching for modules that I am not sure are the correct ones. But with direct links I'll give it a try. The BMPe package is the one I am not going to try to find when I won't know if it's the right one or not. So please link to all the needed bits.

tompe
02-10-2008, 09:26 AM
That fix right there might very well concern me. I cannot say until I see the new version and how the HTML from it works with html2lrf and/or Book Designer.

That is only for mobi2html.

tompe
02-10-2008, 09:41 AM
If you can provide direct links to every module needed to compile the Windows version, I'll give it a go on Monday. I'm not about to go searching for modules that I am not sure are the correct ones. But with direct links I'll give it a try. The BMPe package is the one I am not going to try to find when I won't know if it's the right one or not. So please link to all the needed bits.

http://www.ida.liu.se/~tompe/mobiperl/

contains all the information. The modules are installed using the "Perl Package Manager" so you do not need any links. The list of modules is given on the webpage. Just enter the name in the package manager and search for them. You also have to remember to add an extra repository to the package manager.

You have to match the Perl version with the version of PAR that is working and I found a match by testing a lot of combinations. The important bit is: "I used ActivePerl 5.8.8 build 820 and for building the Windows binary I used PAR-Packer-588.ppd". "build 820" is important. The latest ActivePerl did not work.

PAR-Packer-588.ppd can be found at:

http://par.wikia.com/wiki/PAR_PPM_Compatibility_List

http://theoryx5.uwinnipeg.ca/ppms/PAR-Packer-588.ppd

I think I just did "ppm install PAR-Packer-588.ppd" to install it. Or you can probably use the graphical interface to the Package Manager on the ppd file.

As I said the Makefile in the distribution then shows how to build the binaries (nmake all; nmake pack).

Good luck.

DMcCunney
02-10-2008, 09:56 AM
Yes, I do need a new Windows binary. I do use html2mobi. So please come up with a new Windows edition for those of us using Windows. It isn't fair that Windows users get stuck with 2 versions behind.

Why not install ActiveState Perl?

http://www.activestate.com/store/productdetail.aspx?prdGuid=81fbce82-6bd5-49bc-a915-08d58c2648ca

Free and open source.
______
Dennis

jswinden
02-10-2008, 04:25 PM
I do love to read, but I do NOT have the patience to read through hundreds of posts in this thread to find an answer. So pardon me for coming to the end of the thread and posting my question and thoughts.

I installed ActiveStatePerl 5.8.8 and all the necessary (according to his website) addons to run mobi2html on my Win XP PC.

I used mobidedrm to de-DRM a MobiPocket file originally purchased for MobiPocket on the Palm.

When I go into a DOS Prompt and run mobi2html to convert this file I get a nice unpack folder filled with GIF files and one html file that contains only the following:

<html><head></head><body></body></html>

This is not much good as far as I can see if that is all it does. The image files are great, but no usable html is unpacked.

Has anyone ever gotten this script to work? If so, how...

wallcraft
02-10-2008, 04:56 PM
Has anyone ever gotten this script to work? If so, how...

Unless you want to modify the scripts (or need the very latest versions), I suggest starting with the Windows binaries: mobiperl-0.0.26-win.zip (http://www.ida.liu.se/~tompe/mobiperl/downloads/mobiperl-0.0.26-win.zip)

However, the problem may be that your MOBI file is using MobiPocket compression (mobigen's -c2 option) which is not supported by mobi2html. I have seen empty HTML files from mobi2html on such files, and another way to confirm this would be if your DRM-free MOBI is readable by MobiPocket Reader but not readable by FBReader.

Most DRM-free MOBI files use simple (-c1) compression, but some use the -c2 option. I have no idea what fraction of DRM-laden MOBI files use the higher compression option.

tompe
02-10-2008, 05:27 PM
However, the problem may be that your MOBI file is using MobiPocket compression (mobigen's -c2 option) which is not supported by mobi2html. I have seen empty HTML files from mobi2html on such files, and another way to confirm this would be if your DRM-free MOBI is readable by MobiPocket Reader but not readable by FBReader.

Most DRM-free MOBI files use simple (-c1) compression, but some use the -c2 option. I have no idea what fraction of DRM-laden MOBI files use the higher compression option.

That is most probably the reason. This is starting to get annoying. I must remember to add some warning message about that in mobi2html.
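
A warning of that kind could be based on the compression field at the start of record 0. The sketch below is not taken from mobi2html; it assumes the commonly described Palm database layout (record count at byte 76, record table from byte 78) and the usual compression codes (1 = none, 2 = PalmDOC, 17480 = HUFF/CDIC). The function names are made up for the example.

```perl
use strict;
use warnings;

# Compression codes stored in the first two bytes of record 0.
my %COMPRESSION = (
    1     => 'none',
    2     => 'PalmDOC',
    17480 => 'HUFF/CDIC',
);

# Return the compression code of a PDB/MOBI file given its raw bytes,
# or undef if the data is too short to contain one.
sub compression_code {
    my ($data) = @_;
    return undef if length($data) < 86;            # header + one record entry
    my $count = unpack 'n', substr($data, 76, 2);  # number of records
    return undef if $count < 1;
    my $rec0 = unpack 'N', substr($data, 78, 4);   # file offset of record 0
    return undef if $rec0 + 2 > length($data);
    return unpack 'n', substr($data, $rec0, 2);
}

sub describe_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    local $/;                                      # slurp the whole file
    my $data = <$fh> // '';
    my $code = compression_code($data);
    return 'not a recognizable PDB file' unless defined $code;
    return $COMPRESSION{$code} // "unknown ($code)";
}

if (@ARGV) {
    my $kind = describe_file($ARGV[0]);
    print "$ARGV[0]: $kind\n";
    warn "Warning: HUFF/CDIC compression, expect empty HTML output\n"
        if $kind eq 'HUFF/CDIC';
}
```

Run it with a file name as the only argument. Reading the whole file is wasteful for large books; a real check would only need the first few hundred bytes plus record 0.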

So if somebody reverse engineered the compression format could that be something that MobiPocket could complain about? Could they see their secret compression algorithm as part of the DRM?

jswinden
02-10-2008, 05:29 PM
Unless you want to modify the scripts (or need the very latest versions), I suggest starting with the Windows binaries: mobiperl-0.0.26-win.zip (http://www.ida.liu.se/~tompe/mobiperl/downloads/mobiperl-0.0.26-win.zip)

However, the problem may be that your MOBI file is using MobiPocket compression (mobigen's -c2 option) which is not supported by mobi2html. I have seen empty HTML files from mobi2html on such files, and another way to confirm this would be if your DRM-free MOBI is readable by MobiPocket Reader but not readable by FBReader.

Most DRM-free MOBI files use simple (-c1) compression, but some use the -c2 option. I have no idea what fraction of DRM-laden MOBI files use the higher compression option.

As stated above, I already properly installed ActiveStatePerl and MobiPerl. I did get it to work with a DRM free file I got from MobileRead. But using it to convert a file that was de-DRMed using mobidedrm doesn't seem to work. One or both scripts come up short.

I do wish these guys would start creating executables. Having to install Python, Perl, NameYourStupidScriptingLanguage, etc., then having to attempt to figure out their poorly documented command line syntax is too complicated and time consuming for most people. And in the end, it always seems that the caveats make the whole process a huge waste of time!

wallcraft
02-10-2008, 05:47 PM
So if somebody reverse engineered the compression format could that be something that MobiPocket could complain about? Could they see their secret compression algorithm as part of the DRM?

I don't see how it could be considered an effective DRM technique. Compression is also highly unlikely to be covered by patents, and in any case all that is required is decompression. This is somewhat analogous to the case with RAR, where there is an open source unrar utility but not one for creating .rar files.

DMcCunney
02-10-2008, 06:10 PM
I don't see how it could be considered an effective DRM technique. Compression is also highly unlikely to be covered by patents, and in any case all that is required is decompression. This is somewhat analogous to the case with RAR, where there is an open source unrar utility but not one for creating .rar files.

RAR author Eugene Roshal released public domain C code to open and extract RAR archives back in the MS-DOS days, so lots of things can open RAR files.

Mobi's high compression option isn't intended to be DRM, per se. The purpose is for things like dictionaries. They need high compression to keep the file size within reason, but they can't use Zip because they need to be able to uncompress the file for display at any point in the file, and Zip starts at the beginning.

So they have a proprietary and undocumented compression method they developed to handle that requirement.

Since they made the Creator and Reader programs freeware, I think the next logical step for them would be to make them open source and get other developers involved, but I think the proprietary compression method would be an obstacle.
______
Dennis

DaleDe
02-10-2008, 07:31 PM
Yes, I do need a new Windows binary. I do use html2mobi. So please come up with a new Windows edition for those of us using Windows. It isn't fair that Windows users get stuck with 2 versions behind.

What is fair? If you install Perl on Windows then you won't need a binary :)

Dale

Gudy
02-11-2008, 03:39 AM
I don't see how it could be considered an effective DRM technique. Compression is also highly unlikely to be covered by patents[...]

Eh? Says who? There was a big noise a couple of years ago when some firm which had acquired the patent for the LZW(?) compression used in GIF image files started successfully enforcing said patent. If nothing else, it gave the hitherto lingering PNG format a nice boost.

Legally speaking, the Mobipocket compression scheme is probably not part of their DRM, so you likely don't have to fear the DMCA, but it may very well be covered by a patent. This patent in all likelihood would also cover a decompression routine for the compressed data.

HarryT
02-11-2008, 04:30 AM
I do wish these guys would start creating executables. Having to install Python, Perl, NameYourStupidScriptingLanguage, etc., then having to attempt to figure out their poorly documented command line syntax is too complicated and time consuming for most people. And in the end, it always seems that the caveats make the whole process a huge waste of time!

Don't you think that you're perhaps being a little unreasonable in your complaints? The people who write these tools are putting in a huge amount of time and effort, and releasing them free of charge. We should be thanking them for their hard work, not complaining. That's my view, anyway!

Jaapjan
02-11-2008, 05:39 AM
There is little you can do about these kinds of complaints, Harry. You'll always have people who are unwilling to put some of their own effort into these tools to get them adapted or working for their situation, whether that means reading up to see if the tool does what they want, installing a few support libraries, or anything in between.

Sad fact for any developer, no matter what program they're working on.

DMcCunney
02-11-2008, 10:17 AM
Eh? Says who? There was a big noise a couple of years ago when some firm which had acquired the patent for the LZW(?) compression used in GIF image files started successfully enforcing said patent. If nothing else, it gave the hitherto lingering PNG format a nice boost.

That was Unisys.

Lempel and Ziv defined the original algorithm. Terry Welch described a simplified version that was easier to implement, becoming the W in LZW. At the time he wrote the paper describing the LZW algorithm, Terry worked for Sperry, now a unit of Unisys, and his contract gave his employer rights to his creations.

Unisys belatedly woke up, realized they had intellectual property rights to LZW compression, and started asking for money from sites with GIF images that used LZW compression.

As mentioned, it resulted in the PNG format, and it's a moot point now -- as far as I know, Unisys's rights have expired.

I wonder how much money they actually got from trying to enforce rights on LZW? I suspect not as much as they spent in legal fees doing it.

Legally speaking, the Mobipocket compression scheme is probably not part of their DRM, so you likely don't have to fear the DMCA, but it may very well be covered by a patent. This patent in all likelihood would also cover a decompression routine for the compressed data.

If it is covered by patent, that will be discoverable. I doubt it, though. I suspect this is "trade secret" -- they just don't tell anyone what they did and how they did it.
______
Dennis

HarryT
02-11-2008, 10:23 AM
If it's Huffman encoding, it's certainly not patented - that algorithm is in the public domain.

Gudy
02-11-2008, 10:28 AM
As mentioned, it resulted in the PNG format, and it's a moot point now -- as far as I know, Unisys's rights have expired.

Yup, since 2003 in the US and since 2004 in most of the rest of the world.

If it is covered by patent, that will be discoverable. I doubt it, though. I suspect this is "trade secret" -- they just don't tell anyone what they did and how they did it.

The latter would be ideal for someone trying to reverse-engineer the algorithm, say, through chosen plain-text attacks or something similar. And yes, if there is a patent, it should theoretically be discoverable.

Also, Harry, it may be Huffman, but their claims lead me to believe that they made changes to the compression algorithm so they can start decompressing anywhere they need so they can start presenting page 500 in a book without having to decode all previous pages. Those changes may very well be patented.

DMcCunney
02-11-2008, 11:45 AM
Also, Harry, it may be Huffman, but their claims lead me to believe that they made changes to the compression algorithm so they can start decompressing anywhere they need so they can start presenting page 500 in a book without having to decode all previous pages. Those changes may very well be patented.

Correct. They've stated that they couldn't simply use something like the Zip "Deflate" algorithm because they wanted to be able to start decompressing at any specified point in the file (like the dictionary definition for a particular word), and Zip always started at the beginning.

I think it would be a smart move for them to make the Creator and Reader programs open source efforts, but that proprietary compression algorithm might be an issue in doing so. It seems like something they wouldn't want to release.
______
Dennis

JSWolf
02-11-2008, 12:08 PM
Ok, I installed it all and get all kinds of errors running MAKE. What do I do now?

nrapallo
02-11-2008, 02:02 PM
tompe:

I finished adding the ability to directly convert to .IMP formats from the .html converted by your 'mobi2html' perl script. I used your 'mobi2html' as a base (version 0.0.28) and created a new perl script named 'mobi2imp.pl'.

Therein, I indicate what changes were made so that you (or I :D) can update this module in future releases.

The 'mobi2imp.pl' is attached below and I also provide two sample conversions in the below .zip file for anyone who wants to test it out.

Feel free to include and/or modify this within your Mobiperl package!

You directly may not benefit (or even be able to convert to .IMP if not using Windows and the eBook Publisher software), but I believe this will allow those with many mobipocket .prc files to migrate to their ebookwise 1150 easily.

Is this OK with you?

EDIT 12 Feb 2008: version 2 - now 'Category Author Title' are optional and don't need to be provided. See mobi2IMP.bat and .zip file for details.

Regards,
-Nick

Gudy
02-11-2008, 02:21 PM
Ok, I installed it all and get all kinds of errors running MAKE. What do I do now?

I assume you have installed ActiveState's Perl and all the modules listed on the Mobiperl site (you can install PAR-Packer-588 from the Perl Package Manager as well), and also nmake from the MS KnowledgeBase article (http://support.microsoft.com/default.aspx?scid=kb;en-us;Q132084). Copy NMAKE.ERR and NMAKE.EXE to C:\Perl\bin.

I also assume that you have extracted mobipocket-0.0.28.tar somewhere.

Go there and open the Makefile in your editor of choice. Use an editor that doesn't do anything to the file you don't tell it to do (like, say, converting tabs to spaces or "fixing" line lengths).

Delete all lines that start with "copy" and end with "c:\Perlb820\bin\"

Open a command prompt to that directory and type "nmake all"

Rejoice.

igorsk
02-11-2008, 03:12 PM
Correct. They've stated that they couldn't simply use something like the Zip "Deflate" algorithm because they wanted to be able to start decompressing at any specified point in the file (like the dictionary definition for a particular word), and Zip always started at the beginning.

What they did was just compress source HTML in chunks so that each chunk after compression fits inside a PDB record. They also added an index for mapping an uncompressed text position to record number so that only necessary records need to be decompressed. None of this actually requires Huffman, it can be done with deflate or lzw or lzma or whatnot.
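
The chunk-and-index idea can be sketched in a few lines. This is not Mobipocket's format: it swaps in zlib (via the core Compress::Zlib module) for their compressor, assumes a fixed 4096-byte chunk so the "index" is plain division, and keeps the records in a Perl array rather than a PDB file. `build_records` and `read_at` are names invented for the example.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

my $CHUNK = 4096;   # uncompressed bytes per record (assumed size)

# Split the text into fixed-size chunks and compress each one on its own.
sub build_records {
    my ($text) = @_;
    my @records;
    for (my $pos = 0; $pos < length $text; $pos += $CHUNK) {
        push @records, compress(substr($text, $pos, $CHUNK));
    }
    return \@records;
}

# Fetch $len bytes starting at uncompressed position $pos, decompressing
# only the records that cover that range.
sub read_at {
    my ($records, $pos, $len) = @_;
    my $first = int($pos / $CHUNK);
    my $last  = int(($pos + $len - 1) / $CHUNK);
    $last = $#$records if $last > $#$records;
    my $buf = join '', map { uncompress($records->[$_]) } $first .. $last;
    return substr($buf, $pos - $first * $CHUNK, $len);
}

# A fake "dictionary" of 5000 fixed-width entries, 11 bytes each.
my $text    = join '', map { sprintf "entry%05d;", $_ } 1 .. 5000;
my $records = build_records($text);
print scalar(@$records), " records\n";
print read_at($records, 11 * 3636, 11), "\n";   # the 3637th entry
```

Because every chunk is compressed independently, looking up one entry touches one or two records instead of the whole file, at the cost of a somewhat worse compression ratio.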

JSWolf
02-11-2008, 03:55 PM
I assume you have installed ActiveState's Perl and all the modules listed on the Mobiperl site (you can install PAR-Packer-588 from the Perl Package Manager as well), and also nmake from the MS KnowledgeBase article (http://support.microsoft.com/default.aspx?scid=kb;en-us;Q132084). Copy NMAKE.ERR and NMAKE.EXE to C:\Perl\bin.

I also assume that you have extracted mobipocket-0.0.28.tar somewhere.

Go there and open the Makefile in your editor of choice. Use an editor that doesn't do anything to the file you don't tell it to do (like, say, converting tabs to spaces or "fixing" line lengths).

Delete all lines that start with "copy" and end with "c:\Perlb820\bin\"

Open a command prompt to that directory and type "nmake all"

Rejoice.

What I get now is an unknown command "pp".

I did install everything according to the directions. But pp is not found.
What is pp and where do I get it?

Gudy
02-11-2008, 04:13 PM
pp is part of the Par-Packer-588 package, so it looks like this one didn't install right.

Start the Perl Package Manager.
Check that http://theoryx5.uwinnipeg.ca/ppms/package.lst has been added under Edit -> Preferences -> Repositories
There should be two entries under that tab. That one and the default entry from ActiveState.

Now, check that View -> All Packages is active, then type "PAR-Packer" without the quotes in the search box.

You should see two entries. PAR-Packer (0.976) and PAR-Packer-588 (0.973). The latter, and only the latter, should be installed in the "site" area. If not, right click that entry and install it.

(ETA: One thing that may not be readily apparent: Going right click -> Install on an entry does not actually install the package, but merely queues it for installation. You need to click the green right arrow button (center button on the right side of the search field) to actually execute all queued actions.)

Then try again.
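
Before re-running nmake, a quick check that pp really is reachable may save a round trip. A minimal sketch, assuming pp is installed as a script or batch wrapper somewhere on the PATH; `find_program` is a hypothetical helper and the list of extensions is a guess at what a Windows install might use.

```perl
use strict;
use warnings;
use File::Spec;

# Look for a program in a list of directories, trying a few common
# Windows extensions. Returns the full path or undef.
sub find_program {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        for my $ext ('', '.bat', '.exe', '.cmd') {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            return $candidate if -f $candidate;
        }
    }
    return undef;
}

my @path = File::Spec->path;      # directories from the PATH variable
my $pp   = find_program('pp', @path);
print defined $pp ? "pp found at $pp\n" : "pp not found in PATH\n";
```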

JSWolf
02-11-2008, 05:15 PM
Now I get the following error...

Can't locate IO/Compress/Gzip.pm in @INC (@INC contains: C:/perl/site/lib C:/perl/lib .) at C:/perl/site/lib/Compress/Zlib.pm line 13.

Where do I get this?

Gudy
02-11-2008, 05:38 PM
Buh? I believe that something about your Perl install is well and truly b0rked.

I certainly don't have Zlib.pm at that location...

Since I have no clue what's happening here, I declare myself at the end of my wits. Until someone comes along who knows more, try the attached mobiperl build. *crosses fingers*

tompe
02-11-2008, 06:20 PM
Buh? I believe that something about your Perl install is well and truly b0rked.

I certainly don't have Zlib.pm at that location...


Thanks for the binaries. Hopefully they work...

Maybe I should remove the copy thing in the Makefile. It was a convenient way to get the binaries in the path. Is there a location that always exists that I can copy binaries to so it works for other people?

I have not seen the error message about missing files either. I guess that the problem is that Par-Packer-588 did not install properly.

Gudy
02-11-2008, 06:30 PM
Thanks for the binaries. Hopefully they work...

I only tried mobi2html.exe on one file downloaded from here (the Sherlock Holmes omnibus), and it seemed to work.

Maybe I should remove the copy thing in the Makefile. It was a convenient way to get the binaries in the path. Is there a location that always exists that I can copy binaries to so it works for other people?

%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem

are always there, but I would not advise copying things there. %SystemRoot% is essentially C:\Windows on most systems. If you absolutely must copy things into the Path, make it a separate make target "install" depending on the "all" target, and copy things to C:\Perl\bin, the default location of ActiveState's Perl installation, which their installer also puts in the Path.

tompe
02-11-2008, 06:48 PM
If you absolutely must copy things into the Path, make it a separate make target "install" depending on the "all" target, and copy things to C:\Perl\bin, the default location of ActiveState's Perl installation, which their installer also puts in the Path.

I must be tired. Of course I can do a separate target and run it manually when I need it. The Makefile is now changed so it works out of the box in the next release.

I also copied the binaries file to the web page.

tompe
02-11-2008, 06:51 PM
Now I get the following error...

Can't locate IO/Compress/Gzip.pm in @INC (@INC contains: C:/perl/site/lib C:/perl/lib .) at C:/perl/site/lib/Compress/Zlib.pm line 13.

Where do I get this?

If you show us all the output from when you run "nmake all" to the error message maybe somebody will have some idea about what can be wrong.
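
A "Can't locate ... in @INC" message means Perl could not find that module file in any of its library directories. A small sketch for checking a set of modules up front follows; the module list here is only illustrative (the real list is on the Mobiperl web page), and `missing_modules` is a made-up helper.

```perl
use strict;
use warnings;

# Try to load each module and return the names of those that fail.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Illustrative list only.
my @wanted  = qw(IO::Compress::Gzip Compress::Zlib Getopt::Long);
my @missing = missing_modules(@wanted);
print @missing ? "Missing: @missing\n" : "All modules found\n";
print "Searched: @INC\n" if @missing;
```

Anything reported as missing would then be installed through the Perl Package Manager before trying "nmake all" again.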

JSWolf
02-11-2008, 07:41 PM
Thanks for the binaries. Hopefully they work...

Maybe I should remove the copy thing in the Makefile. It was a convenient way to get the binaries in the path. Is there a location that always exists that I can copy binaries to so it works for other people?

I have not seen the error message about missing files either. I guess that the problem is that Par-Packer-588 did not install properly.

The problem is that Par-Packer is written for a different 5.8.8 build of ActivePerl. So the problem is that until things are sorted, it won't work.

JSWolf
02-11-2008, 07:43 PM
If you show us all the output from when you run "nmake all" to the error message maybe somebody will have some idea about what can be wrong.

I believe the rest of the errors have to do with Gzip.pm not being where it should be. So I believe what I posted is the error that needs to be fixed.

tompe
02-11-2008, 07:43 PM
The problem is that Par-Packer is written for a different 5.8.8 build of ActivePerl. So the problem is that until things are sorted, it won't work.

OK, I wrote previously that you have to match versions carefully, something I spent a lot of time finding out... It is also possible to install a new version of ActivePerl without uninstalling the old. The new one will get a new menu subentry and will be placed first in the path.

Gudy
02-12-2008, 03:03 AM
The problem is that Par-Packer is written for a different 5.8.8 build of ActivePerl. So the problem is that until things are sorted, it won't work.

If this is indeed the problem, it should be easily fixable: either you have the wrong Perl version or the wrong PAR-Packer version.

As for Perl: the most recent distribution (822) does not work, you need to download ActivePerl-5.8.8.820-MSWin32-x86-274739 (either the zip or the msi, I recommend the latter) from the archive of old versions (http://downloads.activestate.com/ActivePerl/Windows/5.8/).

After you have uninstalled the previous Perl version, install this one, then re-install all the packages listed on the Mobiperl site.

As for the PAR-Packer, as I wrote previously, there are two versions of that displayed in the Perl Package Manager when you add the UWinnipeg repository and then search for PAR-Packer. You need PAR-Packer-588 (version 0.973). Uninstall all other versions, then install this one.

mateo
02-12-2008, 10:49 AM
I searched and couldn't find mention of this, but on the Kindle, using books converted with html2mobi, I sometimes get an error "unexpected error" that takes me back to my homepage. It's a minor inconvenience, because I can then go back and open up the book and it will work. I've only noticed the problem when opening a book or when going to the cover. It doesn't happen just by turning the pages. Like I said, it's a minor inconvenience but I thought I'd just let you know about it anyways.

darkninja
02-12-2008, 12:30 PM
I wrote a decompressor, mobihuff.py, for the new huffdic compressed files. Maybe this code can be incorporated into mobiperl?

Note. This program does not break any DRM encryption, so it's not illegal. It just decompresses files compressed with the new compression into a raw html file.

Thanks to Igor Skochinsky for the valuable assistance.

http://pastebin.com/m2360435c

tompe
02-12-2008, 01:09 PM
I wrote a decompressor, mobihuff.py, for the new huffdic compressed files. Maybe this code can be incorporated into mobiperl?

Note. This program does not break any DRM encryption, so it's not illegal. It just decompresses files compressed with the new compression into a raw html file.


I am a bit worried about what status the code would have. Is translating to other languages enough to eliminate copyright concerns? Could I distribute the code I write under GPL3? To get a clean room implementation the ideal thing would be for me to have a written description of the algorithm...

DMcCunney
02-12-2008, 01:12 PM
If you absolutely must copy things into the Path, make it a separate make target "install" depending on the "all" target, and copy things to C:\Perl\bin, the default location of ActiveState's Perl installation, which their installer also puts in the Path.

Might want to check that exists, first. On my machine, the boot drive is normally D:, and Perl is in D:\Perl\Bin. (I dual booted Win2K and XP, and C: is the 2K drive.)
______
Dennis

JSWolf
02-12-2008, 01:19 PM
tompe did you know that mobi2html can strip the images out of a DRM Mobi eBook? I ran it on such by accident and I got an empty html file, but the images were fine.

tompe
02-12-2008, 01:21 PM
tompe did you know that mobi2html can strip the images out of a DRM Mobi eBook? I ran it on such by accident and I got an empty html file, but the images were fine.

Yes. The fixing of image size is based on this. Only the text is DRM:ed.
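
Since the image records are stored as-is, they can be recognized by their leading bytes. The sketch below only shows that idea and is not how mobi2html does it; `image_type` is a hypothetical helper that checks the standard GIF, JPEG and PNG signatures.

```perl
use strict;
use warnings;

# Guess the image type of a record from its leading bytes.
# Returns undef for anything that does not look like an image.
sub image_type {
    my ($data) = @_;
    return 'gif'  if $data =~ /\AGIF8[79]a/;
    return 'jpeg' if $data =~ /\A\xFF\xD8\xFF/;
    return 'png'  if $data =~ /\A\x89PNG\r\n\x1A\n/;
    return undef;
}

for my $record ("GIF89a\x01\x00", "\xFF\xD8\xFF\xE0", "<html>") {
    my $type = image_type($record);
    print defined $type ? "image ($type)\n" : "not an image\n";
}
```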

kovidgoyal
02-12-2008, 01:30 PM
I wrote a decompressor, mobihuff.py, for the new huffdic compressed files. Maybe this code can be incorporated into mobiperl?

Note. This program does not break any DRM encryption, so it's not illegal. It just decompresses files compressed with the new compression into a raw html file.

Thanks to Igor Skochinsky for the valuable assistance.

http://pastebin.com/m2360435c

Produces empty output for the attached mobi file. Generated using mobigen -c2 -s0 on linux.

Gudy
02-12-2008, 01:54 PM
Might want to check that exists, first. On my machine, the boot drive is normally D:, and Perl is in D:\Perl\Bin. (I dual booted Win2K and XP, and C: is the 2K drive.)

Right, so it should probably be %HOMEDRIVE%\Perl\bin then, or %SystemDrive%\Perl\bin?
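
Another option, sketched below, is to ask Perl itself where it lives instead of guessing a drive letter. This uses the core Config module and $^X (the path of the running interpreter); the helper name perl_bin_dir is made up for the example, and whether this fits the Makefile setup discussed here is an assumption.

```perl
use strict;
use warnings;
use Config;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: directory holding the perl that runs this script.
sub perl_bin_dir {
    # $Config{bin} is where this perl installed its public executables.
    return $Config{bin} if defined $Config{bin} && -d $Config{bin};
    # Fall back to the directory of the running interpreter ($^X).
    return dirname(File::Spec->rel2abs($^X));
}

print "Install scripts into: ", perl_bin_dir(), "\n";
```

An install target could call something like this once and copy the scripts to the directory it reports.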

kovidgoyal
02-14-2008, 11:24 AM
@tompe

version 0.29 of mobi2html still has a bunch of <mbp:pagebreak> elements in the output and doesn't create <a name> elements, when converting the attached mobi file (Created using mobigen -c1 -s0)

nrapallo
02-14-2008, 12:43 PM
kovidgoyal:

I noticed the same thing and have compensated for poorly constructed anchors (as used in SpaceEncyclopedia.mobi and a lot of feedbooks.com .prc's) in version 3 of my 'mobi2imp.pl' perl script.

The following code snippet (from mobi2html) now allows the hyperlinks to work properly (my change is the block under the 'For .IMP start' comment):

print STDERR "Adding name attributes\n";
foreach my $pos (sort keys %fileposmap) {
    # print STDERR "NAMEPOS: $pos\n";
    my $a = substr ($text, $pos+$offset, 2);
    if ($a eq "<a" or $a eq "<A") {
        # Turn the existing anchor into a named anchor
        substr ($text, $pos+$offset, 2, "<a name=\"" . $pos . "\"");
        $offset += (8 + length ($pos));
        next;
    }
    if ($a eq "<h" or $a eq "<H") {
        # Put an empty anchor before the header
        substr ($text, $pos+$offset, 2, "<a name=\"" . $pos . "\"></a><h");
        $offset += (15 + length ($pos));
        next;
    }
    #For .IMP start - Kludge mainly for feedbooks.com .prc files (ignore warning)
    #
    if (substr ($a, 0, 1) eq "<") {
        # Put an empty anchor before any other '<'
        substr ($text, $pos+$offset, 2, "<a name=\"" . $pos . "\"></a>$a");
        $offset += (15 + length ($pos));
        print STDERR "FIXED: $pos - Not an anchor: $a\n";
        next;
    }
    print STDERR "WARNING: $pos - Not an anchor: $a\n";
}
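
The offset bookkeeping in a loop like the one above can be tried out in isolation. The sketch below is a simplified stand-in, not the mobi2html code: it always inserts an empty anchor instead of rewriting an existing '<a', and it assumes the positions are plain integers, so it sorts them numerically. The name add_anchors is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: insert an empty named anchor at each position.
# Positions refer to the ORIGINAL text, so every insertion shifts the
# later ones by the length of what has been added so far.
sub add_anchors {
    my ($text, @positions) = @_;
    my $offset = 0;
    for my $pos (sort { $a <=> $b } @positions) {
        my $anchor = qq{<a name="$pos"></a>};
        substr($text, $pos + $offset, 0, $anchor);   # length 0 = pure insert
        $offset += length $anchor;
    }
    return $text;
}

my $html = '<p>one</p><h1>two</h1><div>three</div>';
print add_anchors($html, 22, 0, 10), "\n";
```

Because the running offset only makes sense when positions are visited in ascending order, the numeric sort matters as soon as the positions have different numbers of digits.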

More information can be found in the Fictionwise eBookwise forum thread 'Using perl scripts to produce .IMP ebooks and more... ' at http://www.mobileread.com/forums/showpost.php?p=148648&postcount=3

-Nick

kovidgoyal
02-14-2008, 01:17 PM
@nrapallo
Thanks.

JSWolf
02-14-2008, 01:32 PM
nrapallo can you please attach a fixed version of mobi2html for us? Thanks!

nrapallo
02-14-2008, 01:42 PM
JSWolf:

If you mean the code where this fix is in, namely, 'mobi2imp.pl', then it can be found at http://www.mobileread.com/forums/showpost.php?p=148648&postcount=3 (1st attachment in post#3). I maintain this perl script.

If you mean, change 'mobi2html' to include this fix, then I would prefer to have tompe apply my suggested fix, as he is the maintainer of that script.

Either way, it would be helpful, to have this 'fix' incorporated.

-Nick

tompe
02-14-2008, 02:30 PM
@tompe

version 0.29 of mobi2html still has a bunch of <mbp:pagebreak> elements in the output and doesn't create <a name> elements, when converting the attached mobi file (Created using mobigen -c1 -s0)

For the attached file 0.0.29 did not give a result file with mbp:pagebreak in it. I have fixed the other problem. I wonder why the attached file had two <mbp:pagebreak/> in a row. I did a fix that works for "<m" or "<M". I do not want to just put an anchor before "<" since I would like to detect the different ways this can be done since some of them might require special handling.

There is a 0.0.30 available and if I can get mobi2imp to compile under Windows there will be Windows binaries for this version available later today.

tompe
02-14-2008, 02:32 PM
I noticed the same thing and have compensated for poorly constructed anchors (as used in SpaceEncyclopedia.mobi and a lot of feedbooks.com .prc's) in version 3 of my 'mobi2imp.pl' perl script.


Please let me know if you see some more strange constructions for an anchor.

nrapallo
02-14-2008, 03:38 PM
Now that you mention it, the cases where I got the 'Not an anchor' message, that truly did not have any anchors, were:

1. <m - as you mentioned for <mbp:pagebreak> (Why would you link to BEFORE a pagebreak? Great sample!).
2. <b - for '<b>' just before an anchor
3. <d - for '<div' like in many feedbooks.com .prc files that I converted

They all ended up being valid places to insert an '<a name' tag; it was just that the ACTUAL 'filepos' was poorly positioned. My thinking was that I would go ahead and insert it as long as it was not in the middle of something. The start of a tag ("<") seemed a safe enough place to bypass your warning.

If this turned out wrong, then the original .prc had it wrong too! Your code is right, just the anchors are poorly positioned!
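
To collect the different cases rather than only fixing them, something like the following sketch could report what actually sits at each 'filepos'. The function name target_kind and the labels it returns are made up for this example; it is not part of mobi2html or mobi2imp.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: describe what a link target position points at.
sub target_kind {
    my ($text, $pos) = @_;
    return 'out-of-range' if $pos < 0 || $pos >= length $text;
    my $rest = substr($text, $pos);
    return 'anchor'        if $rest =~ /\A<a[\s>]/i;            # real <a ...>
    return 'closing'       if $rest =~ m{\A</};                 # e.g. </p>
    return 'tag:' . lc($1) if $rest =~ /\A<([A-Za-z][\w:]*)/;   # <div, <b, <mbp:pagebreak
    return 'text';                                              # middle of something
}

my $sample = '<mbp:pagebreak/><b>x</b><a href="#y">y</a>';
for my $pos (0, 16, 19, 20, 24) {
    printf "%3d: %s\n", $pos, target_kind($sample, $pos);
}
```

Counting the labels over a batch of files would show which constructions are common enough to deserve special handling.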

Just my thoughts.

Also, thanks for including the 'mobi2imp' in version 0.0.29. That script is version 2; whereas, the fix mentioned above is in version 3 previously posted. However, version 3 also adds the ability to 'fix' corrupt images within the .prc as this is often the case when using the ebook Publisher software. You may want to just comment out this Windows-specific 'fix' line for your distribution, namely:

system "nconvert.exe", "-quiet", "-q", "85", "-resize", "100%", "100%", "$explodedir/$filename";
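
Instead of commenting the line out, the call could be guarded so it only runs on Windows and only when the tool is found. This is a sketch under the assumption that nconvert.exe is expected on the PATH; find_in_path and fix_image are names made up for the example, and the arguments are simply the ones from the line above.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: find an executable on the PATH, or return undef.
sub find_in_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Hypothetical helper: re-save an image with nconvert if possible.
# Returns true only when the tool was run and exited successfully.
sub fix_image {
    my ($file) = @_;
    return 0 unless $^O eq 'MSWin32';                 # Windows-only fix
    my $tool = find_in_path('nconvert.exe') or return 0;
    return system($tool, '-quiet', '-q', '85',
                  '-resize', '100%', '100%', $file) == 0;
}
```

On other platforms fix_image simply returns false and the image is left as it was.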

I wanted to update 'mobi2imp' to include your latest revisions to mobi2html for the 'huffdic' issue, but I couldn't find that fix in version 0.0.29; is it there?

-Nick

tompe
02-14-2008, 04:15 PM
Also, thanks for including the 'mobi2imp' in version 0.0.29. That script is version 2; whereas, the fix mentioned above is in version 3 previously posted. However, version 3 also adds the ability to 'fix' corrupt images within the .prc as this is often the case when using the ebook Publisher software. You may want to just comment out this Windows-specific 'fix' line for your distribution, namely:

system "nconvert.exe", "-quiet", "-q", "85", "-resize", "100%", "100%", "$explodedir/$filename";

I wanted to update 'mobi2imp' to include your latest revisions to mobi2html for the 'huffdic' issue, but I couldn't find that fix in version 0.0.29; is it there?


I changed my code to also insert an anchor before "<". I did not test it but hopefully it works...

I could not find nconvert.exe in my version...

The huffdic issue is much more work. And as I understood it, it did not work entirely correctly. I have not decided yet if I need a written description of the algorithm or if the Python code is enough to avoid contamination issues...