Old 05-14-2014, 07:09 AM   #1
baf
Addict
Posts: 328
Karma: 1346560
Join Date: May 2012
Device: kt
Libmobi – C library

Hi,

I just want to say that I have started working on a C library for handling MOBI files.
It is at a very early stage. I have just put my libmobi project on GitHub. Right now I am working on reading and parsing MOBI documents. The next step will be writing MOBI format files.
The project includes a program, mobitool, which is meant to be both an example of usage and a tool for testing the library. It is able to load and parse basic MOBI files now. It partially recreates and dumps the original markup.

All credit goes to the users of this forum and the authors of KindleUnpack and Calibre, the only sources of information about the MOBI format.

Anybody who would like to test or contribute to the project is welcome.
Old 06-26-2014, 03:21 PM   #2
baf
I want to report that I have moved my project forward.
Since I want to understand how the MOBI format works, I focused on recreating HTML-like markup from a MOBI file.
I have implemented reconstruction of internal references to HTML targets.
Apart from reading and parsing MOBI documents, libmobi should now be able to produce a set of files that can be used as input to kindlegen.

I am sure there are still bugs and room for improvement. I am short of time and of MOBI sample files, but I will keep working on it.

For those who would like to test it but don't know how to compile programs, I attach statically built binaries for several architectures. The program, mobitool, is very simple. Its main function is turning a MOBI document into markup files. The precompiled binaries may or may not work on your particular system. The best and recommended way to test the library is to compile it from source (from GitHub). Mobitool is built together with the library. The process is quite straightforward.

Code:
# binary for kindle touch / paperwhite :)
mobitool-kindle:             ELF 32-bit LSB executable, ARM, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.31, stripped
# binary for linux 32-bit
mobitool-static-linux32:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
# binary for linux 64-bit
mobitool-static-linux64:     ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, stripped
# binary for os x 64-bit
mobitool-static-osx64:       Mach-O 64-bit executable x86_64
# binary for windows 32-bit
mobitool-static.exe:         PE32 executable for MS Windows (console) Intel 80386 32-bit

$ mobitool
usage: mobitool [-dmrsv7] filename
       without arguments prints document metadata and exits
       -d dump rawml text record
       -m print records metadata
       -r dump raw records
       -s dump recreated source files
       -v show version and exit
       -7 parse KF7 part of hybrid file (by default KF8 part is parsed)
Attached Files
File Type: zip mobitool-kindle.zip (30.4 KB, 20 views)
File Type: zip mobitool-static-linux32.zip (184.1 KB, 16 views)
File Type: zip mobitool-static-linux64.zip (223.4 KB, 19 views)
File Type: zip mobitool-static-osx64.zip (212.5 KB, 17 views)
File Type: zip mobitool-static.exe.zip (244.7 KB, 18 views)
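
For anyone who wants to script the tool, here is a minimal Python sketch that wraps the options from the usage text above. It assumes that mobitool is on your PATH and that single-letter options can be combined getopt-style (for example -s7); the helper names build_mobitool_command and run_mobitool are made up for this illustration and are not part of libmobi.

```python
import subprocess

# Single-letter options taken from the mobitool usage text in this post.
KNOWN_FLAGS = set("dmrsv7")


def build_mobitool_command(filename, flags="", executable="mobitool"):
    """Build an argv list such as ['mobitool', '-s7', 'book.mobi'].

    Assumption: options may be combined after a single dash.
    """
    unknown = set(flags) - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown mobitool flags: {sorted(unknown)}")
    argv = [executable]
    if flags:
        argv.append("-" + flags)
    argv.append(filename)
    return argv


def run_mobitool(filename, flags=""):
    """Run mobitool (assumed to be on PATH) and return its standard output."""
    result = subprocess.run(
        build_mobitool_command(filename, flags),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(build_mobitool_command("book.mobi", "s7"))
```

Calling run_mobitool("book.mobi") with no flags would correspond to the "prints document metadata and exits" case in the usage text.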
Old 06-26-2014, 05:02 PM   #3
KevinH
Guru
Posts: 953
Karma: 410248
Join Date: Nov 2009
Device: many
Hi,

I grabbed the source from github.

Nicely done!

I do not have libxml2.dylib on my machine so I configured without it and it built just fine on my Mac OS X Mavericks machine.

Interesting. I am glad to see that at least someone is making use of the KindleUnpack code and the code we contributed to calibre to support KF8 and joint mobis! ;-)

A few things:

1. When using the -s option you should probably remove all the aid="blah" attributes on tags: they are all generated by kindlegen, they are not legal HTML5, and they will be added yet again if you pass the output back through kindlegen.

(perhaps this would have happened if I had libxml2.dylib in /usr/local/lib/)?

2. You might want to grab the latest KindleUnpack testing version (v072f at the moment) and examine the code added to parse PAGE sections, HD CONT sections and CRES sections. The first is used to create apnx files; the latter two relate to HD images, which are stored in CRES sections that come after the new CONT container boundary, which itself comes after all of the KF8 sections.

It also has code to unpack to epub version 2 or epub version 3.

If you ever have questions about what the KindleUnpack code does, just let me know and I would be happy to help. Currently, I am working on unpacking .azk ebooks, which are basically just a zip archive of json fragments and skeletons (so much like a KF8) but with their own form of dictionary-based compression whose keys are mapped into the Braille region (x2800) of Unicode code points so that the jsonp objects are pure Unicode strings.

Lots of fun with that one.

Take care,

KevinH

Last edited by KevinH; 06-26-2014 at 05:05 PM.
Old 06-26-2014, 06:01 PM   #4
DaleDe
Grand Sorcerer
Posts: 9,653
Karma: 5072002
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2
Quote:
Originally Posted by baf
Hi,

All credit goes to the users of this forum and the authors of KindleUnpack and Calibre, the only sources of information about the MOBI format.

Anybody who would like to test or contribute to the project is welcome.

The main source for information about MOBI is the wiki. The KindleUnpack folks generally keep it up to date so you don't have to read source files (unless you like that sort of thing).

Dale
Old 06-26-2014, 07:59 PM   #5
KevinH
Hi,

Quote:
Originally Posted by DaleDe
The main source for information about MOBI is the wiki. The KindleUnpack folks generally keep it up to date so you don't have to read source files (unless you like that sort of thing).

Dale

Actually, although that is very true for the older mobi format, the newer KF8 is a lot more complicated, with lots of indices and tables, and documenting it all would be quite a chore. The best documentation is probably by code example from calibre and KindleUnpack. And I must admit that when I first reverse engineered the KF8 (mobi 8) format I did not take the time to update the Wiki with what I found, although I probably should have.

When I get more time I will try to add a lot more to the Wiki on the internal Mobi 8 format: the new skeleton and fragment indices, the FDST table, the guide index, the new fields in the ncx, how the rawml must be split and recombined to create the source, and how internal links with base 32 numbers must be processed. I can also expand on the PAGE sections and how to convert them to APNX files, and add lots on the HD CONT section and high res images.

Most of that is only documented in the KindleUnpack and calibre code at the moment.

In fact, having a wiki dedicated just to the Mobi 8 format, with actual C and Python code samples that show exactly how to parse things, would be quite useful.
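
As a small illustration of the kind of sample meant here, this is a minimal Python sketch for the base 32 link numbers mentioned above. Two things in it are assumptions that this thread does not spell out and that should be checked against the KindleUnpack source: that the digits are 0-9 followed by A-V (the alphabet Python's int(s, 32) accepts), and that an internal link looks like kindle:pos:fid:XXXX:off:XXXXXXXXXX. The function names are made up for the example.

```python
import re

# Assumed link shape (not stated in this thread):
#   kindle:pos:fid:<4 base 32 digits>:off:<10 base 32 digits>
LINK = re.compile(r"kindle:pos:fid:([0-9A-V]{4}):off:([0-9A-V]{10})")


def from_base32(text):
    """Decode a base 32 number written with the digits 0-9 and A-V."""
    return int(text, 32)


def parse_pos_link(link):
    """Return (fragment_number, offset) for a matching link, else None."""
    match = LINK.search(link)
    if match is None:
        return None
    return from_base32(match.group(1)), from_base32(match.group(2))


if __name__ == "__main__":
    # '001A' is 1*32 + 10 = 42 and '000000000V' is 31.
    print(parse_pos_link("kindle:pos:fid:001A:off:000000000V"))  # (42, 31)
```

Turning the decoded pair into an actual HTML target still needs the fragment and skeleton tables, which this sketch does not touch.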
Old 06-27-2014, 08:05 AM   #6
baf
Quote:
Originally Posted by KevinH
Hi,

I grabbed the source from github.

Nicely done!

I do not have libxml2.dylib on my machine so I configured without it and it built just fine on my Mac OS X Mavericks machine.

Thanks for the kind words!
I thought libxml2 was part of the Xcode installation, but it is also possible that I built it myself. My Mavericks says:
Code:
$ xml2-config --prefix
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/usr
Quote:
Originally Posted by KevinH
Interesting. I am glad to see that at least someone is making use of the KindleUnpack code and the code we contributed to calibre to support KF8 and joint mobis! ;-)

As I said earlier, it is the only source of information on the KF8 format. I followed the MOBI wiki up to a point, but most of the knowledge is only available between the lines of Python code (which, by the way, I hardly understand). Hats off to all the people who contributed to it.

Quote:
Originally Posted by KevinH
A few things:

1. When using the -s option you should probably remove all the aid="blah" attributes on tags: they are all generated by kindlegen, they are not legal HTML5, and they will be added yet again if you pass the output back through kindlegen.

(perhaps this would have happened if I had libxml2.dylib in /usr/local/lib/)?

I realise that I should strip all Amazon formatting, but as it is the easiest step I have put it off for later.
If you had had libxml2 installed, mobitool would also have recreated the opf and ncx files.
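
For reference, the aid-stripping step could look roughly like the Python sketch below. It is a regex-based illustration and not how libmobi or KindleUnpack does it. It assumes the attribute is always written as aid="..." or aid='...', and that attribute values never contain < or >. The function name is made up for the example.

```python
import re

# One whole tag, e.g. <p aid="0A1" class="x">.
# Assumption: attribute values never contain '<' or '>'.
TAG = re.compile(r"<[^<>]+>")

# A whitespace-prefixed aid attribute with a quoted value.
# The leading \s+ keeps attributes such as data-aid="..." untouched.
AID_ATTR = re.compile(r"""\s+aid=(?:"[^"]*"|'[^']*')""")


def strip_aid_attributes(markup):
    """Remove aid attributes inside tags only, leaving text content alone."""
    return TAG.sub(lambda m: AID_ATTR.sub("", m.group(0)), markup)


if __name__ == "__main__":
    sample = '<p aid="0A1" class="x">The aid="blah" text stays.</p>'
    print(strip_aid_attributes(sample))
    # <p class="x">The aid="blah" text stays.</p>
```

A real HTML parser would be more robust than regular expressions; the two-stage match is only there so that text which happens to contain aid="..." outside a tag is not altered.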

Quote:
Originally Posted by KevinH
2. You might want to grab the latest KindleUnpack testing version (v072f at the moment) and examine the code added to parse PAGE sections, HD CONT sections and CRES sections. The first is used to create apnx files; the latter two relate to HD images, which are stored in CRES sections that come after the new CONT container boundary, which itself comes after all of the KF8 sections.

It also has code to unpack to epub version 2 or epub version 3.

If you ever have questions about what the KindleUnpack code does, just let me know and I would be happy to help. Currently, I am working on unpacking .azk ebooks, which are basically just a zip archive of json fragments and skeletons (so much like a KF8) but with their own form of dictionary-based compression whose keys are mapped into the Braille region (x2800) of Unicode code points so that the jsonp objects are pure Unicode strings.

Great! Where do I find the latest testing versions of KindleUnpack?
I found it. Is there any public repo for development, like on GitHub?
Thanks!

Last edited by baf; 06-27-2014 at 08:12 AM.
Old 06-27-2014, 10:52 AM   #7
DaleDe
Quote:
Originally Posted by KevinH
Hi,

In fact, having a wiki dedicated just to the Mobi 8 format, with actual C and Python code samples that show exactly how to parse things, would be quite useful.

Thanks for the correction. There is no problem creating more pages in the wiki, although there is already a KF8 page there that could also be used.

Dale
Old 06-27-2014, 10:11 PM   #8
eureka
but forgot what it's like
Posts: 728
Karma: 2314258
Join Date: Dec 2011
Location: north (by northwest)
Device: Kindle Touch
Quote:
Originally Posted by KevinH
In fact, having a wiki dedicated just to the Mobi 8 format, with actual C and Python code samples that show exactly how to parse things, would be quite useful.

I want this.

@baf, can you lead or participate in updating the technical documentation of the MOBI format, based on your fresh knowledge?

I can help with that (though at a slow pace) or can do it on my own. I can read Python and C, so the knowledge hidden in the KindleUnpack/libmobi code would be accessible to me. But the undefined license of KindleUnpack and the LGPL of libmobi are pretty restrictive for me, because I also want to write another mobi library (someday) under the MIT license, so I'm trying to hold back from reading the KindleUnpack and libmobi code.

@baf, can you relicense your library to MIT? @KevinH, can you (and all your contributors) explicitly set the KindleUnpack license to MIT (or, better, CC0, given that you expressed your neutrality on license choice some time ago)?

I know, I'm asking too much, but I couldn't restrain myself from taking the opportunity.
EDIT: I see now that KindleUnpack is GPL3-licensed and libmobi is derived from it, so, I guess, no luck for me here.

Last edited by eureka; 06-27-2014 at 10:19 PM.
Old 06-28-2014, 08:50 AM   #9
baf
Quote:
Originally Posted by eureka
@baf, can you lead or participate in updating the technical documentation of the MOBI format, based on your fresh knowledge?

I don't think any leadership is needed here. We could just start by updating the MOBI wiki.
It is my duty to contribute to the wiki and I want to do it, but at the moment libmobi development itself consumes too much of my time. I hope I will be able to document some KF8-related algorithms when my project reaches a more stable state.

Quote:
I can read Python and C, so the knowledge hidden in the KindleUnpack/libmobi code would be accessible to me. But the undefined license of KindleUnpack and the LGPL of libmobi are pretty restrictive for me, because I also want to write another mobi library (someday) under the MIT license, so I'm trying to hold back from reading the KindleUnpack and libmobi code.

Choosing a license is a personal choice. I support the GPL idea. Just note that the LGPL is pretty permissive for a shared library: you can use it in commercial, closed-source applications. It still contains the dreadful virus though. So I well understand that somebody else may want to choose another approach.

You shouldn't be afraid of reading the KindleUnpack or libmobi code. Is there much difference between learning an algorithm from source code and learning it from documentation derived from that source code? I still believe you know how to make fair use of it.
Old 06-28-2014, 09:41 AM   #10
DiapDealer
Grand Sorcerer
Posts: 9,269
Karma: 42123822
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I find the act of asking someone to change their personal decision (RE software licensing) to match your personal needs to be more vulgar than door-to-door religious proselytizing.

If someone ASKS for help choosing a license ... sure, expound away. But if the decision is already made ... assume some thought went into it, and that the license that most closely matched the creator's personal convictions was selected.
Old 06-28-2014, 11:23 AM   #11
KevinH
Hi,
In general, I am a bit of a license agnostic, especially when it comes to creating tools for authors and other non-developer users who just want to read and develop their own books.

That said ... I wouldn't be happy if I gave away literally 100s of hours of reverse engineering the mobi 8 code just so that someone else could make money on it without contributing back their code and improvements and without attribution. That was the whole point of the original GPL, and although I think the GPL version 3 is doing more harm than good by driving commercial companies away from open source projects like gcc, the original GPL 1 and 2 licenses were quite good licenses and served a useful purpose.

And none of my reverse engineering would have been worth a hill of beans without knowledge of the older mobi format and the work of the original authors of mobiunpack, the huff dic compression code, everyone who contributed to the wiki, and especially the insanely messy code for reading and decoding the indexes with their arcane variable length bytes and rules. So if they want GPL 3 vs 2, who am I to say otherwise? Their choice of license has worked splendidly, as far as I can tell. So I think there is 0 chance of the license of KindleUnpack changing.

My 2 cents,

KevinH

Last edited by KevinH; 06-28-2014 at 11:28 AM.
Old 06-28-2014, 07:25 PM   #12
eureka
Thanks for the detailed answers.

@baf, @KevinH, I fully appreciate your work and understand your reasons and your exclusive right to choose a license for your code. I endorse (L)GPL in general, but in the case of unique closed-source libraries (I mean Amazon's mobi library used in kindlegen and on Kindle, not baf's libmobi or KindleUnpack) it would be nice to [also] have MIT/BSD-licensed library code as an implementation example compatible with most possible licenses for further (re)implementations (as opposed to end-user programs, where the differences between GPL and MIT/BSD aren't important in this sense). For me, this case is about disseminating knowledge in source code form and not necessarily about freeing software.

Anyway, I didn't want to start a flamewar about licensing and didn't mean to force my opinion about it on anyone. Sorry for raising this topic.

As a sidenote: I don't want to read and reimplement (L)GPL-licensed mobi-related code mostly because of moral considerations, not legal ones. I don't think somebody will really sue me if I read your code, learn the algorithms and data structures, implement them and share the result under the MIT license. It's just that such a reimplementation is not in the spirit of the principles behind the GPL and "free software". Also, I haven't even started to write my code, so following this reasoning is easy and rational ATM.

UPD: to be fully clear, the (L)GPL of the existing source in this concrete case is acceptable to me, and I don't want to persuade you to change it. I respect your personal choices.

Last edited by eureka; 06-28-2014 at 09:32 PM. Reason: clarifications in sidenote