MobileRead Forums > E-Book Formats > Kindle Formats

02-15-2012, 12:37 PM #1
Limey
Junior Member
Posts: 9
Karma: 2831
Join Date: Dec 2011
Device: Kindle
Programmatically reading mobi EXTH header

Hi all,

I have been writing a program in C# (.NET 4.0) to help my dad reorganise his ebook collection (almost entirely MOBI format) by renaming all the files into a tidier format based on the author name of the folder they're in.

If possible, I'd like to extend this to programmatically get the author name, title, etc. from the metadata within the MOBI file itself. I'd like to do this from the standard MOBI metadata, since not all of these ebooks will necessarily have been generated or processed by Calibre.


From what I've read so far, reading EXTH header information can be tricky because it can be compressed, some of it using Mobi's own secret compression scheme.

I'm really just starting out on this, so I was wondering whether anyone has any information on programmatically reading EXTH header information, and whether it's necessary to first write a routine to decompress the file. Looking at the Wikipedia entry for the MOBI file format and the EXTH header, I think I can probably read in the information I want easily once I can get at its XML format, rather than the compressed version that seems to be in the MOBI files I have.

I don't want to write or update anything within the file, just read the metadata.

Thanks in advance if anyone can point me in the right direction!

02-15-2012, 12:54 PM #2
susan_cassidy
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
Look at the code for the mobiperl programs. It tells you how to read the EXTH stuff.

02-15-2012, 01:07 PM #3
Mike L
Wizard
Posts: 1,479
Karma: 3846231
Join Date: Apr 2009
Location: Edinburgh, Scotland
Device: Kindle 3, Samsung Galaxy
There's a Wiki article here on MobileRead that might help.

I had a shot at doing something similar last year (just for fun). I don't recall any problems with compression of the metadata (as opposed to the actual text of the book), but I never got round to testing it with a large sample of books.
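To illustrate the point that compression applies to the book text rather than the metadata, here is a minimal Python sketch (Python being the language of the mobi_unpack and DumpMobiHeader.py tools mentioned in this thread) that decodes the 16-byte PalmDOC header at the start of record 0. The field layout and compression codes (1 = none, 2 = PalmDOC, 17480 = HUFF/CDIC) are assumed to follow the MobileRead wiki's description of the format, and the function name `read_palmdoc_header` is hypothetical.

```python
import struct

# Compression codes stored in the PalmDOC header (per the MobileRead wiki).
COMPRESSION_NAMES = {1: "none", 2: "PalmDOC", 17480: "HUFF/CDIC"}


def read_palmdoc_header(record0: bytes) -> dict:
    """Decode the 16-byte PalmDOC header at the start of record 0.

    All fields are big-endian. The compression field describes the text
    records that follow; record 0 itself (PalmDOC, MOBI and EXTH headers)
    is read here as plain, uncompressed bytes.
    """
    if len(record0) < 16:
        raise ValueError("record 0 is too short for a PalmDOC header")
    # Layout: compression, unused, text length, text record count,
    # text record size, encryption type (a final 2 bytes are ignored here).
    compression, _unused, text_length, record_count, record_size, encryption = (
        struct.unpack_from(">HHIHHH", record0, 0)
    )
    return {
        "compression": COMPRESSION_NAMES.get(compression, f"unknown ({compression})"),
        "text_length": text_length,
        "text_record_count": record_count,
        "text_record_size": record_size,
        "encryption": encryption,  # 0 = none; non-zero means some form of DRM
    }
```

Only the text records need a decompressor; nothing in this header (or the headers after it) does.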

02-15-2012, 01:30 PM #4
DiapDealer
Grand Sorcerer
Posts: 27,463
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
It's in Python, but the mobi_unpack program has a wealth of code examples for finding and parsing the EXTH section of a mobi. It can be found five threads down.

The very latest code is in this post

There should be no compression on the EXTH section to worry about.
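Because the EXTH block is stored uncompressed inside record 0, it can be walked directly. This minimal Python sketch assumes the layout described on the MobileRead wiki: the MOBI header begins at offset 16 of record 0, its length is the big-endian uint32 at offset 20, bit 0x40 of the EXTH flags at offset 0x80 signals that an EXTH block is present, and that block starts immediately after the MOBI header. The function name `parse_exth` is hypothetical, and the sketch does no bounds checking beyond what `struct` itself raises.

```python
import struct


def parse_exth(record0: bytes) -> dict:
    """Return {record_type: [raw_bytes, ...]} from the EXTH block in record 0."""
    if record0[16:20] != b"MOBI":
        raise ValueError("no MOBI header in record 0")
    mobi_header_length, = struct.unpack_from(">I", record0, 20)
    exth_flags, = struct.unpack_from(">I", record0, 0x80)
    if not exth_flags & 0x40:
        return {}  # this book has no EXTH metadata
    pos = 16 + mobi_header_length  # EXTH follows the MOBI header directly
    if record0[pos:pos + 4] != b"EXTH":
        raise ValueError("EXTH flag set but no EXTH block found")
    _exth_length, count = struct.unpack_from(">II", record0, pos + 4)
    pos += 12  # skip "EXTH", block length and record count
    records = {}
    for _ in range(count):
        rec_type, rec_length = struct.unpack_from(">II", record0, pos)
        # rec_length includes the 8-byte type/length prefix
        records.setdefault(rec_type, []).append(record0[pos + 8:pos + rec_length])
        pos += rec_length
    return records
```

For example, `parse_exth(record0).get(100)` would give the raw author bytes, which still need decoding with the text encoding named in the MOBI header.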

02-15-2012, 01:52 PM #5
Limey
OK, thanks for all the suggestions. I will take a look tonight!

I don't know anything about Python, so I will have a nosey through and try to work out how to achieve a similar thing in C# (although I imagine that once I've streamed the file in, the code will be fairly similar).
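Since a MOBI file is a Palm Database (PDB) container, the first step after opening the stream is usually to read the 78-byte PDB header and the record list that follows it, which gives the byte offset of every record. A minimal Python sketch, assuming the standard PDB layout (name in the first 32 bytes, type and creator at offsets 60 to 67, record count at offset 76, then one 8-byte entry per record); `read_pdb_header` and `read_pdb_header_from_path` are hypothetical names:

```python
import struct
from typing import BinaryIO, List, Tuple

PDB_HEADER_SIZE = 78  # fixed-size Palm Database header


def read_pdb_header(stream: BinaryIO) -> Tuple[str, bytes, List[int]]:
    """Return (database name, type+creator, list of record start offsets)."""
    header = stream.read(PDB_HEADER_SIZE)
    if len(header) < PDB_HEADER_SIZE:
        raise ValueError("file too short for a PDB header")
    name = header[:32].split(b"\x00", 1)[0].decode("latin-1")
    type_creator = header[60:68]  # b"BOOKMOBI" for MOBI files
    num_records, = struct.unpack_from(">H", header, 76)
    # Each record-list entry is 8 bytes: uint32 offset, 1 attribute byte, 3-byte unique ID.
    entries = stream.read(8 * num_records)
    if len(entries) < 8 * num_records:
        raise ValueError("truncated record list")
    offsets = [struct.unpack_from(">I", entries, 8 * i)[0] for i in range(num_records)]
    return name, type_creator, offsets


def read_pdb_header_from_path(path: str) -> Tuple[str, bytes, List[int]]:
    """Open a file in binary mode and read only its PDB header and record list."""
    with open(path, "rb") as f:
        return read_pdb_header(f)
```

All PDB and MOBI integers are big-endian, which matters when porting this to a little-endian reader such as .NET's BinaryReader.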

Last edited by Limey; 02-15-2012 at 01:56 PM.

02-16-2012, 02:07 AM #6
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Java Mobi Metadata Editor & Alissa's MobiHandler

You might also find the (non-obfuscated) Java code of gluggy's Java Mobi Metadata Editor useful, which you can view with JD-GUI or request from the developer.
There's also Alissa's MobiHandler, whose source code is included in the release.

02-16-2012, 07:44 PM #7
Limey
Quote:
Originally Posted by Doitsu
You might also find the (non-obfuscated) Java code of gluggy's Java Mobi Metadata Editor useful, which you can view with JD-GUI or request from the developer.
There's also Alissa's MobiHandler, whose source code is included in the release.
OK, thank you. Going to have a stab at it tonight, so will take a look.

02-18-2012, 11:07 AM #8
KevinH
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
DumpMobiHeader.py Python 2.7 command line program

Hi,

Since there are many different tools to manipulate Mobi headers, I have put together a Python 2.7 program that will work with Amazon/Mobi ebooks created with the very latest version of Kindlegen.

This program will dump all known and unknown fields and all EXTH metadata in each mobi header that is found in the ebook. This includes the latest KF8 dual mobi style books that Kindlegen now generates which have two separate headers and two EXTH metadata storage areas.
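For the dual-format books described above, one way to find the second header is to split the file into records using the PDB record offsets and look for the separator record. This minimal Python sketch assumes that Kindlegen marks the split with a record whose data starts with b"BOUNDARY" and that the next record holds the KF8 PalmDOC/MOBI/EXTH headers (the MobileRead wiki also lists EXTH record 121 as the KF8 boundary offset). The helper names are hypothetical, and DumpMobiHeader.py itself may locate the second header differently.

```python
from typing import List, Optional


def split_records(data: bytes, offsets: List[int]) -> List[bytes]:
    """Slice the whole file into records using the PDB record offsets."""
    ends = offsets[1:] + [len(data)]
    return [data[start:end] for start, end in zip(offsets, ends)]


def find_kf8_record0(records: List[bytes]) -> Optional[int]:
    """Return the index of the KF8 header record in a combined file, or None.

    Assumption: a record starting with b"BOUNDARY" separates the older
    MOBI section from the KF8 section, and the record after it begins
    with its own 16-byte PalmDOC header followed by b"MOBI".
    """
    for i, data in enumerate(records[:-1]):
        if data.startswith(b"BOUNDARY") and records[i + 1][16:20] == b"MOBI":
            return i + 1
    return None
```

The record at the returned index can then be fed to the same header and EXTH parsing code as record 0.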


To run the program simply do the following:

python ./DumpMobiHeader.py PATH_TO_YOUR_EBOOK

on Mac or Linux

or

python .\DumpMobiHeader.py PATH_TO_YOUR_EBOOK

when running cmd.exe under Windows.

It should work on both DRM and non-DRM Amazon/Mobi-style ebooks that use the latest header layout, since the headers and metadata are not themselves encrypted. Please note that Amazon ties its DRM to many of the metadata fields (watermark, tts, etc.) to prevent them from being changed. Also, some new metadata values are required for the ebook to be read properly. So be careful about exactly which metadata values you change or delete; you may end up breaking the ebook.

I wrote this to document all that is known about the format, based on other tools, the wiki page on our Mobi format, and reverse-engineering the latest KF8-format mobis for the Mobi_Unpack program.

Even if you do not read/follow Python, the code itself documents what is known and should be easy enough to follow along.

If anyone knows of *any* corrections or extensions please let me know so we can keep this program updated to help properly document the mobi format.

Hope this helps,

KevinH
Attached file: DumpMobiHeader.py.zip (4.1 KB)

Last edited by KevinH; 02-28-2012 at 09:16 AM. Reason: updated version to work with older ebooks too

02-18-2012, 01:15 PM #9
Limey
Hi, cool, thanks. Please note that my intention is not to change ANY of the metadata, or to change the MOBI file at all; I merely want to read the title and author of the book from the metadata so I can programmatically rename the file.

Basically, I have written a program for my dad that will tidy up and rename the ebooks based on the author folder they're in. But he has many ebooks in a miscellaneous folder with just one book per author, and it's not worth the time to manually put these into separate author folders. Since there's no reliable way to get the program to guess the author name and title from the filename, programmatically examining the MOBI header seemed a good way to handle it.
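Once an author and title have been read, turning them into a tidy file name mainly means stripping characters that Windows does not allow in file names. A minimal Python sketch, assuming a hypothetical "Author - Title.mobi" naming scheme (the thread does not say which format the program actually uses); `tidy_filename` is a hypothetical name:

```python
import re

# Characters Windows does not allow in file names, plus control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def tidy_filename(author: str, title: str, ext: str = ".mobi") -> str:
    """Build an 'Author - Title.mobi' style file name from metadata strings."""
    def clean(s: str) -> str:
        s = _INVALID.sub("", s)
        s = re.sub(r"\s+", " ", s).strip()  # collapse runs of whitespace
        return s.rstrip(". ")  # Windows also rejects trailing dots and spaces
    return f"{clean(author)} - {clean(title)}{ext}"
```

For example, `tidy_filename("Austen, Jane", 'Pride: and "Prejudice"?')` gives `Austen, Jane - Pride and Prejudice.mobi`.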

So far I've just got to the stage where I can stream the array of bytes into my C# program, reading in the first X bytes (the header usually looks to be just over 1024 bytes, so I am taking in 2048 to be on the safe side), since obviously we don't need the whole damn book, just the header!
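Rather than guessing a fixed 2048-byte read, the exact size of record 0 (which holds the PalmDOC, MOBI and EXTH headers and the full title) can be taken from the PDB record list: record 0 runs from the first record offset to the second, and EXTH blocks with long descriptions can push it past 2048 bytes. A minimal Python sketch of that idea; `read_record0` is a hypothetical name, and the sketch assumes the file has at least two records:

```python
import struct
from typing import BinaryIO


def read_record0(stream: BinaryIO) -> bytes:
    """Read exactly record 0 (PalmDOC, MOBI and EXTH headers plus full name).

    Record 0 runs from the first record-list offset to the second, so the
    number of bytes to read comes from the PDB header instead of a guess.
    """
    header = stream.read(78)  # fixed-size PDB header
    if len(header) < 78:
        raise ValueError("file too short for a PDB header")
    num_records, = struct.unpack_from(">H", header, 76)
    if num_records < 2:
        raise ValueError("expected at least two records")
    entries = stream.read(16)  # the first two 8-byte record-list entries
    start, = struct.unpack_from(">I", entries, 0)
    end, = struct.unpack_from(">I", entries, 8)
    stream.seek(start)
    return stream.read(end - start)
```

With a file object from `open(path, "rb")`, only the header area is read, however large the book is.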

Of course, now the challenge is to parse these bytes and establish where the relevant author name and title can be found. I currently program in Delphi at work and formerly had many years' experience with C# and VB, but have never really used C++ or Python.
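As for where the author name and title live: the MOBI header stores the offset and length of the book's full name at offsets 0x54 and 0x58 of record 0 (the offset itself is relative to the start of record 0), and the author is usually EXTH record 100. The minimal Python sketch below assumes those MobileRead-wiki offsets and the text-encoding field at offset 28 (1252 for CP1252, 65001 for UTF-8); `title_and_author` is a hypothetical name. Some books also carry an updated title in EXTH record 503, which a fuller version might prefer.

```python
import struct
from typing import Optional, Tuple

# MOBI text-encoding codes mapped to Python codec names.
ENCODINGS = {1252: "cp1252", 65001: "utf-8"}


def title_and_author(record0: bytes) -> Tuple[str, Optional[str]]:
    """Full name from the MOBI header, author from EXTH record 100 (if any)."""
    encoding_code, = struct.unpack_from(">I", record0, 28)
    enc = ENCODINGS.get(encoding_code, "cp1252")
    name_offset, name_length = struct.unpack_from(">II", record0, 0x54)
    title = record0[name_offset:name_offset + name_length].decode(enc, "replace")

    author = None
    header_length, = struct.unpack_from(">I", record0, 20)
    flags, = struct.unpack_from(">I", record0, 0x80)
    if flags & 0x40:  # EXTH present
        pos = 16 + header_length
        count, = struct.unpack_from(">I", record0, pos + 8)
        pos += 12
        for _ in range(count):
            rec_type, rec_len = struct.unpack_from(">II", record0, pos)
            if rec_type == 100:  # 100 = author/creator
                author = record0[pos + 8:pos + rec_len].decode(enc, "replace")
                break
            pos += rec_len
    return title, author
```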

Anyway, I will take a look through your code; hopefully it will serve as helpful documentation of how to parse the codes.

If I get anything up and running, I'll try to make the C# source available in case any other .NET programmers out there are interested.

Many thanks.

02-20-2012, 07:41 AM #10
Mike L
Quote:
Originally Posted by Limey
If I get anything up and running I'll try to make the C# source available in case any other .Net programmers out there are interested.
Please do. I for one would find it very useful (especially as I can understand C# better than Python).

07-17-2012, 02:18 PM #11
Limey
Quote:
Originally Posted by Mike L
Please do. I for one would find it very useful (especially as I can understand C# better than Python).
Hi Mike.

Sorry for the delay; I've been kinda busy for several months and have only just got round to popping back on this forum.

Just to let you know, I have something up and running now which will fully parse a .mobi file (the PDBHeader, the PalmDOCHeader and all of the Mobi header, including the extended metadata), written entirely in C# (for .NET 4.0) without relying on MobiPerl or any other external modules.
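For readers following along in another language, the first few fields of the MOBI header sit at fixed offsets after the 16-byte PalmDOC header. A minimal Python sketch, assuming the MobileRead wiki layout (the "MOBI" identifier at offset 16, then header length, MOBI type, text encoding, unique ID and file version as big-endian uint32 values); `read_mobi_header` is a hypothetical name, and only a handful of the many header fields are decoded:

```python
import struct


def read_mobi_header(record0: bytes) -> dict:
    """Decode a few leading MOBI header fields (offsets from start of record 0)."""
    if record0[16:20] != b"MOBI":
        raise ValueError("record 0 does not contain a MOBI header")
    header_length, mobi_type, text_encoding, unique_id, file_version = (
        struct.unpack_from(">5I", record0, 20)
    )
    return {
        "header_length": header_length,  # EXTH, if present, starts at 16 + this
        "mobi_type": mobi_type,          # e.g. 2 = Mobipocket book
        "text_encoding": text_encoding,  # 1252 = CP1252, 65001 = UTF-8
        "unique_id": unique_id,
        "file_version": file_version,
    }
```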

The core classes also come with supporting features such as SortedDictionary properties on each header to easily set the contents of a header as a datasource on a grid etc, and overridden ToString() methods to pump out the properties and values in a plain text format.
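A rough Python analogue of the SortedDictionary-plus-ToString() idea, for the EXTH part only: map record types to names, decode the text ones, sort them by name, and dump them as plain text. The name table covers just a few common types from the MobileRead wiki, the function names are hypothetical, and binary records such as the cover offset (201) are simply skipped.

```python
from typing import Dict, List

# A few common EXTH record types (numbers per the MobileRead wiki).
EXTH_NAMES = {
    100: "author", 101: "publisher", 103: "description", 104: "isbn",
    105: "subject", 106: "publishingdate", 503: "updatedtitle",
}


def exth_to_sorted_dict(records: Dict[int, List[bytes]], encoding: str = "utf-8") -> Dict[str, str]:
    """Name and decode known text EXTH records, returned sorted by name."""
    named = {}
    for rec_type, values in records.items():
        name = EXTH_NAMES.get(rec_type)
        if name is None:
            continue  # unknown or binary record: skipped in this sketch
        named[name] = "; ".join(v.decode(encoding, "replace") for v in values)
    return dict(sorted(named.items()))


def to_text(fields: Dict[str, str]) -> str:
    """Plain-text 'name: value' dump, similar in spirit to an overridden ToString()."""
    return "\n".join(f"{key}: {value}" for key, value in fields.items())
```

The input shape, a dictionary from record type to a list of raw byte strings, allows for EXTH types that appear more than once (several authors, for instance).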

The interface allows for any combination of mobi properties to be selected as columns so you can view all metadata for all books in a folder at the same time in a listview, Windows Explorer style, as well as the full metadata on individual books from a right-click popup dialog.

I actually wrote this a while ago but have recently moved house and with a number of other things on just haven't had time to pop on here.

Going to tidy things up a bit when I get home tonight and then I'll post the source code and description.

Wayne.

07-18-2012, 10:14 AM #12
Mike L
Sounds good, Wayne. I'll look forward to seeing it, and I'm sure others here will too.

Mike

07-19-2012, 10:42 PM #13
Limey
Hi, OK, I've almost finished. I've ripped this out of the larger program I had written it into, and it is now its own standalone Windows application.

I just have a few creases to iron out and some testing to do, so hopefully will be posting it by the weekend.

Wayne.

07-25-2012, 06:48 PM #14
Limey
Hi Mike,

OK it's posted now:
https://www.mobileread.com/forums/sho...d.php?t=185565

Please let me know if you have any comments.

Cheers,
Wayne.

All times are GMT -4.