Deleting AUTHOR fields with mobi2mobi


davey
12-06-2008, 11:44 PM
I have a number of mobi and azw e-books from MobileReference, and they're not very consistent in their headers. In some, they've seen fit to label themselves as not only the Publisher but also as an Author. So these files have two Author headers, one for the actual author and one for MobileReference. On the Kindle, the author shows up as "MobileReference and <whoever>", with the <whoever> getting truncated by the display.

I'm trying to use mobi2mobi.exe from the command prompt to clean this up, using "--delexthtype Author", intending to then add back the real author. But the deletion step doesn't seem to work.

The output from the program includes "Deleting extended header data of type: Author", so it's parsing my command-line input properly. But the resulting file still has both Author headers.

Has anyone tried this? Or is there some other way to do this?
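For anyone wanting to check how many Author entries a file actually carries, here is a minimal Python sketch. It assumes the EXTH layout described on the MobileRead wiki: a b"EXTH" tag, a big-endian header length and record count, then records made of a type, a length that includes the 8-byte record header, and the data, with type 100 meaning Author and 101 Publisher. As a shortcut it just searches for the first b"EXTH" tag instead of walking the PDB and MOBI headers, and it assumes UTF-8 text; both are simplifications, not how mobi2mobi itself works.

```python
import struct

AUTHOR = 100     # EXTH record type for Author
PUBLISHER = 101  # EXTH record type for Publisher


def parse_exth(data: bytes):
    """Return (type, payload) pairs from the first EXTH block found in data."""
    start = data.find(b"EXTH")  # shortcut: a real parser locates it via the MOBI header
    if start < 0:
        return []
    _header_len, count = struct.unpack_from(">II", data, start + 4)
    pos = start + 12  # first record follows "EXTH", header length, record count
    records = []
    for _ in range(count):
        rtype, rlen = struct.unpack_from(">II", data, pos)  # rlen includes these 8 bytes
        records.append((rtype, data[pos + 8:pos + rlen]))
        pos += rlen
    return records


def authors(data: bytes):
    """List every Author entry, decoded as UTF-8 (an assumption; some files use cp1252)."""
    return [p.decode("utf-8", "replace") for t, p in parse_exth(data) if t == AUTHOR]


def make_exth(records):
    """Build a synthetic EXTH block for demonstration (padding omitted)."""
    body = b"".join(struct.pack(">II", t, len(p) + 8) + p for t, p in records)
    return b"EXTH" + struct.pack(">II", 12 + len(body), len(records)) + body


sample = make_exth([(AUTHOR, b"MobileReference"), (AUTHOR, b"Jane Austen"),
                    (PUBLISHER, b"MobileReference")])
print(authors(sample))  # ['MobileReference', 'Jane Austen']
```

On a real file you would pass in the whole file's bytes, for example `authors(open("book.mobi", "rb").read())`, where book.mobi is only a placeholder name.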

daffy4u
12-07-2008, 12:21 AM
The Visual Kindle Guide (http://wiki.mobileread.com/wiki/Visual_Kindle_Guide) has instructions on how to use Mobi2Mobi from the command line or with the GUI. I'm a GUI girl myself. You should be able to use it to change the author very easily.

davey
12-07-2008, 01:14 AM
Thanks for the reply. This seems to work fine when there's only a single Author header. When there are two, it looks like it replaces the first one, but you're still left with two authors.

I haven't been able to get the GUI version operational, but it looks like it also expects a single Author header.

I can see the two headers in a binary file editor, but simply deleting the bytes for the extra one doesn't work, apparently because that messes up the length information in the database.
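Cutting out a record's bytes by hand probably fails because the EXTH block stores its own length and record count, and both go stale. The Python sketch below, assuming the same wiki-described layout (big-endian fields, 4-byte null padding not counted in the length), rebuilds the block from its records and recomputes those fields. It deliberately stops there: because the block shrinks, record 0 gets shorter, so the offsets in the PDB record list and, probably, the MOBI header's full-name offset (the title is stored after the EXTH block) would also need adjusting, which this sketch does not do.

```python
import struct


def parse_exth_block(block: bytes):
    """Parse an EXTH block that starts at block[0] into (type, payload) pairs."""
    if block[:4] != b"EXTH":
        raise ValueError("not an EXTH block")
    _header_len, count = struct.unpack_from(">II", block, 4)
    pos, records = 12, []
    for _ in range(count):
        rtype, rlen = struct.unpack_from(">II", block, pos)
        records.append((rtype, block[pos + 8:pos + rlen]))
        pos += rlen
    return records


def build_exth_block(records):
    """Serialize records, recomputing header length and record count.

    The header length covers "EXTH" through the last record; the block is
    then null-padded to a multiple of 4 bytes, and the padding is not counted.
    """
    body = b"".join(struct.pack(">II", t, len(p) + 8) + p for t, p in records)
    header_len = 12 + len(body)
    block = b"EXTH" + struct.pack(">II", header_len, len(records)) + body
    return block + b"\0" * (-len(block) % 4)


original = build_exth_block([(100, b"MobileReference"), (100, b"Jane Austen")])
cleaned = build_exth_block(
    [r for r in parse_exth_block(original) if r != (100, b"MobileReference")]
)
print(len(original), len(cleaned))            # 56 32
print(struct.unpack_from(">II", cleaned, 4))  # (31, 1)
```

The 24 bytes that disappear here are exactly what the surrounding file offsets would have to absorb.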

daffy4u
12-07-2008, 01:37 AM
I have books with multiple authors (a lot from Baen), so I'm not sure what is causing the problem you're having.

Would you mind sharing the title of the book? I can download the sample and try to see what the issue is.

pdurrant
12-07-2008, 09:24 AM
Have you tried taking the output file and running it through mobi2mobi with the --delexthtype Author option?

If there are two Author EXTH entries, it might be that the first pass will delete one, and the next pass will delete the other.

tompe
12-07-2008, 10:35 AM
That might work. I have to admit that when writing mobi2mobi I did not consider the possibility of having more than one author field. But having read the code now, it seems it should remove all author fields with just one call. I do not think I have tested this, though...

davey
12-07-2008, 10:51 AM
Thanks, that's close to the solution. It looks like the first --delexthtype pass results in a file that still has two Author headers ... one is blank, while the other is unchanged. So it really didn't delete a header; it just nulled one out.

A second pass on the output from the first pass results in a file with a single Author header, which is blank. So this actually deleted one, nulled the other.

So then you have to make a third pass with --exthtype and with --exthdata to restore the proper Author.

:smack:
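The three-pass workaround boils down to "remove every Author record, then add exactly one back". Working on already-parsed (type, payload) pairs, that can be done in a single step, as in this hypothetical Python helper (not a mobi2mobi option). It assumes type 100 is Author and that the name should be stored as UTF-8; the resulting list would still have to be serialized with correct lengths and offsets.

```python
AUTHOR = 100  # EXTH record type for Author


def set_single_author(records, author):
    """Drop every Author record, then add exactly one with the given name.

    records: list of (type, payload) pairs, e.g. from an EXTH parser.
    Other record types keep their order; the new Author record takes the
    position of the first removed one, or goes at the end if there was none.
    """
    new_rec = (AUTHOR, author.encode("utf-8"))
    out, inserted = [], False
    for rtype, payload in records:
        if rtype == AUTHOR:
            if not inserted:
                out.append(new_rec)
                inserted = True
            continue  # skip every original Author record
        out.append((rtype, payload))
    if not inserted:
        out.append(new_rec)
    return out


records = [(100, b"MobileReference"), (100, b"Jane Austen"), (101, b"MobileReference")]
fixed = set_single_author(records, "Jane Austen")
print(fixed)  # [(100, b'Jane Austen'), (101, b'MobileReference')]
```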

tompe
12-07-2008, 12:35 PM
This was annoying. I will fix this buggy behaviour. I have just added a flag to add an author so I can test deleting authors...

tompe
12-07-2008, 01:03 PM
Just to clarify, the argument to "--delexthtype" should be "author", not "Author". Anyway, I have now rewritten the code so it works for me, and the code is nicer. I have also added a flag "--addexthtype" in addition to "--addauthor".

davey
12-07-2008, 01:20 PM
Originally Posted by tompe: "This was annoying. I will fix this buggy behaviour. I have just added a flag to add an author so I can test deleting authors..."

It was a bit bloody, but if there are multiple Author headers, how do you know which one to delete with --delexthtype? There's probably no way to do this cleanly.

It looks like it operates on the first Author header, but rather than deleting the header, it replaces the data with a null string. It might be clearer if it actually deleted it.

But if it's the second Author header that you're trying to get rid of, then you likely have to delete both of them and then re-add the first one.

Perhaps you could support something like --delexthtype Author --exthdata "BadAuthor", which would tell it to delete only a header containing that particular data.
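The filter suggested here, deleting an Author record only when its data matches, is simple to express over parsed records. This is a hypothetical Python sketch of the idea, not an existing mobi2mobi feature. It accepts several unwanted names at once, since some files carry more than one bogus Author entry, and it compares exact UTF-8 bytes, which is an assumption about the file's text encoding.

```python
AUTHOR = 100  # EXTH record type for Author


def drop_authors(records, unwanted):
    """Remove Author records whose payload exactly matches one of the unwanted names.

    records: list of (type, payload) pairs; other record types are kept as-is.
    """
    unwanted_bytes = {name.encode("utf-8") for name in unwanted}
    return [(t, p) for t, p in records if not (t == AUTHOR and p in unwanted_bytes)]


records = [(100, b"Jane Austen"), (100, b"MobileReference"), (100, b"mobi"),
           (101, b"MobileReference")]
print(drop_authors(records, ["MobileReference", "mobi"]))
# [(100, b'Jane Austen'), (101, b'MobileReference')]
```

Note that the Publisher record with the same text survives, because only type 100 is filtered.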

MobileReference seems to have figured out the error of their ways, as their most recent e-books have only contained a single author header. But older ones have included the editor(s), the translator(s), "MobileReference", and "mobi" as Authors. One had 5 author headers!

davey
12-07-2008, 01:32 PM
Originally Posted by tompe: "Just to clarify, the argument to "--delexthtype" should be "author", not "Author". Anyway, I have now rewritten the code so it works for me, and the code is nicer. I have also added a flag "--addexthtype" in addition to "--addauthor"."

Thanks. I think that's why my first attempts seemed to do nothing at all. When I came back to it today, I used all lowercase by chance and started making some progress.

Is the binary Mobipocket format documented anywhere, or did you just reverse engineer it, so to speak?

tompe
12-07-2008, 01:39 PM
Originally Posted by davey: "It was a bit bloody, but if there are multiple Author headers, how do you know which one to delete with --delexthtype? There's probably no way to do this cleanly."

Now I delete all of them. Maybe I should be proactive here and add a flag to specify which one to delete before somebody asks for this functionality...

tompe
12-07-2008, 01:42 PM
Originally Posted by davey: "Thanks. I think that's why my first attempts seemed to do nothing at all. When I came back to it today, I used all lowercase by chance and started making some progress."

In retrospect it was a pretty stupid idea to use capitalization in the output but not in the input. I hate capitalization in inputs, so I should have used only lowercase in the output as well.
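One way to avoid the "Author" versus "author" trap is to normalize the type name before looking it up and to reject unknown names instead of silently doing nothing. Here is a minimal Python sketch; the name table is an assumption for illustration (it uses the EXTH numbers 100 author, 101 publisher, 103 description, 104 isbn, 105 subject) and is not mobi2mobi's actual table.

```python
EXTH_TYPES = {  # hypothetical name table; values are EXTH record type numbers
    "author": 100,
    "publisher": 101,
    "description": 103,
    "isbn": 104,
    "subject": 105,
}


def exth_type(name: str) -> int:
    """Map a type name (any case) or a plain number to an EXTH record type."""
    key = name.strip().lower()
    if key.isdigit():
        return int(key)
    if key not in EXTH_TYPES:
        # Fail loudly so a misspelled or unsupported name is noticed immediately.
        raise ValueError(f"unknown EXTH type name: {name!r}")
    return EXTH_TYPES[key]


print(exth_type("Author"), exth_type("AUTHOR"), exth_type("101"))  # 100 100 101
```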


Originally Posted by davey: "Is the binary Mobipocket format documented anywhere, or did you just reverse engineer it, so to speak?"

I reverse engineered it with some input from igorsk. No documentation is available, so some guesses have been made about the format, and you cannot trust the tools 100%.

pdurrant
12-17-2008, 05:36 AM
All the information that I know about the Mobipocket format has been put into the wiki at

http://wiki.mobileread.com/wiki/MOBI