Old 09-01-2010, 01:01 PM   #1
pdurrant
The Grand Mouse
Posts: 29,331
Karma: 83792800
Join Date: Jul 2007
Location: Norfolk, England
Device: NOOK ST GlowLight
Kindlestrip Python script and AppleScript wrapper

KindleGen 1.1 and later now always add the source files used in compiling the Kindle ebook as one of the (invisible) records in the ebook.

So I wrote a Python script that strips the sources record out of Kindle format ebooks. For those on Macs, I also wrote an AppleScript wrapper and put the Python script inside the AppleScript bundle to make things easy.

Kevin Hendricks has since updated the code to handle files from KindleGen 2.x, and I've also tweaked it a bit more to handle KindleGen 2.7.

If you're going to upload to the Amazon store, this script is definitely unnecessary, as Amazon will strip the sources before delivery anyway.

Do not use this script to make files to be uploaded to KDP.

But if you're going to upload the ebook somewhere else (e.g. the MobileRead Library), you might well want to make the file as small as possible.

If you're on a Mac you only need the AppleScript, as it includes the Python script. The AppleScript is a simple drag-and-drop operation: drag your KindleGen-generated file onto it, and it creates one named [oldname]_stripped.mobi.

As always, please comment with any bug reports or problems.
Attached Files
File Type: zip KindleStrip 1.35.app.zip (33.9 KB, 2132 views)
File Type: zip kindlestrip.py 1.35.zip (4.0 KB, 3588 views)

Last edited by pdurrant; 10-27-2012 at 06:47 AM.
Old 09-03-2010, 05:23 PM   #2
pdurrant
The Grand Mouse
Now at version 1.1. It writes out the stripped data as a zip file. The data in the Mobipocket file seems to have a 16-byte header, which is written out as hexadecimal to the standard output. Those using the AppleScript won't see this at all. I have no idea what the 16 bytes mean, so this probably isn't a loss.
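
A minimal sketch of the split described above, assuming (as this post reports) that the sources record is a 16-byte header followed by the zip data. The function name `split_sources_record` and the sample bytes are made up for illustration; this is not the code from kindlestrip.py, and it is written for Python 3 rather than the Python 2 the script targets.

```python
import binascii

HEADER_LEN = 16  # header size reported in the post; its meaning is not known here


def split_sources_record(record):
    """Split a sources record into (header as hex text, remaining payload)."""
    if len(record) < HEADER_LEN:
        raise ValueError("record is shorter than the 16-byte header")
    header, payload = record[:HEADER_LEN], record[HEADER_LEN:]
    return binascii.hexlify(header).decode("ascii"), payload


# Illustrative record: 16 header bytes, then bytes standing in for the zip data.
sample = bytes(range(16)) + b"PK\x03\x04rest-of-zip"
header_hex, payload = split_sources_record(sample)
print(header_hex)
```

The payload would then be written to the optional zip file, while the hex text is what ends up on standard output.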
Old 09-03-2010, 05:40 PM   #3
daffy4u
I'm Super Kindle-icious
Posts: 6,686
Karma: 2261817
Join Date: Apr 2008
Location: Long Drive, Calinadia Candafornia
Device: K1, KTSO, KFHD7, KPW1
Thanks pdurrant! I don't have any books to upload to Amazon but I always appreciate the efforts of those who push the Kindle limits to make it even more useful.
Old 09-24-2010, 12:23 AM   #4
ATDrake
Wizzard
Posts: 6,021
Karma: 14705828
Join Date: Mar 2010
Location: Roundworld
Device: Kindle 2 International & Sony PRS-T1
Just tried this on a couple of auto-generated mobis made via the new version of Kindle Previewer (1.5).

It now has "ePub support", by which it means that it automatically converts any ePubs dragged upon it to mobi and drops the file in the same folder, apparently on the lower -c1 compression setting. There's also a new simulation option for iPad, but no K3 mode yet. But the people trying to figure out Kindle Audio/Video now have a new testing tool for their efforts.

Anyway, the stripping works a treat and the extraction gives back almost exactly what went in, as far as I can tell. I did a few more tests with my lazily assembled Fictionwise cleanup conversions: html comes back as zipped html, and a zipped-up ePub in yields the exact same zipped-up ePub out.

Interestingly enough, if you originally pointed KindleGen at an opf (either custom or via unpacked epub), then no matter what the source structure, the unzipped-from-stripped version yields up the css, html, image, and misc (ncx, etc.) files rearranged into separate subdirectories with exactly those names.

Stripped file has immense space savings, often near-halving; sometimes more if there are a fair number of graphics involved in the source. Even pure text with no pictures is over a third smaller.

I have absolutely no idea why Amazon would remove the entirely logical -donotaddsource option unless they actually want to serve up plenty of bloated files via 3G and cut down on the marketable "Kindle can hold #### books!" space (and deduct extra from royalties paid out, of course), which seems rather counter-productive to me.

While we're on the subject of inexplicable KindleGen design decisions, might as well mention some more things I found out while using it:
  1. Plain old descendant selectors, a staple since CSS1, seem to be completely ignored. That's another black mark for KindleGen's (lack of) CSS support, and it means one will likely have to class every item one wants to target with a style not shared with its siblings, rather than classing a container parent element and letting specific descent, rather than generic inheritance, do the work.
  2. If you forget to close a <div> with styling applied, all subsequent text seems to be rendered with the same styling, even if it occurs in separate files in the source, at least until it hits the next tag with a different style.
  3. If you have any superfluous tags in your NCX, even a mistakenly applied empty closing tag like say, </head>, then KindleGen will merrily ignore your painstakingly constructed <navMap> and happily build with nary a warning until you find out that your mobi has no chapter marks and spend far too long trying to figure out why.
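
Items 2 and 3 are both well-formedness slips, so one cheap safeguard is to parse the NCX (and any XHTML) before handing it to KindleGen. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `xml_problem` and the two tiny sample documents are invented for illustration, and the check says nothing about what KindleGen itself accepts.

```python
import xml.etree.ElementTree as ET


def xml_problem(text):
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# Stripped-down stand-ins for an NCX file; a real one has namespaces and more.
good_ncx = "<ncx><head></head><navMap><navPoint/></navMap></ncx>"
bad_ncx = "<ncx><head></head></head><navMap><navPoint/></navMap></ncx>"

for label, doc in (("good", good_ncx), ("bad", bad_ncx)):
    problem = xml_problem(doc)
    print(label, "OK" if problem is None else "NOT well-formed: " + problem)
```

The stray `</head>` in the second sample is reported with a line and column, which is a lot quicker than hunting for missing chapter marks afterwards.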
Thanks again for writing this script! I'm sure people will be finding it very useful if Amazon's going to insist on always including the source files.
Old 09-24-2010, 01:17 PM   #5
ATDrake
Wizzard
Also, I think I've figured out what the mysterious header bytes mean.

If your source was converted straight from a properly zipped ePub, then you get 53524353000000100000003000000001. If it came from any combination of un-prepackaged html/opf, it'll be 53524353000000100000002f00000001. If it's a no-source-files-added mobi to begin with, then the header bytes are 46434953000000140000001000000002.
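
One way to read those values is as a 4-byte ASCII tag followed by three numbers. The sketch below unpacks them that way with Python's `struct` module; treating the last twelve bytes as three big-endian 32-bit integers is an assumption, and the function name `decode_header` is made up for this example. What the three numbers mean is still a guess.

```python
import struct


def decode_header(hex_text):
    """Unpack a 16-byte header given as hex: ASCII tag plus three big-endian uint32s."""
    raw = bytes.fromhex(hex_text)
    tag, first, second, third = struct.unpack(">4sIII", raw)
    return tag.decode("ascii"), first, second, third


headers = [
    "53524353000000100000003000000001",  # source was a zipped ePub
    "53524353000000100000002f00000001",  # source was loose html/opf
    "46434953000000140000001000000002",  # mobi with no source files added
]
for hex_text in headers:
    print(decode_header(hex_text))
```

Read this way, the first four bytes spell SRCS for the two files with sources and FCIS for the one without, which fits the idea that the latter is a different kind of record altogether.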

And it seems that even the samples offered for the newer books at Amazon nowadays include the bloat (but only from the mobi conversion and cut off appropriately at the sample length), which looks like it's a useless expenditure to me.

Ah well, if they want to waste their server bandwidth for no good reason, that's entirely up to them. As long as they don't go back to charging that extra $2 Whispernet surcharge that they finally got rid of for Canadians.
Old 02-21-2011, 12:55 PM   #6
twedigteam
Enthusiast
Posts: 38
Karma: 10
Join Date: Nov 2010
Device: Sony eReader
If anyone still takes a gander at this thread: I'm having some issues running the Kindlestrip tool on OS X 10.6.6. A simple drag & drop of a .mobi file onto the AppleScript file doesn't actually cause anything to occur. Taking a closer look, I'm wondering if the built-in Python on my Mac is too outdated to run kindlestrip properly (I had no issue whatsoever using your ePub zip/unzip scripts, though perhaps those don't use Python?). My version:

Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)

Noticed that there is a more current build of 3.2, wondering if maybe this could be the issue? I'm sure also there is a way to run from Terminal, but I am certainly not at that level of familiarity with Python to do so....thanks in advance if anyone spots this...
Old 02-21-2011, 01:08 PM   #7
ATDrake
Wizzard
I'm also on 10.6.6 and the AppleScript has been working for me for the past couple of months and again when I used it yesterday.

I used to have the standard Python 2.6-ish install, but then I went and got the 2.7.1 installer from Python.org (after the source failed to compile, grr).

Maybe your unzip utility sets the permissions wrongly?

In any case, to use it on the command-line, just do python PATH/TO/kindlestrip.py OriginalFile.mobi OutputFile.mobi OptionalStrippedData.zip

You can drag and drop the kindlestrip.py file onto the Terminal window and it will autofill its path, and the 3rd filename is optional if you don't care about looking at the stripped data.

You can also alias it in your .profile for convenience, e.g.:

alias kstrip="python PATH/TO/kindlestrip.py"

and then string together a series of commands to batch process a folder:

alias kstripbatch='for m in *.mobi; do kstrip "$m" "${m%.mobi}-stripped.mobi"; done'
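
For anyone who would rather not touch shell aliases, the same batch idea can be sketched in Python. This is only an illustration: the helper names `stripped_name`, `build_commands` and `run_batch` are invented here, the interpreter is assumed to be reachable as `python`, and the only thing taken from kindlestrip.py is the argument order given above (input file, then output file). The wrapper itself is written for Python 3, even though it launches the script with whatever interpreter you name.

```python
import os
import subprocess


def stripped_name(path):
    """book.mobi -> book-stripped.mobi (same folder, same extension)."""
    root, ext = os.path.splitext(path)
    return root + "-stripped" + ext


def build_commands(folder, script_path, python_exe="python"):
    """One kindlestrip command line per .mobi file in folder, skipping earlier output."""
    commands = []
    for name in sorted(os.listdir(folder)):
        lower = name.lower()
        if lower.endswith(".mobi") and not lower.endswith("-stripped.mobi"):
            source = os.path.join(folder, name)
            commands.append([python_exe, script_path, source, stripped_name(source)])
    return commands


def run_batch(folder, script_path, python_exe="python"):
    """Run each command in turn and return the list of exit codes."""
    return [subprocess.call(cmd) for cmd in build_commands(folder, script_path, python_exe)]


print(stripped_name("OriginalFile.mobi"))
```

Calling `run_batch("PATH/TO/folder", "PATH/TO/kindlestrip.py")` would then process every .mobi in that folder; files that already end in -stripped.mobi are left alone so a second run does not strip its own output.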
Old 02-21-2011, 03:03 PM   #8
pdurrant
The Grand Mouse
Quote:
Originally Posted by twedigteam View Post
If anyone still takes a gander at this thread: I'm having some issues running the Kindlestrip tool on OS X 10.6.6. A simple drag & drop of a .mobi file onto the AppleScript file doesn't actually cause anything to occur. Taking a closer look, I'm wondering if the built-in Python on my Mac is too outdated to run kindlestrip properly (I had no issue whatsoever using your ePub zip/unzip scripts, though perhaps those don't use Python?). My version:

Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)

Noticed that there is a more current build of 3.2, wondering if maybe this could be the issue? I'm sure also there is a way to run from Terminal, but I am certainly not at that level of familiarity with Python to do so....thanks in advance if anyone spots this...
I can't think why it wouldn't work for you. It works here. You don't need Python 3.x. Most of the Python scripts around are written for Python 2.x where x>=5, including this one. It may well not work with 3.x at all.

What happens if you just double-click the AppleScript? (It should ask you to locate kindlestrip.py - just click cancel if it does.)
Old 02-21-2011, 07:25 PM   #9
twedigteam
Enthusiast
Quote:
Originally Posted by ATDrake View Post

In any case, to use it on the command-line, just do python PATH/TO/kindlestrip.py OriginalFile.mobi OutputFile.mobi OptionalStrippedData.zip
Worked like a charm. Clearly there's no issue with the code if this goes through. I'll retry the script on a coworker's system later in the week.

Once again, a tip of the hat...the help here is impressively reliable, and kudos on the tools....
Old 02-21-2011, 07:27 PM   #10
twedigteam
Enthusiast
Quote:
Originally Posted by pdurrant View Post
I can't think why it wouldn't work for you. It works here. You don't need Python 3.x. Most of the Python scripts around are written for Python 2.x where x>=5, including this one. It may well not work with 3.x at all.

What happens if you just double-click the AppleScript? (It should ask you to locate kindlestrip.py - just click cancel if it does.)
Double-clicking does ask to locate the .py file, and I've tried every possible combination, including removing the scripts and re-downloading them. As I mentioned above, it works fine from the command line, so the AppleScript issue is just a local one.

....thanks again!
Old 02-22-2011, 03:29 AM   #11
pdurrant
The Grand Mouse
Quote:
Originally Posted by twedigteam View Post
it works fine in command line so the AppleScript issue is just a local one
How odd. You could try opening it with Script Editor and re-saving.
Old 02-22-2011, 05:54 AM   #12
Piquan
Junior Member
Posts: 1
Karma: 10
Join Date: Feb 2011
Device: Kindle 3
Thanks for your investigation, and the tool!

After I got v1.1, I added, at line 78 (just after calculating penoffset and lastoffset), the following:
    if datain[self.penoffset:self.penoffset+4] != 'SRCS':
        raise StripException("already stripped")

The intention here is to not delete the FCIS segment from an already-stripped file (or one that was generated with -donotaddsource). I'm enough of a doofus that I'm sure to mess up something by stripping it twice! (On the other hand, I've found one source that says the FLIS and FCIS segments aren't necessary for the Kindle, so at least I'd get a few second chances.)
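
As a standalone illustration of the same sanity check, here is a sketch that finds the penultimate record through the Palm database record table and tests it for the SRCS tag. It assumes the usual PDB layout (record count as a big-endian 16-bit value at byte 76, then 8-byte table entries starting at byte 78, each beginning with a 32-bit offset) and assumes, as in v1.1, that the sources record is the penultimate one. The function names and the hand-built test file are hypothetical, and the code is Python 3, so the tag is compared as bytes.

```python
import struct


def record_offsets(data):
    """Read the start offset of every record from the PDB record table."""
    (count,) = struct.unpack_from(">H", data, 76)
    return [struct.unpack_from(">I", data, 78 + 8 * i)[0] for i in range(count)]


def has_sources_record(data):
    """True if the penultimate record starts with the SRCS tag."""
    offsets = record_offsets(data)
    if len(offsets) < 2:
        return False
    pen = offsets[-2]
    return data[pen:pen + 4] == b"SRCS"


def build_fake_pdb(records):
    """Assemble a bare-bones PDB-shaped file for testing (header fields left zeroed)."""
    offset = 78 + 8 * len(records)
    table = b""
    for uid, rec in enumerate(records):
        table += struct.pack(">II", offset, uid)  # offset, then attributes + unique id
        offset += len(rec)
    return b"\0" * 76 + struct.pack(">H", len(records)) + table + b"".join(records)


unstripped = build_fake_pdb([b"text", b"SRCS" + b"\0" * 12 + b"zipdata", b"end"])
stripped = build_fake_pdb([b"text", b"FCIS" + b"\0" * 12, b"end"])
print(has_sources_record(unstripped), has_sources_record(stripped))
```

A stripper could call `has_sources_record` first and refuse to touch a file where it returns False, which is the protection against double stripping described above.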

Thanks again!
Old 02-22-2011, 06:19 AM   #13
pdurrant
The Grand Mouse
Quote:
Originally Posted by Piquan View Post
Thanks for your investigation, and the tool!

After I got v1.1, I added, at line 78 (just after calculating penoffset and lastoffset), the following:
    if datain[self.penoffset:self.penoffset+4] != 'SRCS':
        raise StripException("already stripped")

The intention here is to not delete the FCIS segment from an already-stripped file (or one that was generated with -donotaddsource). I'm enough of a doofus that I'm sure to mess up something by stripping it twice! (On the other hand, I've found one source that says the FLIS and FCIS segments aren't necessary for the Kindle, so at least I'd get a few second chances.)

Thanks again!
What a good find. I hadn't realised that there were some constant bytes in that bit. I'll see if I can get an updated version done.
Old 03-03-2011, 07:14 AM   #14
pdurrant
The Grand Mouse
Quote:
Originally Posted by pdurrant View Post
What a good find. I hadn't realised that there were some constant bytes in that bit. I'll see if I can get an updated version done.
Now updated to version 1.2, adding the sanity checking suggested by Piquan.
Old 09-19-2011, 02:41 AM   #15
Xabache
Carbon Reserve
Posts: 44
Karma: 10
Join Date: Jun 2010
Device: PC
Could someone write a step-by-step tutorial for using kindlestrip? I have tried to follow along but fell flat on my face, despite being generally knowledgeable about computers. Step by step please: download this, drag that... Thanks.

Last edited by Xabache; 09-19-2011 at 02:45 AM.
All times are GMT -4.