Getting around DRM, encoding?

kyzcreig · 06-19-2015, 10:04 PM

DRM is a real nuisance for us paying customers. I like to curate my notes, and usually do so after reading a great book. So you can imagine my surprise when I realized 90% of my annotations had been ignored.

Fortunately the annotations are still visible on the Kindle, and the location data is intact in my clippings.txt file. This gave me the idea of taking the location information for each annotation and then extracting the corresponding text from the original mobi file via a script. My understanding is that each location corresponds to 128 bytes of data, so it should be straightforward to put all this information into a file. But I'm not sure how the text is encoded, and when I decode it as something like UTF the result is a half-garbled mess.

I'm a novice programmer, though, so I'm wondering:

A) if this is actually feasible
B) how hard it will be to decode mid-book excerpts

As for the DRM itself, I've found tools for stripping it but I'm not sure if that will corrupt the location information. From what I can tell it doesn't.
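
A minimal sketch of the first step described above, pulling the location range out of a clippings entry. The header wording used here ("Your Highlight on Location 1234-1236 | Added on ...") is an assumption about the clippings file; the exact text varies by device and firmware, and `parse_location` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Assumed header shape; adjust the pattern to match your own clippings file.
LOCATION_RE = re.compile(r"Location\s+(\d+)(?:-(\d+))?", re.IGNORECASE)


def parse_location(header_line: str) -> Optional[Tuple[int, int]]:
    """Return (first_location, last_location), or None if no location is found."""
    match = LOCATION_RE.search(header_line)
    if match is None:
        return None
    first = int(match.group(1))
    # A single location (no "-NNNN" part) is treated as a one-location range.
    last = int(match.group(2)) if match.group(2) is not None else first
    return first, last


header = "- Your Highlight on Location 1234-1236 | Added on Friday, June 19, 2015 10:04:00 PM"
print(parse_location(header))
```

Returning a range even for single locations keeps the later byte-offset arithmetic uniform.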

knc1 · 06-19-2015, 10:30 PM

I think that the subject of "getting around" or otherwise defeating DRM is against the site rules.

But here is a slide show of the various types of block encryption:
http://www.utdallas.edu/~muratk/cour...iles/modes.pdf

To answer your question, you have to know which of the above types is used by the DRM you are interested in making random access to.

For that information, you'll have to go to some other source of information than MobileRead.
Sorry, we don't disturb other people's I.P. here.

- - - -

You had better check your prior source(s) of information; that is most likely **bits**, not **bytes** (block sizes are usually referred to by their **bit length** in cryptology, but I don't have a clue what the common practice is in DRM methods).

kyzcreig · 06-21-2015, 05:00 PM

Quote:

Originally Posted by knc1

I think that the subject of "getting around" or otherwise defeating DRM is against the site rules.

But here is a slide show of the various types of block encryption:
http://www.utdallas.edu/~muratk/cour...iles/modes.pdf

To answer your question, you have to know which of the above types is used by the DRM you are interested in making random access to.

For that information, you'll have to go to some other source of information than MobileRead.
Sorry, we don't disturb other people's I.P. here.

- - - -

You had better check your prior source(s) of information; that is most likely **bits**, not **bytes** (block sizes are usually referred to by their **bit length** in cryptology, but I don't have a clue what the common practice is in DRM methods).

Interesting, so I don't necessarily need to decrypt anything. To put it more succinctly, I want to use Amazon's location values to extract passages from a .mobi, then decode them into legible text. The DRM is a slightly different matter, although I'm also interested in it.

Though maybe the existing anti-DRM solutions will render the above impossible due to loss of interstitial metadata.

knc1 · 06-21-2015, 05:58 PM

Quote:

Originally Posted by kyzcreig

So I actually don't want to decrypt anything, I believe what I would make would have general utility.

To put it more succinctly I want to use Amazon's location values to extract passages from text.

DRM is irrelevant here and of course removing it wouldn't solve my problems either.

The passages would be a bit hard to read if they were encrypted and you didn't decrypt them.

Plus, DRM was mentioned in the title of this thread.
Which is the reason I thought it would be relevant.

- - - -

Ah, which leaves only a question of the sort of measure that Amazon is using.

It might be bytes or characters.
For the starting location, either would be possible.
For the length, either would be possible.
("possible" because nothing is going to translate or convert the text encoding between making the notation and looking it up.)

My own first guess would be starting location in bytes and length in characters (remember, Kindles handle multi-byte character sets).

I don't know but a bit of experimenting (on a non-DRM protected document) should tell you what types of measurement units are being used.
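
The kind of experiment suggested above could be sketched like this: slice the same text three ways and see which interpretation reproduces a known highlight. The sample string and function names are made up for illustration, and UTF-8 is an assumed encoding, not something confirmed in this thread.

```python
def slice_bytes_bytes(text: str, start: int, length: int, encoding: str = "utf-8") -> str:
    """Interpret both start and length as byte counts."""
    raw = text.encode(encoding)
    # errors="replace" keeps the result printable if a cut splits a character.
    return raw[start:start + length].decode(encoding, errors="replace")


def slice_chars_chars(text: str, start: int, length: int) -> str:
    """Interpret both start and length as character counts."""
    return text[start:start + length]


def slice_bytes_chars(text: str, start: int, length: int, encoding: str = "utf-8") -> str:
    """Interpret start as a byte offset and length as a character count."""
    raw = text.encode(encoding)
    return raw[start:].decode(encoding, errors="replace")[:length]


sample = "naïve café reader"  # contains two multi-byte characters in UTF-8
by_bytes = slice_bytes_bytes(sample, 6, 5)
by_chars = slice_chars_chars(sample, 6, 5)
mixed = slice_bytes_chars(sample, 6, 5)
for label, value in (("bytes/bytes", by_bytes), ("chars/chars", by_chars), ("bytes/chars", mixed)):
    print(f"{label}: {value!r}")
```

On pure ASCII text all three interpretations agree, so a test passage needs accented or other multi-byte characters before the units can be told apart.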

- - - -

If the same code were to be used for both DRM-protected and non-DRM-protected documents,
then the values would be two-part values:
block number and displacement (in either bytes or characters) into the block.

So a bit (no pun intended) of research into the block size that Amazon uses would still be required.
It should be easy to find experimentally, at least for a non-DRM protected document.

- - - -

Two-part position (and length) value systems are common in file systems.
I.e., first block number : depth in bytes of start,
and similarly for the length and/or ending position.
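
As a rough illustration of the two-part scheme described above, a flat offset can be split into a block number and a displacement with integer division. The 4096-byte block size is purely a placeholder, not a value confirmed for Amazon's format, and the names are hypothetical.

```python
from typing import NamedTuple


class BlockPosition(NamedTuple):
    block: int          # zero-based block number
    displacement: int   # offset into that block, in the same units as the input


def to_block_position(offset: int, block_size: int) -> BlockPosition:
    """Split a flat offset into (block number, displacement within the block)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    block, displacement = divmod(offset, block_size)
    return BlockPosition(block, displacement)


def to_flat_offset(position: BlockPosition, block_size: int) -> int:
    """Inverse of to_block_position."""
    return position.block * block_size + position.displacement


BLOCK_SIZE = 4096  # placeholder value; find the real one experimentally
position = to_block_position(10000, BLOCK_SIZE)
print(position, to_flat_offset(position, BLOCK_SIZE))
```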

eschwartz · 06-21-2015, 08:40 PM

The necessary rules should be documented in calibre's code. The page number scheme has been cracked already, though it remains low-interest... however, the Kindle device driver for calibre includes a feature for calculating pseudorandom page numbers and generating a matching APNX file.

knc1 · 06-22-2015, 07:52 AM

^^ Thanks ^^
It was details about Calibre that I had no idea about.

kyzcreig · 06-25-2015, 02:22 AM

I can confirm the LOC data corresponds to 150-byte chunks, not 128 bytes as I previously thought. I've also managed to decrypt the book and convert it to raw HTML. But this leaves me with the pesky problem of cleaning the text up.

There's a lot of damaged markup in each of these chunks. Any suggestions on how to deal with this? Or perhaps there's a tool that would automatically scrape the appropriate text, given byte offsets?

Edit: BeautifulSoup saves the day!! Imprecision aside, I've got everything working and I think I might post this on the internet to help other people out.
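
For readers wanting to try something similar, here is a rough sketch of the approach described in this post, not the poster's actual script. It maps a location range to byte offsets at 150 bytes per location, trims the tag fragments left by the cut, and strips the remaining markup. Two assumptions: locations are treated as 1-based, and Python's standard-library `html.parser` stands in for BeautifulSoup so the example has no dependencies.

```python
from html.parser import HTMLParser
from typing import Tuple

BYTES_PER_LOCATION = 150  # figure reported in the post above


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def location_byte_range(first_loc: int, last_loc: int,
                        chunk: int = BYTES_PER_LOCATION) -> Tuple[int, int]:
    # Assumption: location 1 covers bytes 0..chunk-1.
    return (first_loc - 1) * chunk, last_loc * chunk


def trim_partial_tags(fragment: str) -> str:
    """Drop a tag cut off at the start or end of the slice (heuristic)."""
    first_lt, first_gt = fragment.find("<"), fragment.find(">")
    if first_gt != -1 and (first_lt == -1 or first_gt < first_lt):
        fragment = fragment[first_gt + 1:]
    if fragment.rfind("<") > fragment.rfind(">"):
        fragment = fragment[:fragment.rfind("<")]
    return fragment


def extract_passage(raw_html: bytes, first_loc: int, last_loc: int,
                    chunk: int = BYTES_PER_LOCATION, encoding: str = "utf-8") -> str:
    start, end = location_byte_range(first_loc, last_loc, chunk)
    # errors="ignore" drops a multi-byte character split by the cut.
    fragment = raw_html[start:end].decode(encoding, errors="ignore")
    collector = _TextCollector()
    collector.feed(trim_partial_tags(fragment))
    collector.close()
    # Collapse runs of whitespace left behind by removed markup.
    return " ".join("".join(collector.parts).split())


demo = b'<p class="a">Hello <b>world</b></p><p>Second para</p>'
print(extract_passage(demo, 2, 3, chunk=10))  # tiny chunk size just for the demo
```

Because a highlight rarely starts or ends exactly on a location boundary, the extracted text will usually include a few extra words on either side, which matches the imprecision mentioned in the post.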
