06-19-2015, 10:04 PM | #1 |
Enthusiast
Posts: 33
Karma: 12694
Join Date: Aug 2014
Device: kindle paperwhite
|
Getting around DRM, encoding?
DRM is a real nuisance for us paying customers. I like to curate my notes, and usually do so after reading a great book. So you can imagine my surprise when I realized 90% of my annotations had been ignored.
Fortunately the annotations are still visible on the Kindle, and the location data is intact in my clippings.txt file. This gave me the idea of taking the location information for each annotation and extracting the corresponding text from the original mobi file with a script. My understanding is that each location corresponds to 128 bytes of data, so it should be straightforward to put all this information into a file. But I'm not sure how the text is encoded, and when I decode it as something like UTF it's a half-garbled mess. I'm a novice programmer, though, so I'm wondering:
A) whether this is actually feasible
B) how hard it will be to decode mid-book excerpts
As for the DRM itself, I've found tools for stripping it, but I'm not sure whether that will corrupt the location information. From what I can tell it doesn't. |
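A minimal sketch of the location-to-text idea described above, for a DRM-free document whose text has already been extracted into an uncompressed byte string. The function names, the 1-based location numbering, and the `bytes_per_location` default of 128 (the poster's initial guess) are assumptions for illustration, not documented Kindle behaviour.

```python
def location_to_byte_range(start_loc, end_loc, bytes_per_location=128):
    """Map an inclusive location range to a half-open byte range.

    Assumes locations are numbered from 1 and that every location
    covers the same number of bytes (both are assumptions).
    """
    start = (start_loc - 1) * bytes_per_location
    end = end_loc * bytes_per_location
    return start, end


def decode_slice(data, start, end, encoding="utf-8"):
    """Decode a byte slice, replacing any bytes that cannot be decoded.

    A slice that begins or ends inside a multi-byte character would
    raise UnicodeDecodeError under strict decoding; errors="replace"
    substitutes U+FFFD for the broken bytes instead.
    """
    return data[start:end].decode(encoding, errors="replace")


if __name__ == "__main__":
    sample = ("é" * 200).encode("utf-8")  # 400 bytes, 2 bytes per character
    print(location_to_byte_range(3, 4))
    print(repr(decode_slice(sample, 1, 5)))  # slice cuts through characters
```

A slice that cuts through a multi-byte character shows up as replacement characters at its edges. MOBI files also normally store their text in compressed records, so byte offsets into the file on disk will not line up with offsets into the text itself, which may be one reason a naive decode looks garbled.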
06-19-2015, 10:30 PM | #2 |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
I think that the subject of "getting around" or otherwise defeating DRM is against the site rules.
But here is a slide show of the various types of block encryption: http://www.utdallas.edu/~muratk/cour...iles/modes.pdf
To answer your question, you have to know which of the above types is used by the DRM you want random access to. For that information, you'll have to go to some source other than MobileRead. Sorry, we don't disturb other people's I.P. here.
- - - -
You had better check your prior source(s) of information; that is most likely **bits**, not **bytes** (block sizes are usually referred to by their **bit length** in cryptology, but I don't have a clue what the common practice is in DRM methods). Last edited by knc1; 06-19-2015 at 10:40 PM. |
06-21-2015, 05:00 PM | #3 | |
Enthusiast
Posts: 33
Karma: 12694
Join Date: Aug 2014
Device: kindle paperwhite
|
Quote:
Though maybe the existing anti-DRM solutions will render the above impossible due to loss of interstitial metadata. Last edited by kyzcreig; 06-21-2015 at 07:58 PM. |
|
06-21-2015, 05:58 PM | #4 | |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
Quote:
Plus, DRM was mentioned in the title of this thread, which is the reason I thought it would be relevant.
- - - -
Ah, which leaves only the question of what unit of measure Amazon is using. It might be bytes or characters. For the starting location, either would be possible. For the length, either would be possible. ("Possible" because nothing is going to translate or convert the text encoding between making the notation and looking it up.) My own first guess would be starting location in bytes and length in characters (remember, Kindles handle multi-byte character sets). I don't know, but a bit of experimenting (on a non-DRM protected document) should tell you which units of measurement are being used.
- - - -
If the same code were to be used for both DRM and non-DRM protected documents, then the values would be two-part values: block number and displacement (in either bytes or characters) into the block. So a bit (no pun intended) of research into the block size that Amazon uses would still be required. It should be easy to find experimentally, at least for a non-DRM protected document.
- - - -
Two-part position (and length) value systems are common in file systems, i.e. first the block number, then the depth in bytes of the start, and similarly for the length and/or ending position. Last edited by knc1; 06-21-2015 at 06:07 PM. |
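The two-part position idea and the bytes-versus-characters question can both be sketched in a few lines. This is only an illustration: the function names are hypothetical, the block size is whatever an experiment on a non-DRM document suggests, and nothing here is taken from Amazon's actual format.

```python
def split_position(offset, block_size):
    """Split a flat offset into (block number, displacement into that block)."""
    return divmod(offset, block_size)


def join_position(block, displacement, block_size):
    """Inverse of split_position: rebuild the flat offset."""
    return block * block_size + displacement


def byte_offset_to_char_offset(text, byte_offset, encoding="utf-8"):
    """Count the whole characters that precede a byte offset in `text`.

    For multi-byte text the two units diverge, which is what an
    experiment on a known passage would reveal.
    """
    prefix = text.encode(encoding)[:byte_offset]
    # errors="ignore" drops a trailing character cut in half by the slice
    return len(prefix.decode(encoding, errors="ignore"))


if __name__ == "__main__":
    print(split_position(1000, 150))
    print(byte_offset_to_char_offset("naïve café", 4))
```

Comparing a known highlight against both interpretations (offset counted in bytes, offset counted in characters) on a book with accented text is one way to tell which unit is in use.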
|
06-21-2015, 08:40 PM | #5 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
The necessary rules should be documented in calibre's code. The page number scheme has been cracked already, though it remains low-interest... however, the Kindle device driver for calibre includes a feature for calculating pseudorandom page numbers and generating a matching APNX file.
|
06-22-2015, 07:52 AM | #6 |
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
|
^^ Thanks ^^
It was details about Calibre that I had no idea about. |
06-25-2015, 02:22 AM | #7 |
Enthusiast
Posts: 33
Karma: 12694
Join Date: Aug 2014
Device: kindle paperwhite
|
I can confirm the LOC data corresponds to 150-byte chunks, not 128 bytes as I previously thought. I've also managed to decrypt the book and convert it to raw HTML. But this leaves me with the pesky problem of cleaning the text up.
There's a lot of damaged markup in each of these chunks. Any suggestions on how to deal with this? Or perhaps there's a tool that would automatically scrape the appropriate text, given byte offsets? Edit: BeautifulSoup saves the day!! Imprecision aside, I've got everything working and I think I might post this on the internet to help other people out. Last edited by kyzcreig; 06-25-2015 at 03:31 AM. |
|
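For anyone facing the same cleanup problem, here is a rough sketch of pulling readable text out of a chunk of HTML that was cut at arbitrary byte offsets. It uses only Python's standard-library `html.parser` as a stand-in; the poster used BeautifulSoup, where `BeautifulSoup(fragment, "html.parser").get_text()` does a similar job and is more forgiving. The function names and the sample fragment are made up for illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the text nodes of whatever markup it is fed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def trim_partial_tags(fragment):
    """Drop a tag that was cut in half at the start or end of the chunk."""
    first_lt = fragment.find("<")
    first_gt = fragment.find(">")
    # A '>' before any '<' means the chunk starts inside a tag.
    if first_gt != -1 and (first_lt == -1 or first_gt < first_lt):
        fragment = fragment[first_gt + 1:]
    last_lt = fragment.rfind("<")
    last_gt = fragment.rfind(">")
    # A '<' after the last '>' means the chunk ends inside a tag.
    if last_lt != -1 and last_lt > last_gt:
        fragment = fragment[:last_lt]
    return fragment


def chunk_to_text(fragment):
    """Return the visible text of an HTML fragment, whitespace-normalised."""
    parser = _TextCollector()
    parser.feed(trim_partial_tags(fragment))
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    damaged = 'lass="calibre1">Call me <i>Ishmael</i>.</p><p class="ca'
    print(chunk_to_text(damaged))
```

The trimming step is a heuristic: it assumes a stray `>` or `<` in the text belongs to a broken tag, so literal angle brackets in the prose would be clipped too. That fits the imprecision the poster mentions.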