Watermark for EPUB


Chang
03-02-2010, 02:53 AM
I would like to create watermarks for my EPUB files. Of course, I can do this manually, but are there any programs to make the job easier? For example, it would be nice to add a small watermark after every chapter; if the book contains several chapters, doing that manually is quite tedious. The watermark could be just a short piece of text or a small image.

I would also like to know if you have any good suggestions for doing watermarks and what you think about them. Personally, I don't mind watermarks if they are nice and clean or invisible.

charleski
03-02-2010, 07:32 AM
I assume by 'watermark' you mean adding something that will uniquely identify the version of the ePub that you distribute. What purpose do you want the watermark to serve? If you're preparing several different versions of an ePub to send out to different people, then the most obvious route is simply to change the <dc:identifier> value in the content.opf file. Obviously anyone can edit the ePub and change that, but they could change any code you add to the text as well.
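
A minimal sketch of the identifier change described above, assuming the OPF uses the conventional `dc:` prefix for Dublin Core elements. The function name `set_identifier` and the sample identifier values are hypothetical; the sketch works on the text of content.opf only and leaves unpacking and re-zipping the ePub to the caller.

```python
import re
from xml.sax.saxutils import escape

# Matches the first <dc:identifier ...>...</dc:identifier> element.
# Assumption: the Dublin Core namespace is bound to the literal prefix "dc".
_IDENT = re.compile(r"(<dc:identifier\b[^>]*>)(.*?)(</dc:identifier>)", re.DOTALL)

def set_identifier(opf_xml: str, new_id: str) -> str:
    """Return the OPF text with the first dc:identifier value replaced."""
    new_xml, count = _IDENT.subn(
        lambda m: m.group(1) + escape(new_id) + m.group(3),
        opf_xml,
        count=1,
    )
    if count == 0:
        raise ValueError("no <dc:identifier> element found")
    return new_xml
```

Everything outside the identifier element is left byte-for-byte unchanged, so the only difference between two copies is the value itself.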

If you want to obfuscate the watermark so that it can't be easily detected or changed, then that's a different issue, but is quite possible with some technical skill. Most ePubs contain one or more images and there are several ways of watermarking a jpeg.

Chang
03-02-2010, 08:29 AM
Thank you for your answer charleski!

I'm interested in both kinds of watermarks, visible and invisible. Adding metadata or some "meaningless code", e.g. empty div elements or comments in the XHTML, I count as visible watermarks, because those can be found quite easily and therefore deleted easily as well. I'm also looking for a simple but effective way to add an invisible watermark. I guess this is a secret for many people and they don't want to share their solutions and algorithms, which is understandable. I'll google more about adding watermarks to jpeg images, thanks for the idea!

I don't like DRM and that's why I'm trying to find other solutions to fight against piracy. With an invisible watermark I could at least find out "where's the leak".

DaleDe
03-02-2010, 01:33 PM
Thank you for your answer charleski!

I'm interested in both kinds of watermarks, visible and invisible. Adding metadata or some "meaningless code", e.g. empty div elements or comments in the XHTML, I count as visible watermarks, because those can be found quite easily and therefore deleted easily as well. I'm also looking for a simple but effective way to add an invisible watermark. I guess this is a secret for many people and they don't want to share their solutions and algorithms, which is understandable. I'll google more about adding watermarks to jpeg images, thanks for the idea!

I don't like DRM and that's why I'm trying to find other solutions to fight against piracy. With invisible watermark I could at least find out "where's the leak".

One very simple way to do an invisible watermark is to place a comment in the jpg image. This won't show and most people wouldn't even notice it. It can also be used to identify the source and indicate copyright. Irfanview is a program that can add a comment to an image. Of course it is not as obscure as modifying the image itself.
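
For those who would rather script this than use an image editor, here is a rough sketch of writing a comment (COM) segment into a JPEG with plain Python. The function names are hypothetical, and the sketch only handles the common layout where an optional JFIF APP0 segment directly follows the start-of-image marker.

```python
import struct

def add_jpeg_comment(jpeg: bytes, comment: str) -> bytes:
    """Insert a COM segment near the start of a JPEG byte string."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    payload = comment.encode("utf-8")
    if len(payload) > 65533:
        raise ValueError("comment too long for a single COM segment")
    # COM segment: marker FF FE, then a 2-byte big-endian length that
    # counts the length field itself plus the payload.
    segment = b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload
    pos = 2
    # Keep a leading APP0 (JFIF) segment in first position if there is one.
    if jpeg[pos:pos + 2] == b"\xff\xe0":
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + seg_len
    return jpeg[:pos] + segment + jpeg[pos:]

def read_jpeg_comments(jpeg: bytes) -> list[str]:
    """Collect COM segments found before the compressed image data."""
    comments = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: header segments end here
            break
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == 0xFE:
            comments.append(jpeg[pos + 4:pos + 2 + seg_len].decode("utf-8", "replace"))
        pos += 2 + seg_len
    return comments
```

The pixel data is untouched, so the image looks identical; any tool that strips metadata will also strip the comment.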

Dale

Jellby
03-02-2010, 02:15 PM
... and you can add something like a PGP signature of the whole book, created with your private key.

kovidgoyal
03-02-2010, 02:20 PM
It's not possible to make a watermark that cannot be deleted easily. The best you can do is make a watermark that cannot be deleted easily by technically challenged users, and even that really only relies on the DMCA.

michaelhughes
03-03-2010, 02:26 PM
While this wouldn't be a visible watermark, if you wanted to be able to determine who the source of a violation was, you could always drop an extra file with ownership data into the content directory before you zip the files up.

Then, if you find someone has shared your file on a torrent or other warez site, you could unpack it and find out who the original file was vended to.
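
A small sketch of that idea using Python's standard `zipfile` module. The entry name `OEBPS/owner.txt` and the function names are made up for illustration; appending to the existing archive keeps the `mimetype` entry first, as ePub readers expect.

```python
import zipfile

def add_owner_file(epub, owner_info: str, name: str = "OEBPS/owner.txt") -> None:
    """Append a small text entry with ownership data to an existing ePub.

    `epub` may be a path or a seekable binary file object.
    """
    with zipfile.ZipFile(epub, "a", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, owner_info)

def read_owner_file(epub, name: str = "OEBPS/owner.txt") -> str:
    """Read the ownership entry back from a copy found in the wild."""
    with zipfile.ZipFile(epub) as zf:
        return zf.read(name).decode("utf-8")
```

Note that a file which is not listed in the OPF manifest may be flagged by validators, and it disappears as soon as anyone repacks the book without it.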

Chang
03-04-2010, 04:15 AM
Thank you for your helpful answers and comments!


One very simple way to do an invisible watermark is to place a comment in the jpg image. This won't show and most people wouldn't even notice it. It can also be used to identify the source and indicate copyright. Irfanview is a program that can add a comment to an image. Of course it is not as obscure as modifying the image itself.

I guess you mean adding metadata to a JPEG image. That is one good solution because it's invisible unless the metadata is checked. Too bad that the metadata can be easily erased.


... and you can add something like a PGP signature of the whole book, created with your private key.

I don't know much about PGP digital signatures yet, but I googled them. I'll try it out and see how it works.


It's not possible to make a watermark that cannot be deleted easily.

I agree, but I personally believe that pirates just want to share the files. If my watermark system is complicated enough, it's not worth their effort to crack it, because it doesn't show up anywhere and it doesn't prevent the file from being spread around the internet. Everyone is still able to read my e-books even though there's a hidden watermark.


While this wouldn't be a visible watermark, if you wanted to be able to determine who the source of a violation was, you could always drop an extra file with ownership data into the content directory before you zip the files up.

It's a good idea but extra files are easily detected and easily deleted as well.

Chang
03-05-2010, 06:47 AM
... and you can add something like a PGP signature of the whole book, created with your private key.

I tried it out, but maybe I'm using the wrong program or I'm using it wrong :) I'm using this program http://www.gpg4win.org/ and it seems to be a good one. It's an open replacement for PGP.

This might go a bit off-topic, but... The problem is that when I sign my epub file, it's no longer an epub file but a GPG file. It would be nice if it remained an epub file so it could be read normally. I don't know whether you have used digital signatures for your e-books, but to me it seems that signing a file changes the file format until it's opened again with the correct key.

I found this good tutorial http://www.glump.net/howto/gpg_intro#signing_files and it says: "Text messages can have signatures appended to them without disrupting the contents of the message too much, but binary files such as Microsoft Word documents and Zip archives can't have arbitrary data attached to them. To sign binary files, it is customary to have GPG create a separate signature file."

If I want to sign my epub file, I would have to create a separate file which contains my signature. Is there a way to sign my epub file without changing the file format or creating a separate signature file?

charleski
03-05-2010, 07:27 AM
Signing the file is virtually the opposite of watermarking it. It allows other people to verify that the file has come from you without any third party having changed it.

Jellby
03-05-2010, 08:32 AM
If I want to sign my epub file, I would have to create a separate file which contains my signature. Is there a way to sign my epub file without changing the file format or creating a separate signature file?

Indeed, you would have to include the signature in a separate file, and if you add it to the epub, the signature is no longer valid. My idea was including a signature not of the whole epub file, but only of the main text. Say you have "text.xhtml" in your epub, then you add "text.gpg" too, which contains the signature for text.xhtml. Since text.gpg is only referenced in the manifest, but not used anywhere, users may not easily see it; if you add some plain-text watermark to text.xhtml, it could serve as a sort of backup watermark (a malicious user thinks he's quite smart and deletes the watermark from text.xhtml, but doesn't remove text.gpg; now you find the epub file in the darknet and see it does not have the plain-text watermark, but if you store a database of signatures, you can detect which file it originally was from text.gpg). Of course, it does nothing to prevent another user from removing text.gpg as well.

As charleski says, the concept is somewhat the opposite of normal watermarking. It depends on what your intent is. If you want to mark every epub file differently so you can eventually detect which copy was leaked to the darknet, I would recommend multiple "watermarks": some plain-text identification visible in the book, a <meta> tag in the OPF, comments metadata in pictures (the cover picture or some logo would be good candidates), and maybe something else (comments in HTML, CSS, or NCX files).

If you are feeling clever, you could devise some way of coding an identifier by including typos or slight changes in the text: sometimes there are several correct spellings for a word, or a comma/semicolon change could be harmless, or whether or not there is a paragraph break...

Chang
03-05-2010, 08:42 AM
Thank you both, charleski and Jellby, for your kind help :)

I now have several good tips for doing watermarks and I think I can figure something out. Thanks!

Jellby
03-05-2010, 09:30 AM
More ways to include hidden identifiers:

Change the names of CSS classes or id attributes. Instead of <span class="smallcaps"> or <h1 id="chapter_1"> you could have <span class="smallcaps_abcd"> or <h1 id="chapter_1_abcd">, where "abcd" is an identifier specific for each copy.

Have two or more classes with different names but exactly the same CSS, and let the choice of one or the other at each occurrence be a code to identify each copy. If you have 20 cases of <span class="smallcaps">, make them <span class="smallcaps1"> and <span class="smallcaps2"> instead; that makes room for 2^20 = more than 1 million different combinations.

More subtly, whether or not a full stop is included inside an HTML tag is usually unnoticeable, so the code can be carried by the choice between "<em>italic</em>." and "<em>italic.</em>".
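
As a rough illustration of the two-class scheme, the sketch below treats each occurrence of the class as one bit: `smallcaps1` stands for 0 and `smallcaps2` for 1. The function names are hypothetical, the stylesheet is assumed to define both classes identically, and the class attribute is assumed to be written exactly as `class="smallcaps"`.

```python
import re

# Any occurrence of the plain or already-numbered class counts as one slot.
_SLOT = re.compile(r'class="smallcaps[12]?"')

def embed_id(xhtml: str, copy_id: int) -> str:
    """Encode copy_id in binary across the smallcaps spans, first span = top bit."""
    slots = len(_SLOT.findall(xhtml))
    if slots == 0 or not 0 <= copy_id < 2 ** slots:
        raise ValueError("copy_id does not fit in the available spans")
    bits = iter(format(copy_id, f"0{slots}b"))
    return _SLOT.sub(
        lambda m: 'class="smallcaps{}"'.format(1 if next(bits) == "0" else 2),
        xhtml,
    )

def extract_id(xhtml: str) -> int:
    """Read the identifier back from the sequence of class names."""
    digits = re.findall(r'class="smallcaps([12])"', xhtml)
    if not digits:
        raise ValueError("no marked spans found")
    return int("".join("0" if d == "1" else "1" for d in digits), 2)
```

With 20 spans this gives the 2^20 combinations mentioned above; renaming the classes back to a single name erases the code.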

You could add unstyled <span> tags around selected letters of the text, like this:

I en<span>jo</span>y my eig<span>h</span>t-a<span>n</span>d-a-half hour<span>s</span> of sleep <span>m</span>ost happ<span>i</span>ly, but i<span>t h</span>urts.

If you take only the letters in the tags, it spells "johnsmith", but it's invisible in the book, and could be far from obvious in the code, unless you know what to look for.
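
The span trick can be automated along these lines. This is only a sketch under the assumption that the input is a plain sentence without other markup; `embed_name` and `extract_name` are made-up names, and the embedding simply takes the next available occurrence of each letter.

```python
import re

def embed_name(text: str, name: str) -> str:
    """Wrap successive letters of `name` in unstyled <span> tags."""
    out = []
    pos = 0
    for ch in name:
        idx = text.find(ch, pos)
        if idx == -1:
            raise ValueError(f"text has no remaining {ch!r} to mark")
        out.append(text[pos:idx])
        out.append(f"<span>{ch}</span>")
        pos = idx + 1
    out.append(text[pos:])
    return "".join(out)

def extract_name(xhtml: str) -> str:
    """Join the contents of all bare <span> tags, ignoring spaces."""
    return "".join(re.findall(r"<span>(.*?)</span>", xhtml)).replace(" ", "")
```

In practice you would spread the marked letters further apart than this greedy version does, since a run of adjacent single-letter spans is easier to spot in the code.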

Any of these methods can be defeated if the user actually modifies the code (HTML/CSS) of the book, but they'd work if the book is just uploaded as-is.

aarcane
03-05-2010, 02:21 PM
wrap each chapter in a <div class="chapter"> and add a css style div.chapter:after to provide a visible watermark. additionally you can use the gimp to add a translucent watermark to your images. nearly impossible to remove once it's been done.

Along the lines of a GPG signature, there's a signatures.xml file. format is kinda complex, but you can use it. also, if you insist on using gpg and a .asc signature, you can sign an entire folder or files inside a folder, then include the .asc file in the container without invalidating it.

Another possible way is to generate a few typos "by accident": change "the" to "teh", or change other, larger words by one inverted, missing, or extra character. one or two per file. things not commonly typo'd, but still readable. keep a list of those typos: which file/chapter and which line. when you see someone using your files, you look in those files for those typos, and within a few lines either way. if you can't find them, then your adversary either did the work himself, or did a good job covering up his criem.

larkki
03-23-2010, 11:17 AM
Thank you both, charleski and Jellby, for your kind help :)

I now have several good tips for doing watermarks and I think I can figure something out. Thanks!

Have you proceeded with your solution? I'm interested because I was looking for some solutions from the web and so far haven't been able to find any.
What are the combinations that you are using and how's it working so far?

Chang
03-25-2010, 03:25 AM
Have you proceeded with your solution? I'm interested because I was looking for some solutions from the web and so far haven't been able to find any.
What are the combinations that you are using and how's it working so far?

Yes, I have proceeded with my solution, but I don't have any experience yet of how it will work in practice. I won't tell my exact ways of doing the watermark, but I mostly used the tips in this thread and a few other ideas as well. I don't know how my solution will work, but I guess I'll find out in the next couple of months.

darkmonk
03-29-2010, 12:16 AM
I have a devilish solution, but it requires some programming: dynamically generate each epub. I've thought of several places you can add your information: the names of the directories and files, the names of the attributes (class, div, etc.), or, more evil-y, edit the fonts. Add a series of characters in there that is always the email or something. As for obscuring the data you want to add, which you should do to make it non-obvious, simply encrypt each piece. Oh, another thing I've heard of is directly appending a zip file to a picture. It's supposed to be ignored, and then, by recording the byte size, you can just take that much off the end to recover it.

That'd take some work, but each should work fine. Positively brilliant, methinks. But I doubt you will implement any of those methods. Another would be using SVG images with text embedded in them set to display at 100% transparency; SVGs can even have fonts, so you could hide one there. The possibilities are endless!

Fat Abe
03-29-2010, 06:29 AM
The jpeg route allows one to insert a number of telltales, and that is the most inconspicuous, unless the thief had two copies of the book. That still seems the most preferable for an average person.

I'm not that familiar with the spec on the opf file, but can't one slip in a block like <xyz><item id="abcd1234" /></xyz>, where xyz can be any string other than metadata, manifest and the other reserved names? Or just obfuscate by using a known name type where assignment of the id does not conflict with anything else. id is used to hold the hex value of the registered user.

As we come up with new identity embedding methods, one can be sure the pirates are working on stripping programs to automatically eradicate all ownership tags, including the ISBN.

larkki
03-29-2010, 06:54 AM
Good ideas people! I was thinking about something along these lines as well.
But do you know any commercial or open source solution which would provide epub watermarking? Or companies who are working on such technology? I'm doing my master's thesis on the subject so that's why I'm curious.

DaleDe
03-29-2010, 11:17 AM
Good ideas people! I was thinking about something along these lines as well.
But do you know any commercial or open source solution which would provide epub watermarking? Or companies who are working on such technology? I'm doing my master's thesis on the subject so that's why I'm curious.

One of the problems is that if you tell everyone how you did it, then it compromises the security of the method. This is why there are no public domain DRM schemes. Many of the proposed solutions in this thread were open source, but they depend on the end user not knowing what the originator did. Obscurity is essential unless you are just putting a real watermark in the document, like a background image on every page or the user's name at the top of every page.

Dale

darkmonk
03-29-2010, 08:41 PM
Yes - I want to say, there is no method you could ever do that cannot be removed. The most difficult method would simply be a direct alteration of the zip file, that did not impact any of the source files. It would be an incredible exploit, but good luck finding it.

Fat Abe
03-29-2010, 10:39 PM
One of the problems is that if you tell everyone how you did it then it compromises the security of the method.

Dale

Most of the ideas mentioned in this thread have been tried out already. The second layer of defense in any DRM'ed document are these telltales, which exist in jpeg, opf, html, xml, and css files. I have a theory about why Adobe and Amazon made it so easy for their DRM schemes to be broken. It's because they stuck enough telltales into the files to trace it back to the source recipient, who was then dumb enough to post it on the internet. OK, maybe I'm giving Adobe too much credit. It could be that they really are negligent. The next revisions of ADE and Amazon's Kindle for PC should be very interesting.

In the html files, I'm waiting to see if an author tries adjusting the words between the <p> and </p> paragraph tags, using ordinary line feeds to vary the count of words per line in the ascii file. And to do it "uniquely" per delivered file. Such a change would be totally transparent to any ereader, and might even look natural to someone editing the raw html. Of course, this scheme is easily defeatable.
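
One way such a line-feed scheme could look, purely as a sketch: each line of a paragraph holds either `base` words (bit 0) or `base + 1` words (bit 1). The function names and the choice of eight words per line are arbitrary assumptions; a reader collapses the line feeds into ordinary spaces, so the rendered text is unchanged.

```python
def embed_bits(paragraph: str, bits: str, base: int = 8) -> str:
    """Re-wrap a paragraph so that line i holds base + bits[i] words."""
    words = paragraph.split()
    lines = []
    pos = 0
    for bit in bits:
        n = base + int(bit)
        if pos + n >= len(words):
            raise ValueError("paragraph too short for this many bits")
        lines.append(" ".join(words[pos:pos + n]))
        pos += n
    lines.append(" ".join(words[pos:]))  # the remainder carries no data
    return "\n".join(lines)

def extract_bits(paragraph: str, count: int, base: int = 8) -> str:
    """Recover `count` bits from the word counts of the first lines."""
    lines = paragraph.split("\n")
    return "".join(str(len(line.split()) - base) for line in lines[:count])
```

Any editor that re-wraps or normalizes whitespace destroys the code, which is the weakness already noted above.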

The OP has been exposed to a number of good ideas by this board, and let's hope he can get back to us in a few months to see if he caught a pirate distributor at work.

Jellby
03-30-2010, 03:48 AM
Yes - I want to say, there is no method you could ever do that cannot be removed. The most difficult method would simply be a direct alteration of the zip file, that did not impact any of the source files. It would be an incredible exploit, but good luck finding it.

I think you can just append extra bytes to a valid zip file, with the zip file remaining valid and the extra bytes not being seen anywhere, unless you look for them with a hex editor or similar. Of course, if someone re-creates the zip (because they have modified the metadata to have "Surname, Firstname" instead of "Firstname Surname", for instance), the bytes are lost.
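
A quick way to try the trailing-bytes idea is sketched below. The function names are hypothetical, and whether a particular reading device or validator tolerates the extra bytes is an assumption you would have to test; `list_members` only checks that Python's own `zipfile` module still opens the archive. The tag must be short and must not itself contain a zip end-of-archive signature.

```python
import io
import zipfile

def append_trailer(zip_bytes: bytes, tag: bytes) -> bytes:
    """Return the archive with `tag` appended after the end of the zip data."""
    return zip_bytes + tag

def read_trailer(marked: bytes, tag_len: int) -> bytes:
    """Recover the last `tag_len` bytes; the length must be recorded elsewhere."""
    if tag_len <= 0:
        raise ValueError("tag_len must be positive")
    return marked[-tag_len:]

def list_members(archive: bytes) -> list[str]:
    """List the entries, as a check that the archive can still be opened."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.namelist()
```

As noted above, the tag survives only as long as nobody re-creates the zip.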