Quote:
Originally Posted by HarryT
I was under the impression that the goal was to produce a book that's structurally similar to the original, but from which the original couldn't be identified.
Of course, if that's not the desire, then there's no issue.
I am pretty sure that it would be programmatically fairly trivial to generate a fingerprint of an original ePub or azw3 file (or whatever), taking into consideration only a book's structure and the length of individual words and paragraphs.
This fingerprint would be identical for the original work and the derived work, but would differentiate any book from any other book. Given a database of these fingerprints, one could easily identify every work in that database.
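To make that concrete, here is a rough sketch of such a fingerprint. It is only an illustration: the function names, the choice of SHA-256, and the toy letter-replacing scrambler are my own assumptions, not how any actual scrambling tool works, and it takes a plain list of paragraph strings rather than parsing a real ePub.

```python
import hashlib
import random
import string


def structural_fingerprint(paragraphs):
    """Hash only the word lengths of each paragraph, never the letters."""
    h = hashlib.sha256()
    for para in paragraphs:
        # One line of comma-separated word lengths per paragraph.
        lengths = [str(len(word)) for word in para.split()]
        h.update((",".join(lengths) + "\n").encode("ascii"))
    return h.hexdigest()


def scramble(paragraphs, seed=0):
    """Hypothetical scrambler: swap every letter for a random one,
    leaving spaces, digits and punctuation where they are."""
    rng = random.Random(seed)
    return [
        "".join(
            rng.choice(string.ascii_lowercase) if c.isalpha() else c
            for c in para
        )
        for para in paragraphs
    ]


original = ["It was a dark and stormy night.", "The rain fell in torrents."]
scrambled = scramble(original)

# Same structure, so the fingerprints match even though the text is gone.
print(structural_fingerprint(original) == structural_fingerprint(scrambled))
```

For two different book-length texts, an accidental match of every word length in every paragraph is extremely unlikely, which is all that "differentiate" needs to mean here.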
I don't think that should be the issue here, however. Copyright law does not prohibit identification of a copyrighted work (otherwise, the mere mention of "Harry Potter" would constitute copyright infringement). I think the goal should be that the original is not recoverable, i.e. that given the scrambled file, it should not be possible to reconstitute the original work (even in part).
If identification was a problem, posters on MR would be forbidden to mention the name of the copyrighted work when requesting help and uploading a scrambled copy. I hardly think that would accomplish anything or help the copyright owner.
Given that, I honestly don't get the hang-ups about the ISBN number. That number very obviously is not part of the copyrighted work itself in any meaningful way (again: otherwise even the mention of an ISBN number would infringe copyright[1]).
As for algorithmically scrambled books constituting derived works: that stance is so far-fetched it boggles the mind. Like another poster said, you'd be laughed out of court with an argument of that kind.
Consider, for example, the following algorithm: Count the characters in a book, multiply that count by the number of pages and divide the result by the number of letters in the title.
You get a number that is algorithmically derived from the original work and will be the same every time you apply that algorithm to a given book. Yet I think we can all agree that this would not be a "derived work" in any meaningful sense of the word. The US Supreme Court has come down on the side of the "spirit of the law" over the "letter of the law" in much more contested cases (No. 13–7451, Yates vs. US or the recent Obamacare decision, for example) - a case like this would be so clear-cut it would never even make it to court.
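For what it's worth, the toy algorithm above fits in a few lines. This is just a sketch: the function name is made up, the page count is passed in by hand because it depends on the edition, and I am assuming "letters in the title" means alphabetic characters only (no spaces or punctuation).

```python
def book_number(text, page_count, title):
    """Characters in the book, times pages, divided by letters in the title."""
    char_count = len(text)
    # Assumption: only alphabetic characters of the title count as letters.
    title_letters = sum(1 for c in title if c.isalpha())
    return char_count * page_count / title_letters


# A made-up 1000-character "book" with 10 pages and a four-letter title.
print(book_number("x" * 1000, 10, "Emma"))
```

The same inputs always give the same number, and nothing about the book's content can be read back out of it.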
As for the question whether or not the XHTML structure of the book itself is part of the copyrighted material, ask yourself whether the type of paper (or font size) used in a printed book is part of the copyrighted material.
Matt
[1] I don't really suppose that anyone is arguing that mentioning the ISBN number would in fact be copyright infringement and is merely excused by the "fair use" clause of copyright law. The ISBN number's whole point is to make the book identifiable, and it belongs to the public record.