Friday, 27 January 2006

Court says Google's distribution of "cached" Web pages is "fair use"

In the first legal test of Google's storage and redistribution of "cached" copies of Web pages -- even if the page has deliberately been removed from the original site, for example because a time-limited publication license has expired -- a Federal District Court in Nevada has ruled that this is a "fair use", and not copyright infringement.

Author (and attorney) Blake Field sued Google for infringing copyrighted works published on Field's personal Web site:

The parties do not dispute that Field owns the copyrighted works subject to this action. The parties do dispute whether by allowing access to copyrighted works through "Cached" links Google engages in volitional "copying" or "distribution" under the Copyright Act sufficient to establish a prima facie case for copyright infringement.

Field does not allege that Google committed infringement when its "Googlebot," like an ordinary Internet user, made the initial copies of the Web pages containing his copyrighted works and stores those copies in the Google cache.

Instead, Field alleges that Google directly infringed his copyrights when a Google user clicked on a "Cached" link to the Web pages containing Field's copyrighted works and downloaded a copy of those pages from Google's computers. [citations omitted]

The Court ruled that Google had an "implied license" to store, copy, and redistribute Field's work, as a result of Field's failure to take affirmative action to inform Google that he didn't want Google to copy his work!

Field remained silent regarding his unstated desire not to have "Cached" links provided to his Web site, and he intended for Google to rely on this silence. Field could have informed Google not to provide "Cached" links by using a "no archive" meta-tag or by employing certain commands in a robots.txt file. Instead, Field chose to remain silent knowing that Google would automatically interpret that silence as permission to display "Cached" links.

In analyzing this purported "implied consent", the Court claims that "any site owner can disable the cache functionality for any of the pages on its site in a matter of seconds". It's taking me a couple of hours to add the newly required "noarchive" meta-tag to the HTML of every page of my Web site. But the Court assumes, wrongly, (1) that copyright in all works on a Web page is owned by the same person (the "noarchive" tag applies equally to an entire page), and (2) that she who owns the copyright controls the HTML, which is almost never true of work licensed for Web publication from freelancers who own the copyrights.
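To make concrete what adding the tag to a page involves, here is a minimal Python sketch. The function name `add_noarchive` and the regular expressions are hypothetical illustrations, not anything taken from the Court's decision or from Google's documentation, and the sketch assumes each page has an ordinary `<head>` element.

```python
import re

# The tag described in the decision, in lowercase HTML form.
NOARCHIVE_TAG = '<meta name="robots" content="noarchive">'

# Matches an existing robots meta tag whose content includes "noarchive".
# Simplification: only handles the name attribute appearing before content.
_HAS_NOARCHIVE = re.compile(
    r'<meta\s+name\s*=\s*["\']robots["\']\s+content\s*=\s*'
    r'["\'][^"\']*noarchive[^"\']*["\']',
    re.IGNORECASE,
)

# Matches the opening <head> tag, with or without attributes
# (but not <header>, which has a letter straight after "head").
_HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_noarchive(html: str) -> str:
    """Return html with a robots noarchive tag inserted just after <head>.

    Pages that already carry the tag, or that have no <head> element,
    are returned unchanged.
    """
    if _HAS_NOARCHIVE.search(html):
        return html
    match = _HEAD_OPEN.search(html)
    if match is None:
        return html
    return html[: match.end()] + "\n" + NOARCHIVE_TAG + html[match.end():]
```

Applying this to a whole site still means reading, rewriting, and re-uploading every file, and it only helps whoever actually controls the HTML.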

The Court also presumes that it is beneficial for any work to be available from Google's "cache". The possibility that the copyright holder might no longer wish the work to be available -- either because of the expiration of a time-limited Web publication license, or for other reasons of personal choice -- and that its non-availability from the original site might be deliberate, is never considered.

In light of this decision, freelance writers who want to grant time-limited rights to Web publication (so that they can re-sell a work, or charge for renewal or extension of the license) will need to add clauses to all future license agreements requiring the inclusion of a <META NAME="ROBOTS" CONTENT="NOARCHIVE"> tag in the HTML code of all Web pages on which the work appears. And messy disputes are likely to erupt with respect to requests by writers and other copyright holders to add such headers to work already on the Web, to which they had not intended to grant Google free reproduction rights in perpetuity.
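A writer who negotiates such a clause would also need some way to check that the publisher has honored it. The following is only a sketch of one possible check, using Python's standard `html.parser` module; the names `RobotsMetaParser` and `has_noarchive` are hypothetical, and fetching the page is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noarchive(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noarchive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives
```

For example, `has_noarchive(page_source)` could be run against each page on which the licensed work appears. It says nothing about whether any particular search engine honors the tag.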

The Court draws an analogy with "time shifting" of television broadcasts as fair use, which seems to suggest that in this Court's mind time-limited Web publication licenses might not be considered enforceable. It's common for a writer, photographer, or artist to authorize publication of an article both in a printed periodical and online for a specified period of time, 30 days or six months for example, after the publication date of the print edition. An untold number of such licenses, or at least their enforceability, have thus been cast into question.

I certainly have never intended the absence of "noarchive" tags on my Web site, or other Web sites on which my work is published, to imply a license to Google or anyone else to redistribute it. Rather, it has been my understanding (and, in fact, still is my understanding), that copyright law in the USA and most other countries requires an explicit and affirmative grant of license to authorize commercial for-profit reproduction such as Google's "cache". The Court's conclusion that if a Web site doesn't include a "noarchive" tag, the copyright holder(s) (and not the creator of the HTML, who usually isn't the same) "chose to permit such links to be displayed" just isn't true. And I include the following explicit notice in the HTML code of every page on my Web site:

META NAME="COPYRIGHT" CONTENT="copyright Edward Hasbrouck [year], all rights reserved. Mirroring, caching, syndication, and/or archiving of this Web site for purposes of redistribution, or any commercial use including the reproduction of any portion of this Web site on pages including advertising or self-advertising, is expressly forbidden, except with the prior express written permission of the copyright holder. If you received this file from a server outside the domain, you have received an unauthorized and copyright infringing bootleg copy. Please report copyright violations to Edward Hasbrouck,"

I've pointed out repeatedly in the past that Google's distribution of "cached" copies of Web pages is the most unambiguously infringing of Google's activities. I can only hope that this decision, and the false factual assumptions behind it, are overturned.

[Addendum, 28 January 2006: This article has prompted some interesting private correspondence with John Levine, who is the author of several excellent books including The Internet for Dummies and who testified as an expert witness for Google in its defense against Field's lawsuit. It's also being discussed on one of the mailing lists for members of the National Writers Union, in which my attention was called to this 1995 NWU position paper, Authors in the New Information Age: A Working Paper on Electronic Publishing Issues. For those of you just coming across this thread, there's more in my earlier articles in the Writing and Publishing category of this blog.]

If a third-party site is distributing your material contrary to terms in your licensing agreement, your quarrel is with them, not Google. How the hell is Google supposed to know about your licensing agreement? Further, if you add the noarchive tag to your page at a later date, your page will be purged from cache on its next scrape by Google. This takes between 24 hours and 14 days, depending on your site traffic.

Posted by: Christinek, 29 September 2007, 23:08 (11:08 PM)
