Privacy and Search Engine Data: A Recent AOL Research Project Has Perilous Consequences for Subscribers
By ANITA RAMASASTRY
Monday, Aug. 21, 2006
Some America Online (AOL) Internet service subscribers may be in for a nasty shock. Approximately 658,000 subscribers had their search queries - numbering a reported 20 million - made public for 10 days in early August, when the data, meant to be shared only with search engine researchers, was mistakenly released. And records of the data still remain accessible on the Internet; while AOL removed the data, it had already been mirrored and cached elsewhere.
This is far from an isolated incident: Reportedly, Excite and AltaVista released smaller amounts of their users' search results approximately five or six years ago. Both used anonymous identifiers, as AOL did.
An AOL spokesperson has been quoted as saying, "there was no personally-identifiable data linked to these accounts." And it's true that AOL did not post names in combination with the searches - just numerical IDs. But based on various news reports and blog commentaries, it has become readily apparent that the search data is enough to identify quite a few individuals.
The upshot: If individuals ran searches on, for instance, their own names, neighborhoods, or street addresses, others may be able to find out what other searches they ran - even if the information revealed is legally private (a search for information related to a confidential medical condition such as AIDS), intensely embarrassing (for example, a search for porn sites), or even evidence of crime (a reported search for tips on how to commit murder).
Accurate inferences from these searches may damage the searcher; so might false inferences (for instance, suppose the "AIDS" searcher did not suffer from AIDS herself).
Two privacy groups, the Electronic Frontier Foundation (EFF) and the World Privacy Forum (WPF), have filed complaints with the Federal Trade Commission (FTC) calling AOL's actions unfair and deceptive trade practices. It is still too early to tell if they will ultimately prevail, but it's very important that an investigation be conducted. As EFF notes, identity theft is among the serious injuries the now-public data may cause. And there's no question that the searches can be connected to individuals: In support of its complaint, EFF confidentially submitted a sampling of AOL search queries containing personally identifiable information and search histories that could likely be tied to particular AOL subscribers.
Congress also should examine this recent customer data leak, because once again, consumer data is making the rounds, without a subscriber's consent. There are still no uniform federal standards, under our laws, for dealing with disclosures of individuals' search records - whether solely by a private company, or in response to a government request or subpoena. That situation ought to change. We need to have clear rules as to what is permitted, and what is forbidden.
What AOL Did: Why Anonymous Identifiers Didn't Prevent Matching Searches to Individuals
AOL had posted the data on its recently launched AOL Research site, aimed at database and search engine researchers. The problem: Anyone could access the site, which was apparently not password-protected. Since then, a lot of dot-connecting has gone on. The New York Times figured out that user "4417749" - who searched "homes sold in shadow lake subdivision gwinnett county Georgia," and searched the names of several people with the surname "Arnold" - was 62-year-old Georgia resident Thelma Arnold.
Others poring over the data claim to have discovered over 100 Social Security numbers; dozens - perhaps hundreds - of credit card numbers; and the full names, addresses and birthdates of various subscribers who entered these terms as part of search queries.
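For readers who want to see why a numerical ID offers so little protection, the following is a minimal sketch of the mechanism, not a reconstruction of what any reporter or researcher actually did. It assumes a log of (ID, query) pairs; the IDs, queries, and function names are all hypothetical and invented for illustration. Because the same ID is attached to every query a subscriber made, a single identifying query exposes that subscriber's entire history.

```python
from collections import defaultdict

# Hypothetical toy log of (pseudonymous ID, query) pairs.
# These records are invented for illustration; they are not AOL data.
TOY_LOG = [
    (1001, "homes for sale maple hollow subdivision"),
    (1001, "jane example springfield"),
    (1001, "symptoms of a confidential condition"),
    (2002, "cheap flights to denver"),
    (2002, "weather denver"),
]


def group_by_id(log):
    """Collect every query issued under the same pseudonymous ID."""
    histories = defaultdict(list)
    for user_id, query in log:
        histories[user_id].append(query)
    return dict(histories)


def ids_matching(histories, clue):
    """Return the IDs whose history has at least one query containing the clue."""
    clue = clue.lower()
    return sorted(
        uid
        for uid, queries in histories.items()
        if any(clue in q.lower() for q in queries)
    )


histories = group_by_id(TOY_LOG)

# One identifying query (here, a name) is enough to pull up everything
# else that was searched under the same ID.
for uid in ids_matching(histories, "jane example"):
    print(uid, histories[uid])
```

The point of the sketch is that replacing a screen name with a number hides nothing once the number stays constant across searches: the sensitive query and the identifying query sit in the same list.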
The FTC Complaints Filed by the Electronic Frontier Foundation and the World Privacy Forum
Both the EFF and the WPF have asked the FTC to investigate, and possibly sanction, AOL for its unauthorized release of customer search queries.
The FTC complaints allege that AOL engaged in unfair or deceptive business practices by exposing its subscribers' information without warning them previously that it might do so. Indeed, according to the EFF complaint, AOL claimed in its privacy statement that it took "reasonable and appropriate" measures to protect personal consumer information from public disclosure. Surely such measures ought to have prevented the public posting of private search data!
AOL should be required to notify every customer whose privacy has been jeopardized by the company's handling of this private information. AOL also should adopt policies for the future by which it either does not cache such data, or quickly purges its caches. EFF and WPF have asked the FTC to require changes in AOL's privacy practices, and these requests, too, are a good idea.
Consequences: It's Time for Clear Laws, with Strong Remedies, Regarding Search Disclosures, Whether or Not at the Government's Behest
While AOL released data to the public of its own accord, there has also been concern, in recent months, about ISPs' turning over search data to government officials in response to subpoenas.
In March, a federal judge rejected efforts by the Department of Justice to use subpoenas to Google to gain access to Google users' search logs. The court, however, held that the Justice Department could have limited access, instead, to Google's index of Web site URLs. (Google was the only search engine to fight the Justice Department on this issue, with Yahoo, Microsoft's MSN and AOL handing over their users' search data, rather than litigating the point.)
Hopefully, AOL's violation of its users' privacy will spur Congress to clarify when and how search log information should be protected. As EFF notes, a few states require that consumers be notified in the event of a security breach. In this case, it is unclear whether such laws would apply to the AOL search data scenario - but if they don't, they ought to. And in any case, federal-law protection is needed so that all Americans can use the Internet with confidence, and without fear their privacy will be invaded.