The Safeguards Needed for Government Data Mining
By ANITA RAMASASTRY
Wednesday, Jan. 07, 2004
In December 2003, the Department of Defense's Inspector General (DOD IG) issued a report on the controversial Total Information Awareness Program (TIA). The report concluded that the Defense Advanced Research Projects Agency (DARPA) had not adequately considered the privacy concerns associated with TIA.
TIA itself is now defunct. But the federal government continues to use "data mining" techniques in other federal initiatives such as the Computer Assisted Passenger Prescreening System II (CAPPS II). Moreover, as the DOD IG report notes, the federal government is likely to adopt other versions of "data mining" in the future. And meanwhile, at the state level, as I noted in an earlier column, a new data mining initiative, the Matrix, is already under way.
What all these programs have in common is that they are "data mining" initiatives -- that is, they seek to develop computer technologies to sift through large data repositories to identify threatening patterns and people. (Data mining, in this sense, is the use of existing data to try to predict crime or terrorist activity prospectively.) What these programs also have in common is that they do not pay sufficient attention to concerns of individual privacy.
Data mining, without adequate privacy safeguards, has the potential to be used as a tool to spy on American citizens without the judicial or procedural constraints that limit how far more traditional surveillance techniques can infringe privacy.
For all these reasons, the DOD IG report's conclusions remain highly relevant. (So will the views of the congressionally mandated technology and privacy advisory group that is also examining TIA and preparing its own report.)
The TIA Program, and the CAPPS II Program
To understand the report's significance, some brief background on the TIA program itself, and on the CAPPS II program, is necessary.
TIA was a project of the DOD's Information Awareness Office, run by former national security adviser John Poindexter. It was based on the theory that possible terrorist threats might be identified by sorting through everyday transactions, such as credit card purchases, car rentals, and travel reservations.
The databases the office sought to create would have been extensive -- drawing from the vast amount of data that currently exists in government and commercial (that is, for-profit) databases. As I discussed in a previous column, this data -- with a simple search -- could easily have been converted into an individual dossier on any one of us. Significantly, such a dossier might be created simply because computer models place an individual in a predictive risk category -- not because concrete evidence exists linking that individual to a specific crime or terrorist event.
Data mining is a broad and murky term. Law enforcement often uses data mining or sifting as a legitimate investigative tool. For example, if a blue Toyota truck was used as the getaway car in a robbery, few of us would object to the police running a database search to identify the license plate and registration information for blue Toyota trucks in a given locale.
But what civil liberties advocates do object to is the government's accessing sensitive and personal data to predict whether we are likely to be terrorists or criminals -- without any probable cause to do so.
By late 2002, word of the TIA program had reached Congress. Senators Charles Grassley (R-Iowa), Bill Nelson (D-Fla.), and Chuck Hagel (R-Neb.) raised concerns about TIA. In particular, Senator Grassley sent a letter to the DOD IG, Joseph Schmitz, asking for a review of TIA that would include an assessment of what protections were in place to ensure that civil liberties were not violated. The IG's office subsequently embarked upon the review.
In fiscal year 2003, the TIA research and development effort began. DARPA requested that an estimated $53.8 million in funding for the pilot phase of TIA be included in the President's Fiscal Year 2004 Budget.
However, TIA came under fire due to fears that it would infringe civil liberties. Accordingly, in September 2003, Congress voted to close it down. Thus, the National Defense Appropriations Act for Fiscal Year 2004 eliminated funding for the majority of the TIA program's components.
That was not the end of federally initiated data mining, however. Also in 2003, the Department of Homeland Security ("DHS") proposed to use the same techniques -- in a program known as CAPPS II -- to identify air travelers who were possible terrorist risks.
Anticipating privacy concerns, in 2003, DHS appointed a Chief Privacy Officer whose job is to ensure "privacy compliance across the organization, including assuring that the technologies sustain, and do not erode, privacy protections relating to the use, collection, and disclosure of personal information."
This was a positive step. However, a privacy assessment of the current, revamped CAPPS II proposal has yet to be issued.
And privacy violations have already occurred: The airline JetBlue shared information about millions of passengers with a government contractor for "national security" reasons.
Now, Congress has wisely blocked deployment of CAPPS II until the GAO studies its privacy implications. The GAO report must be completed by February 15, 2004.
The DOD IG Report on TIA: Its Conclusions
The DOD IG Report acknowledged that no statute had required DARPA to conduct a privacy analysis with respect to TIA. But it also expressed the view that doing so would have constituted a prudent "best practice."
By failing to consider privacy concerns, the report pointed out, DOD "risks spending funds to develop systems that may be neither deployable nor used to their fullest potential without costly revisions and retrofits." It also misses an opportunity "to minimize the possibility of any Governmental abuse of power."
Accordingly, the report called for DARPA to perform a privacy impact assessment before TIA-type technology research continues. The report also recommended the appointment of a Privacy Ombudsman (or the equivalent) to oversee TIA development, and evaluate these technologies from a privacy perspective.
Laudably, DARPA itself concurred with both recommendations -- and thus, one can assume, it will be undertaking privacy assessments with respect to similar technology in the future.
To do so, as the report pointed out, only makes sense: The alternative is for the federal government to grapple with continuing criticism, face possible privacy-violation lawsuits, and ultimately be forced at great cost to overhaul computer systems to respect privacy. If the government does not address privacy issues now, it inevitably will have to do so later.
What Privacy Impact Assessments on Data Mining Should Include: The Basics
That raises a crucial question: What should a privacy impact assessment for a data mining program -- whether it is CAPPS II, or some future version of TIA -- look like?
It should begin with the basics -- information that, surprisingly, the government has been unwilling to disclose in the context of TIA and CAPPS II.
First, information about the databases themselves should be provided. What kind of data will be compiled? And, from what sources? How will the data be merged? Will it include information individuals tend to consider highly private -- such as medical data, data as to children, and financial data?
Second, information about access to the data -- and databases -- should be provided. With whom will the data be shared -- and upon what showing, if any? Can state, or only federal, authorities access the data? Which authorities, in particular, will have access? What about foreign governments that are U.S. allies? What about government contractors that are private companies involved in defense or security, or private domestic and foreign airline companies?
Third, how will data sharing limits be enforced? What are the legal and technological guarantees that those in the private sector -- or those in the government who lack authorization to do so -- will not be able to access the data and databases? How can hackers be prevented from accessing the data?
Beyond the Basics of Privacy Protection: Cost-Benefit Analysis and Legal Recourse
After these basics are out of the way, a cost-benefit analysis must begin. Is it truly worthwhile to pay the privacy cost of including very sensitive information -- such as medical data -- or to allow wide access to the databases? Or is the benefit of these facets of the system modest enough that they should be eliminated or curtailed?
The chance of catching a terrorist because individuals' medical histories are included may be slight, and not worth the privacy cost. Similarly, the chance of catching terrorists as a result of allowing local government officials to access the databases may be slight -- and not worth the risk that employees may try to check up on their neighbors.
A cost-benefit analysis ought to also be made with respect to particular technologies. A search system may be advantageous in certain ways, but also turn up a large number of false positives. Are the false positives -- and resulting wrongful accusations -- too great a cost? Perhaps a different system that would be more accurate, but cast a narrower net, would be preferable.
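The false-positive tradeoff can be made concrete with simple arithmetic. The sketch below is only an illustration: the population size, the number of genuine threats, and both systems' accuracy rates are hypothetical figures chosen for the example, not numbers drawn from TIA, CAPPS II, or any real program.

```python
def screening_outcomes(population, true_threats, sensitivity, false_positive_rate):
    """Return (threats_flagged, innocents_flagged, precision) for one screening pass.

    sensitivity: share of genuine threats the system flags.
    false_positive_rate: share of innocent people the system wrongly flags.
    precision: share of all flagged people who are genuine threats.
    """
    innocents = population - true_threats
    threats_flagged = true_threats * sensitivity
    innocents_flagged = innocents * false_positive_rate
    flagged = threats_flagged + innocents_flagged
    precision = threats_flagged / flagged if flagged else 0.0
    return threats_flagged, innocents_flagged, precision


# Hypothetical inputs: 100 million travelers, 100 of them genuine threats.
broad = screening_outcomes(100_000_000, 100, sensitivity=0.99, false_positive_rate=0.01)
narrow = screening_outcomes(100_000_000, 100, sensitivity=0.80, false_positive_rate=0.0001)

for label, (threats, innocents, precision) in (("broad net", broad), ("narrow net", narrow)):
    print(f"{label}: {threats:.0f} threats flagged, "
          f"{innocents:.0f} innocents flagged, precision {precision:.4%}")
```

Under these assumed numbers, the broad system flags roughly a million innocent travelers in order to catch 99 of the 100 threats, while the narrower system flags about ten thousand innocents and catches 80. Whether the extra catches justify the extra wrongful accusations is exactly the judgment a cost-benefit analysis would have to make explicit.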
Once these decisions about the nature of the searching system, the databases, and the data are made, they must be made public. If there are procedures for protecting privacy, they should be public -- so people will know if they have been victimized by a violation.
If the government uses data compiled in the private sector, such data may very well contain errors. The government must consider how errors in its databases -- errors that may prevent innocent persons from flying, or have other consequences for them -- will be corrected. A related question is how often the databases will be updated and revised.
Procedures for error correction are needed and should be well-publicized -- so that no one is forced to live with a mistake in his or her data that limits the freedom to travel, or other freedoms. (Imagine being identified as a terrorist, because you have the same name as someone else.) If a citizen is falsely identified as a risk, he or she needs to be able to clear his or her name from any security or watch lists, for example.
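The same-name problem is easy to see in a small sketch. The example below is hypothetical throughout: the names, birth dates, and matching rules are invented for illustration and are not a description of how any actual watch list operates. It simply shows that matching on a name alone flags everyone who shares it, while requiring a second identifier narrows the match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    birth_date: str  # ISO format, e.g. "1970-01-31"


def flag_by_name(travelers, watch_list):
    """Flag every traveler whose name appears on the watch list."""
    watched_names = {p.name.lower() for p in watch_list}
    return [t for t in travelers if t.name.lower() in watched_names]


def flag_by_name_and_birth_date(travelers, watch_list):
    """Flag only travelers whose name and birth date both match an entry."""
    watched = {(p.name.lower(), p.birth_date) for p in watch_list}
    return [t for t in travelers if (t.name.lower(), t.birth_date) in watched]


# Invented records: two different people happen to share one name.
watch_list = [Person("John Doe", "1965-03-02")]
travelers = [
    Person("John Doe", "1965-03-02"),
    Person("John Doe", "1988-11-19"),
    Person("Jane Roe", "1975-06-30"),
]

print(len(flag_by_name(travelers, watch_list)))
print(len(flag_by_name_and_birth_date(travelers, watch_list)))
```

Even the stricter rule depends on the underlying records being right, which is why a publicized correction procedure matters: a mistyped birth date in the source data would defeat it.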
Finally, any data-mining program needs to have a redress mechanism. Arguably, Americans must not only be able to correct database errors, but also, if they are harmed by privacy violations, must be able to take specific legal recourse against the government.
Taking Privacy Seriously: Positive Signs Exist, But More Must Be Done
The recent DOD IG report is a positive indication that the federal government is taking privacy issues more seriously in the war on terrorism. So is the DHS's appointment of a Chief Privacy Officer.
Both developments are to be applauded. But the government's privacy measures still have not managed to catch up with its technology proposals. The Chief Privacy Officer's CAPPS II report is long overdue. And by the time the DOD IG report came out, TIA itself was a moot question.
Privacy review should be in the lead -- not following in the wake of these privacy-threatening programs. Our privacy rights are too valuable to be an afterthought.