Two Faces of Intelligence Failure: September 11 and Iraq’s Missing WMD

Richard K. Betts. Political Science Quarterly (Academy of Political Science). Winter 2007/2008, Volume 122, Issue 4.

Al Qaeda’s surprise attacks on the World Trade Center and the Pentagon were the U.S.’s second Pearl Harbor. The shocks jolted Americans out of the complacency about national security that they had enjoyed during the dozen years after the Cold War and launched them into a worldwide war against terrorists. When this shock was followed by the failure to find weapons of mass destruction (WMD) after the invasion of Iraq, recriminations over intelligence failure provoked the most radical reorganization of the intelligence system since the post-Pearl Harbor National Security Act. In one case, the intelligence community failed to provide enough warning; in the other, it failed by providing too much. The discussions that follow are not complete case studies of the two failures; lengthy examinations are available in definitive official investigations, as well as in the memoirs of the Director of Central Intelligence (DCI), George Tenet.

This article uses the main points in the two most prominent failures of recent times to illustrate the barriers to success and the dilemmas that bedevil intelligence in many other cases. Both of these cases were dramas more complex than is generally understood, and included much good intelligence work despite the failures at the bottom line. In both cases, hindsight reveals mistakes that could have been avoided, but also mistakes that were tragic yet natural.

Critics often assume that because the United States spends huge amounts on intelligence (now probably around $50 billion annually), failures must be due to negligence, treachery, or stupidity. All of these have played a role at some point, but not often. The fundamental problem is that strategic intelligence is not a fight against nature, where the odds of success rise in direct proportion to the attention, resources, and effort applied to improving protection, and in which overwhelming effort may make the odds of disaster negligible. Rather, it is a fight against cunning outside enemies looking for ways to circumvent our efforts. The existence of other political interests that sometimes conflict with intelligence collection (for example, constitutional rights to privacy) may also limit the results of even massive efforts, making good things, paradoxically, innocent enemies of intelligence. The complexity of modern government creates confusion in communication and coordination and tradeoffs among objectives that conflict with each other, leaving a large roster of impersonal enemies of effective intelligence that are inherent in the process. American intelligence often succeeds very well—and the successes get far less attention and are harder to recognize than dramatic failures—but some incidence of failure is inevitable.

The two cases compared here reflect the ample roles of all the enemies of intelligence. Most of all, the outside enemies—al Qaeda and Saddam Hussein’s government—concealed their capabilities and strategic intentions and misled the American intelligence system. Their deception was compounded by the innocent and inherent enemies within the U.S. system. At a few points, law enforcement officials’ concerns about legal backing held back maximally intrusive domestic intelligence collection, or political officials with strong views tried to bend the interpretation of data to those views. And as always, problems of bureaucratic confusion, breakdown in communication, analysts’ and policymakers’ cognitive processes, and difficult choices between competing risks got in the way of the right actions.

Limits of Warning: 11 September 2001

Conventional wisdom after the strikes on the World Trade Center and the Pentagon was that U.S. intelligence had failed egregiously. One CIA retiree described the failure as worse than Pearl Harbor, because in 1941, we “did not have a director of central intelligence or 13 intelligence agencies or a combined intelligence budget of more than $30bn to provide early warning of enemy attack.” True. Yet before September 11, as before Pearl Harbor, the U.S. intelligence system succeeded in providing timely warning of a sort, and as before Pearl Harbor, the sort of strategic warning that was given proved absolutely useless. In the weeks beforehand, the system detected and trumpeted a raft of indications that a major attack was imminent. The system could not get beyond that stage to the level of tactical warning, however, because it did not uncover or link specific information that might have made it possible to intercept the strikes. The system warned clearly that an attack was coming soon, but could not say where, how, or exactly when. The warning was voluminous but too vague to be “actionable.”

In the mission to provide usable warning, performance before September 11 failed in all phases of the intelligence cycle. The system failed in collection, since it did not discover the particular perpetrators, plans, or means of attack that lay beneath the “chatter” in signals intelligence that indicated an imminent action. It failed in processing and dissemination, as some pieces of information were not correlated in ways that would have raised the odds of identifying individuals involved in the plot or the instruments they planned to use. It failed in analysis, by not finding the right pattern by which to “connect the dots” in the array of clues that was both incomplete and full of clutter. And policymakers also failed. The administration of George W. Bush had not made terrorism as high a priority as had either the intelligence community or the administration of Bill Clinton: the position of National Coordinator for Counterterrorism on the National Security Council (NSC) staff was downgraded, the Deputies Committee (one notch below the top level of interagency coordination) did not meet to discuss counterterrorism until three months into the administration, and the Principals Committee (the top level) met for the first time on the subject four months after that, only one week before the attacks occurred.

Errors of Omission

Before the bombings of the U.S. embassies in Kenya and Tanzania in 1998, the intelligence community was slow to focus on al Qaeda. A 1995 National Intelligence Estimate (NIE) on terrorism and a 1997 update barely mentioned Osama bin Laden. After 1999, the Counter-Terrorism Center of the Director of Central Intelligence had some success in penetrating al Qaeda, but the agents were not at a high level in the organization.

Throughout 2001, nonetheless, attention, collection, and warnings of action by al Qaeda escalated precipitously. There were more than 40 articles relating to Bin Laden in the president’s daily brief in the first eight months of the year, and in the spring, reporting on terrorist threats reached the highest level since the alert before the turn of the millennium. In May, there were several reports of attacks planned within the United States, and in June and July, other reports poured in, although they indicated action in the Middle East or Rome. At the end of June, the official on the staff of the NSC who was in charge of counterterrorism warned the Assistant to the President for National Security Affairs, Condoleezza Rice, that indications of attack planning “had reached a crescendo.” Most attention focused on probable strikes abroad, but several times in mid-2001, President Bush asked “whether any of the threats pointed to the United States.” The CIA answered on 6 August with an article in the president’s daily brief titled “Bin Ladin Determined to Strike in US.” As DCI Tenet later described the situation in that summer, “the system was blinking red,” and he was running around Washington with his hair on fire. Two months before September 11, an official briefing said that Bin Laden “will launch a significant terrorist attack against U.S. and/or Israeli interests in the coming weeks. The attack will be spectacular and designed to inflict mass casualties.” These warnings remained limited by the problem that, as Tenet had warned Congress the previous spring, “We will generally not have specific time and place warning of terrorist attacks.”

Technical intelligence collection through photographic reconnaissance could not uncover al Qaeda attack preparations as it could have those for a conventional military assault, which requires mobilization, loading, and movement of large forces. Signals intelligence, however, detected many communications, and an upsurge of “chatter” among suspected terrorists was much of the reason for the intense sense of urgency during the summer. On the day before the attacks, the National Security Agency (NSA) intercepted messages “in which suspected terrorists said, ‘The match is about to begin’ and ‘Tomorrow is zero hour.’” The messages were not translated until 12 September, but they would not in themselves have made a crucial difference anyway, because they did not mention details of where a strike might occur, that airplanes would be used, or what the targets might be. The director of NSA at the time, Lieutenant General Michael Hayden, “also noted that more than 30 similar cryptic warnings or declarations had been intercepted in the months before 9/11 and were not followed by any terrorist attack.”

The “cry wolf syndrome” is a chronic problem for effective intelligence services; excellent collection gets information about threatening indicators, but when an attack does not occur immediately, the effect is to dull receptivity in the future. Or multiple warnings may set off a raft of expensive defensive counteractions that exhaust the system, even if one or two of the multiple warnings do yield an actual incident. In December of 1999, for example, American intelligence told President Clinton that “Usama bin Ladin was planning between five and fifteen attacks around the world during the millennium and that some of these might be inside the United States. This set off a frenzy of activity. CIA launched operations in fifty-five countries against thirty-eight separate targets.” A year later, Tenet warned of indicators that “the next several weeks will bring an increased risk of attacks,” but they did not happen. After September 11, the system was deluged with indicators of attacks that did not come off.

If the collection of strong but nonspecific warnings before September 11 could have been supplemented by detection and tracking of the plotters, the odds of blocking the attack would have gone up. The discovery that might have been most telling was shunted aside. Two months before the attacks, an FBI agent who had noticed the “inordinate number of individuals of investigative interest” attending flight schools in Arizona wrote the “Phoenix memo,” which warned that Bin Laden might be orchestrating a project, and recommended closer investigation of the situation at the flight schools. The memo was sent to one field office. Although it also went to the units at FBI headquarters that were focused on Bin Laden and radical fundamentalism, investigators there did not see it until after September 11. Intelligence discovered some travel patterns of two of the future hijackers, Khalid al-Midhar and Nawaf al-Hazmi, but organizational boundaries and differences in procedure prevented piecing together some of the meetings of the two and handing off the job of monitoring from the CIA to the FBI when they entered the United States from Southeast Asia. The CIA took too long to put al-Midhar on the State Department watch list of suspected terrorists and did not notify the FBI that he had gotten a visa allowing repeated travel to the United States. The FBI did not give the CIA the Phoenix memo, and the CIA did not effectively publicize al-Midhar’s and al-Hazmi’s connections to al Qaeda, which might have alerted other agencies to check on connections to flight schools. Some of the crucial breakdowns occurred not because of rules or because of decisions not to share intelligence, but because of accidents. At one point, instructions were issued at CIA to transmit crucial information about al-Midhar to the FBI, and messages mistakenly indicated that it had been sent when it had not. In another instance, CIA officers in the field sent important information about al-Hazmi to Washington, but put it at the end of a cable that contained routine information and was mistakenly not marked for “action.”

The FBI fell down the most. In its regular priorities, the Bureau focused overwhelmingly on investigating for criminal prosecution rather than for general intelligence gathering, and confusion about legal requirements blocked sharing of information between those involved in the two missions. The FBI failed to mount a full investigation of Zacarias Moussaoui, a student at a Minnesota flight school. Agents found that he had jihadist beliefs, a large amount of money in the bank, and a suspicious record of travel in and around Pakistan. Reports from the French government provided evidence of his association with Chechen rebels. Nevertheless, FBI agents did not obtain a warrant to conduct a search of Moussaoui’s computer because they could not find “probable cause” sufficient to meet what they thought were required legal standards—an understanding of legal limitations that was later revealed to be incorrect. FBI headquarters also believed that Minneapolis agents were exaggerating the danger posed by Moussaoui. The chief of the Bureau’s international terrorism section did not even know that agent Harry Samit had sent a report three weeks before September 11 warning that Moussaoui might be involved in a hijacking plot. An unlimited effort to investigate Moussaoui might have uncovered his ties to the other plotters and forced immediate concentration on the potential for an attack involving airliners.

The intelligence community’s organization was not optimized for coordinating efforts on counterterrorism. For example, the NSA analyzed communications among several suspected terrorists who they thought might be up to something, but did not attempt to establish the identities of the individuals in detail. The NSA assumed that it was supposed to respond to requests from consumers or analytical agencies such as the CIA. If the identities had been researched, more “dots” might have been accurately connected. (And of course they might not have been.) Overall management of the community did not designate a single point of responsibility for coordinating efforts; instead, all involved were assumed to have the responsibility, but the field units more than others. As a result, in the view of the National Commission on Terrorist Attacks Upon the United States (the 9/11 Commission), CIA “headquarters never really took responsibility for the successful management of this case.”

Mistaken Priorities or Impossible Choices?

Did the intelligence system fail to make counterterrorism a high enough priority before September 11? Yes and no, but mainly no. Relative to other foreign policy issues, the emphasis on counterterrorism went up significantly after the Cold War. More and better measures could have been taken to interdict or defend against the attacks, but reasons to bear all of the costs—both absolute expense and opportunity costs—are clearer in hindsight than they could have been before the fact.

A superpower will always have more than one major potential threat to its interests, and it will seldom have clear grounds for concentrating single-mindedly on one. Lack of focus or diffusion of effort accounts for some failures in hindsight, but before the fact, too much focus on one priority increases vulnerability to threats that are second or third on the list. Resources are always limited, and when there are numerous claims, even those with high priority get fewer than they might profitably use. To maximize protection against a potential threat—that is, to do everything possible to prevent or defend against it—will seem prohibitively expensive as long as the threat is potential rather than both certain and immediate. If the advent of Hurricane Katrina had been assumed to be certain, for example, the costs of a stronger levee system around New Orleans would have been borne. If the political system decides not to undertake costly defensive measures in response to ample but imperfect warning, the failure is one of policy at least as much as one of intelligence.

The Silberman-Robb WMD Commission concluded that the problem before September 11 was “dispersal of effort on too many priorities,” and the joint congressional investigation concluded that “for much of the Intelligence Community everything became a priority since its customers in the U.S. Government wanted to know everything about everything all the time.” For example, NSA had 1,500 formal requirements covering “virtually every situation and target.” This was hardly surprising at the turn of the century, however, when post-Cold War foreign policy chose to engage the United States actively in most of the problems of the world, from the Balkans to the Arab-Israeli conflict, to rogue states, to humanitarian crises in Africa, to reform in Russia and China, and so on.

Nevertheless, terrorism was near the top of the U.S. government’s list of priorities. In 1995, Presidential Decision Directive 35, the guidance on priorities for the intelligence community, included terrorism among the top few. Between the end of the Cold War and September 11, the aggregate intelligence budget fell, but funding for counterterrorism grew—in most agencies, it doubled. These trends do not suggest that insufficient concern over terrorism caused the failure.

In December of 1998, DCI George Tenet issued a directive saying, “We are at war. I want no resources or people spared in this effort, either inside CIA or the Community.” The directive had little effect, however, in mobilizing new efforts in agencies beyond the CIA. The system took the first step in identifying the priority of counterterrorism in principle. It took part of the second step: giving that priority a larger share of available intelligence resources in the early and mid-1990s. It did not succeed in the third step: inducing policymakers to respond fully and make costly choices to hedge against a potential threat of the sort that exploded on September 11. For example, Tenet’s December 1998 directive was not addressed to agencies beyond CIA and the deputy director for community management. The faltering in the third step is especially reflected in reactions to recommendations to tighten airline security procedures.

In 1995, the NIE on terrorism “highlighted civil aviation as a vulnerable and attractive target.” But when the Federal Aviation Administration (FAA) arranged a briefing for senior figures in the aviation industry by the chief analyst at the Counter-Terrorism Center and his FBI counterpart, the warning failed to convince the industry to pay the price of expensive new security measures. In 1996, in response to the mysterious crash of TWA Flight 800, Vice President Al Gore led a commission on aviation security. The commission concentrated on the danger of bombs placed aboard aircraft and criticized the laxness of existing procedures for screening passengers. In the next few years, various threat reports noted possible uses of aircraft loaded with explosives. Still no move was made to overhaul the airline security system. The FAA’s intelligence unit considered the possibility of suicide hijacking but wrote it off because it would not serve what they thought would be the aim of hijackers—to get hostages in order to negotiate the release of imprisoned Islamic radicals.

Inaction is not an unusual response to warnings of dire threats that are plausible yet uncertain. Before September 11, the Federal Emergency Management Agency identified the three most probable disasters: a terrorist attack in New York, an earthquake in San Francisco, and an extreme hurricane in New Orleans. By 2005, two of the three had come to pass. In none of the three cases did government or other major organizations undertake maximum efforts to prevent or prepare to mitigate the consequences. If Louisiana and the federal government had undertaken every project recommended to cope with New Orleans’s vulnerability, the price tag would have been an estimated $14 billion. While the future costs of doing less than the maximum possible to deal with these future threats were uncertain, the immediate costs to other interests of doing everything possible were certain.

Collection and Connection

The main failure before September 11 was insufficient collection of unambiguous information. Dots must be collected before they can be connected. The more dots there are, the more likely it is that two or three will show directly when, where, or how an assault might come. As Roberta Wohlstetter made so tragically clear, however, increases in the amount of information available can create “noise” that obscures the most meaningful data. Most ambiguous warnings turn out to be erroneous, and paying full attention to every one produced by a system that excels in collection could bring the system to a halt in processing and evaluation. For example, “by September 2001 the F.A.A. was receiving some 200 pieces a day of intelligence from other agencies about possible threats, and it had opened more than 1,200 files to track possible threats.”

Few observers want to admit the trade-off between maximizing collection and losing focus. The Robb-Silberman Commission, for example, complained that “channels conveying terrorism intelligence are clogged with trivia,” in part because bureaucrats pass all information on to avoid “later accusations that data was not taken seriously. As one official complained, this behavior is … ‘preparing for the next 9/11 Commission instead of preparing for the next 9/11.’” But can we have it both ways? Can the handling of information be streamlined for manageability without risking failure to convey the critical “dots” that analysts or policymakers might connect?

Maximizing collection can have catastrophic side effects. Paradoxically, the purpose of intelligence is to protect against disaster, but the lust for intelligence can cause disaster. Risky collection ventures can produce provocations or accidents with major diplomatic and military reverberations. This has happened numerous times. In 1960, the Soviet downing of Gary Powers’s U-2 spyplane led to cancellation of a summit meeting between Dwight Eisenhower and Nikita Khrushchev; in 1967, the U.S.S. Liberty was attacked and dozens of its crew killed by Israeli aircraft while the ship was collecting signals intelligence during the Six Day War; the following year, the U.S.S. Pueblo, on a similar mission, was captured by North Korea, and its crew was imprisoned for a year; in 1969, an EC-121 collecting electronic intelligence was shot down off the coast of North Korea; in 2001, an American EP-3 aircraft was forced down after a collision with a Chinese fighter, leading to a tense and prolonged international incident between Washington and Beijing. Exploitation of human sources poses similar trade-offs. If a spy is directed to take risks to get all information possible, rather than lying low and waiting for a high-priority task, she may be caught, and then the source is lost and unavailable when needed most. In short, maximizing collection cannot automatically be assumed to be a benefit; when things go wrong, it can cause far more damage to national security than it averts.

If more dots had been collected before September 11, the odds are higher that they might have been connected in a manner that provided usable warning. Yet the more dots there are, the more ways they can be connected—and which way is correct may become evident only when it is too late, when disaster clarifies which indicators were the salient ones. Analysis failed before September 11, perhaps for the reason cited by the 9/11 Commission—ignoring certain methods that had been developed to facilitate warning. One need not make excuses for various failures, however, to believe that laymen may come to expect too much of intelligence—”like expecting the FBI to stop bank robberies before they occur.” A different reading of the record led Richard Posner to conclude, in opposition to the 9/11 Commission, that the answer is “something different, banal and deeply disturbing: that it is almost impossible to take effective action to prevent something that hasn’t occurred previously.”

The special difficulty of tactical warning makes it particularly important for policymakers to consider what they should do with strategic warning. If tactical warning that will prevent attacks from being launched cannot be expected, the premium on measures to blunt their impact goes up. This means translating the warning of potential threats into programs for coping with them when they burst forth. But is there reason for confidence that the lesson of September 11 will make this happen better than in the past? A number of severe potential threats were clearly identified in recent years, ones for which a number of defensive measures have been available. As with aviation security before September 11, however, the costs or negative side effects of some of these defensive options led authorities to decide against them, and to wait for development of alternative measures that would pose fewer costs. As long as the threats do not eventuate, these choices to do less than the maximum possible as soon as possible will seem prudent. The day after one of the threats does eventuate, however, few in the public will forgive authorities for not having made the hard choice to pay the costs and accept the side effects.

One example is the potential for terrorists to mount coordinated strikes on civilian airliners with shoulder-fired anti-aircraft missiles (MANPADS, or man-portable air defense systems). This risk has been understood at least since the mid-1990s. Precautions of various sorts were heightened, but the main defensive option—installation of flare or laser systems on airliners to deflect or destroy missiles in flight—was not adopted immediately. The cost would have included not only many billions of dollars but also the risk that false alarms would cause dangerous side effects, such as fires started by flares or people on the ground being blinded by lasers. So the government decided to await technological development of safer and more economical defense systems. By the time these lines are read, new and better defensive countermeasures may have been put in place, but that will only mean that the gamble to leave this vulnerability incompletely covered for a decade or more paid off, not that the risk in the interim was insignificant. Another example is the efficient dissemination of aerosolized anthrax over several cities. Timely public health response might minimize fatalities through distribution of antibiotics, but deficiencies in stockpiles or procedures for contacting and treating millions of exposed people in untested situations might still yield thousands of fatalities. Yet the government has not mounted a crash program to overcome the obstacles to mass vaccination against anthrax—for many good reasons, such as problems in production of vaccine, limitations on its efficacy over time, or risk to the health of some fraction of those vaccinated. For however long there are no effective anthrax attacks, the choice not to promote mass vaccination seems wise, but if the scenario of successful attack plays out, the choice will seem as mistaken as the failure to beef up airline security measures before September 11. The list of hard choices about which strategic warnings warrant action is long.

Seeing the failure to go far enough in exploring the potential for the kamikaze hijacking tactic ultimately used on September 11, and the failure of intelligence organizations to connect the dots more creatively, the 9/11 Commission opined that it is “crucial to find a way of routinizing, even bureaucratizing, the exercise of imagination.” This is an oxymoronic notion, but on the mark. It can be attempted in various ways, such as instituting “Red Teams,” devil’s advocates, analytical kibitzers, or other mechanisms for thinking “outside the box.” Multiplying the number of scenarios given serious rather than cursory attention, however, runs up against the need for focus on priorities and probable threats. Such innovations prove hard to sustain in practice and rarely provide the remedies anticipated, but periodic revival of the attempts at least focuses attention on challenging assumptions for some span of time.

Wrong for the Right Reason? WMD in Iraq

Having failed to connect the dots before September 11, American intelligence made the opposite mistake on Iraq—it connected the dots too well. In two ways, the mistaken estimate that Iraq maintained stocks of chemical and biological weapons and an active program to acquire nuclear weapons was the worst intelligence failure since the founding of the modern intelligence community.

The less damaging effect was the spillover that tarnished the credibility of U.S. intelligence in general. When U.S. armed forces invaded Iraq but did not find the weapons so confidently attributed to Saddam Hussein by U.S. intelligence, the shock struck many in the public as evidence of fundamental incompetence or chicanery in the intelligence system.

The failure distracted attention completely from the creditable performance of the intelligence community on other issues, including Iraq. For example, a post-mortem led by Richard Kerr, retired head of the CIA’s Directorate of Intelligence, concluded that pre-war intelligence on many issues concerning Iraq was quite accurate—for example, on predictions about “how the war would develop and how Iraqi forces would or would not fight,” connections between Iraq and al Qaeda, “the impact of the war on oil markets,” and “reactions of the ethnic and tribal factions in Iraq.” Most relevant was anticipation of an awful aftermath to war: “assessments on post-Saddam issues were particularly insightful.” Ironically, policymakers heeded technical intelligence about weaponry, which was wrong, “but apparently paid little attention to intelligence on cultural and political issues (post-Saddam Iraq), where the analysis was right.” In a September 2002 Camp David meeting of top national security officials to discuss going to war, a CIA paper titled “The Perfect Storm: Planning for Negative Consequences of Invading Iraq” was included in the briefing books. The summary flagged the possibility of post-war anarchy, fragmentation of the country, instability in other Arab states, disruption of oil supplies, diplomatic conflict with European allies, and a spurt of Islamic terrorism around the world. Yet ultimately, the administration “went to war without requesting … any strategic-level intelligence assessments on any aspect of Iraq.”

The worst effect of the estimate was that it provided the warrant for war against Iraq, an unnecessary war that cost far more in blood and treasure than had the attacks of September 11. If we are to believe President Bush, mistaken intelligence did not cause his decision for war, because he had other reasons for wanting to destroy the Saddam Hussein regime. Bush later claimed that he would have launched the war even if he had known that Iraq did not have WMD. The presumed existence of such weapons, however, was the only reason that the administration could secure public support to make the war politically feasible. Had Bush presented the case for war in 2002 as he did a few years later, denying that neutralizing WMD was a necessary condition, no one but fanatics would have lined up behind him. To this extent, the intelligence failure bears responsibility for the war.

At the same time, it is fair to say that the intelligence failure was tragic but not egregious. It was a failure in both collection and analysis. Although the bottom-line analytic conclusion was wrong and the caveats were insufficient, in the absence of adequate collection, it was the proper estimate to make from the evidence then available. No responsible analyst could have concluded in 2002 that Iraq did not have concealed stocks of chemical and biological weapons. The principal mistakes were in the confident presentation of the analysis, and the failure to make clear how weak the direct evidence was for reaching any conclusion and how much the conclusion depended on logic and deduction from behavior. In effect, available intelligence might have served to convict Saddam Hussein of holding WMD if the standard of civil law were applied (requiring decision on the basis of the preponderance of evidence) but could not have convicted him under the standard of criminal law (which requires proof beyond a reasonable doubt).

Roots of Error

Bob Woodward famously reported that DCI George Tenet, when questioned at a White House meeting about how solid the intelligence was that indicated Iraq had WMD, said, “Don’t worry, it’s a slam dunk!” (Tenet admits saying those words, but says that the context and connotations of the utterance were misrepresented.) How could the intelligence community have been so confident about a conclusion that turned out to be so wrong, especially when it had hardly any direct evidence of the existence of the weapons? The essential reason is that the conclusion was deduced from Iraqi behavior and the motives assumed to be consistent with that behavior. To people paying attention to the issue, the conclusion seemed utterly obvious, based on the accumulated observations and experience of the preceding decade. Indeed, “apparently all intelligence services in all countries and most private analysts came to roughly the same conclusions.” This nearly universal consensus was rooted in the experiences following the 1991 war.

After the first Persian Gulf War, the United Nations Special Commission (UNSCOM) uncovered in Iraq a huge infrastructure of facilities and programs for producing nuclear, chemical, and biological weapons that had been hidden from pre-war Western intelligence. Forced by the surrender agreement at the end of the war to allow continuing intrusion by UNSCOM, and caught short when early inspections revealed prohibited activities, the Iraqis fought back. An Iraqi government committee gave instructions to conceal WMD activities from inspectors (this was revealed by a document that the inspectors obtained). Another document retrieved from a nuclear installation showed “how it carried out this order. According to UNSCOM’s final report, ‘The facility was instructed to remove evidence of the true activities at the facility, evacuate documents to hide sites, make physical alterations to the site to hide its true purpose, develop cover stories, and conduct mock inspections to prepare for UN inspectors.'”

From 1992 to 1998, when Saddam finally compelled UNSCOM to leave the country, the Commission was regularly frustrated in its inspections by a game of cat and mouse, Iraqi delays and obstructions that seemed consistent only with attempts to conceal activities they did not want discovered. Iraq admitted having stocks of chemical and biological weapons at the end of the war and claimed to have destroyed them later, but never provided a credible accounting or evidence of such destruction. Since it seemed obvious that it was in Saddam’s interest to demonstrate compliance with legal obligations if it were possible to do so, this failure to account seemed necessarily to indicate that the stocks had been retained and hidden. UNSCOM also got figures on imports of equipment appropriate for WMD programs and could not get the Iraqis to account for what had happened to such materials. This too appeared to confirm that they must be up to no good. As the Silberman-Robb Commission concluded, “When someone acts like he is hiding something, it is hard to entertain the conclusion that he really has nothing to hide.” Moreover, the shock of discovering how much had been successfully concealed before the 1991 war had convinced Western intelligence that the Iraqis were masters of deception, so absence of evidence of WMD in subsequent years, or any negative indications, were explained away as due to denial and deception (D&D). Assumptions that Iraq had ambitious WMD projects and a major D&D program “were tied together into a self-reinforcing premise that explained away the lack of strong evidence of either.”

The assessment process reflected errors in method, but ones common among analysts of any sort. It is well known from cognitive psychology that people tend to look for information that confirms what they already believe and discount information that is inconsistent with those predispositions. Instructions to collectors compounded this tendency by telling them to “seek information about Iraq’s progress toward obtaining WMD,” rather than about whether Iraq was trying to get WMD. This may have led agents to “ignore reports of lack of activity.” Similarly, one major negative report from an important source was downplayed. Saddam Hussein’s son-in-law, Hussein Kamel, defected and told his debriefers much about Iraq’s WMD programs, but said that they did not amount to much, and that old stocks had been destroyed. The apparent lack of interest in this aspect of the defector’s testimony may be the most damning example of ignoring negative evidence.

One of the principal disputes about evidence concerned Iraq’s illegal importation of aluminum tubes. The CIA, the Defense Intelligence Agency (DIA), the NSA, and the National Geo-Spatial Intelligence Agency concluded that the tubes were for use in centrifuges, to produce enriched uranium for nuclear weapons. The Department of Energy (DOE) disagreed. Ironically, CIA and DIA would not have firmly asserted that the nuclear program was being reconstituted if they had not had the apparent evidence of the tubes, but DOE agreed on the conclusion of reconstitution despite writing off the tubes.

In the context of seemingly obvious guilt, “analysts shifted the burden of proof, requiring evidence that Iraq did not have WMD,” and in effect “erected a theory that almost could not be disproved.” The October 2002 NIE was rushed to completion in an extraordinarily short time because it had been requested by the Senate Select Committee on Intelligence in the period that appeared to be the countdown toward war. Because of the time pressure, the National Intelligence Council (NIC) did not circulate the draft for peer review or comment by outsiders. This did not seem to be a risky omission, however, because as the vice chair of the NIC said, “I think all you could have called in is an amen chorus on this thing, because there was nobody out there with different views.”

However obvious the answer seemed, the fact remained that the intelligence community had virtually no hard evidence that Iraq was retaining the chemical and biological weapons that it had at the time of the 1991 Persian Gulf War, had manufactured new ones, or was reassembling its nuclear weapons program. Collection failed in two crucial ways. First, it failed to uncover much new information after UNSCOM inspectors left Iraq in 1998. Second, much of the information it did get came from defector reports that turned out to be fabricated or unreliable. The procedure for protecting sources also misled analysts into thinking that the number of human sources was greater than it was, because clandestine reporting often identified the same source in different ways.

The Senate Intelligence Committee investigation stated that American intelligence “did not have a single HUMINT [human intelligence] source collecting against Iraq’s weapons of mass destruction programs in Iraq after 1998,” although there were sources on other subjects, such as political developments. (Tenet later reported that he was especially influenced “by a very sensitive, highly placed source in Iraq” not previously discussed in public.) Technical intelligence collection existed, but errors resulted from “overreliance on dubious imagery … breakdowns in communication between collectors and analysts,” and inadequate signals intelligence. Until late 2002, shortly before the war was launched, North Korean and Iranian weapons programs had a higher priority than Iraq’s; what technical collection there was in Iraq focused on the air defense system, because of U.S. air operations over the southern part of the country.

The biggest mistakes were reliance on unreliable human intelligence, and failure to correct disseminated reports when they were found to be dubious. On chemical weapons, none of the reports from human sources “was considered ‘highly reliable’ … and only six were deemed ‘moderately reliable.’” Most notorious was “Curveball,” the code name for a chemical engineer from Baghdad who had emigrated to Germany and whose reports on biological weapons (BW) development were funneled through the German intelligence service. In May of 2004, a full year after the invasion of Iraq, the CIA determined that Curveball’s reporting was fabricated. When the NIE was being done, Curveball’s allegations about BW programs appeared to be corroborated by three other sources, but one later recanted, and another had already been branded a fabricator by the Defense Intelligence Agency in May of 2002. Nevertheless, owing to bureaucratic miscommunication, allegations about biological weapons programs from that source still found their way into the October 2002 NIE and Secretary of State Colin Powell’s February 2003 speech before the UN Security Council.

These mistakes in validating collection were egregious, but apparently not the fault of the analysts who produced the NIE. Given the thin amount of direct evidence, even erroneous evidence, judgments remained driven by the circumstantial evidence of Iraqi behavior and logical deductions from it. Only when no WMD turned up after the invasion did hindsight make it easy to see other explanations for Iraqi deception. For example, Iraq had frequently tried to import dual-use materials (items that could be applied either to innocent or to forbidden uses) through illicit channels such as front companies. Analysts assumed this meant that the materials were going to WMD programs, since there appeared to be no other reason to hide the transactions. Iraq did this as standard operating procedure even for some legitimate imports, however, precisely because the UN sanctions monitors sometimes denied permission for innocent items because the materials could be used for WMD. The UN bureaucracy for approving imports was also ponderous and required more time and effort to use than illegal channels, and working through front companies facilitated corrupt skimming of profits.

Hindsight also made it easier to entertain rationales for Saddam’s encouraging his opponents to believe in the nonexistent weapons. In the new light, his strategy appeared to be an attempt to have his cake and eat it too—to claim the high ground in the court of world opinion by asserting compliance with disarmament obligations, while exercising deterrence against the United States and Iran by abetting the inference that he still had the forbidden weapons. FBI interrogation of Saddam after his capture suggested this rationale, and that Saddam was particularly worried that inspections would “expose Iraq’s vulnerability in comparison with Iran.” Saddam even deceived his own government, suggesting to high officials that Iraq had WMD.

The most fundamental obstacle to success in the estimate was that “it is particularly difficult for analysts to get it right when the truth is implausible.” Hindsight always reminds us not to assume that what appears irrational to observers does not have a rationale. When estimating, however, what analyst will ever predict that the subject will act stupidly rather than sensibly? In 1962, the NIE was wrong about missiles in Cuba because analysts did not believe that Khrushchev would shoot himself in the foot. Forty years later, the NIE on Iraq made the same mistake about Saddam Hussein. In October of 2002, attributing to Saddam the strategy of pretending to have WMD would have seemed too clever by half, since it was assumed that he would see his survival as more threatened by non-compliance with legal obligations. The correct but counterintuitive rationale for Saddam’s behavior might have been included in a pre-war estimate, but only as an alternative interpretation to the consensus judgment. Before the fact, it would inevitably have seemed to be an imaginative stretch made by a devil’s advocate fulfilling the requirement to think out of the box—and it would have been dismissed.

What the 2002 NIE Did, Should Have Done, and Could Have Done

The full text of the October 2002 NIE included caveats about the limits of the evidence on which it was based, as well as extensive discussion of the reasons that the State Department’s Bureau of Intelligence and Research (INR) disagreed with the conclusion. (INR’s dissent is sometimes wrongly characterized as a judgment that Saddam did not harbor WMD. INR simply remained agnostic, neither endorsing nor opposing that conclusion.) The dissenting views were highlighted in color and boxed text, not buried in footnotes as was the norm during the Cold War, and the dissents in this case were, as Tenet noted, “an unprecedented sixteen pages of the ninety-page NIE.” Tenet also pointed out that “the phrase ‘we do not know’ appears some thirty times across ninety pages. The words ‘we know’ appear in only three instances. Unfortunately, we were not as cautious in the ‘Key Judgments.’”

The tone of the NIE was confident, and the Key Judgments—the summary of conclusions that is all that many consumers read—did not convey the limitations with sufficient force. Apart from directing readers to see the two long paragraphs summarizing INR’s alternative view at the end, and stating, “We lack specific information on many key aspects of Iraq’s WMD programs,” the Key Judgments mainly enumerated estimated Iraqi programs and capabilities, leaving the impression that the estimates derived from observed activities as much as deduction from behavior and assumed intentions. As Sherman Kent recalled, the same thing had happened in the Cuban crisis estimate: “How could we have misjudged? The short answer is that, lacking the direct evidence, we went to the next best thing, namely information which might indicate the true course of developments.”

With the benefit of hindsight, one might argue that the strictly correct estimate in 2002 should have been that the intelligence community simply did not know whether Iraq retained WMD or programs to obtain WMD. That would have been intellectually valid but would have abdicated the responsibility to provide the best support possible to the policy process. As Kent reminisced about Cuba in 1962, when dealing with something that cannot be known for sure, “there is a strong temptation to make no estimate at all. In the absence of directly guiding evidence, why not say the Soviets might do this, they might do that, or yet again they might do the other—and leave it at that?” Forswearing any educated guess “has the attractions of judicious caution and an exposed neck, but it can scarcely be of use to the policy man and planner who must prepare for future contingencies.”

Conscious of responsibility to contribute to decision, managers of the analytic process did not err on the side of caution. They wanted to avoid equivocation to keep the estimate from sounding useless. They believed that good analysis needed “to go beyond certain knowledge” even if this meant occasionally being wrong. As Mark Lowenthal, former Assistant Director of Central Intelligence, put it, “willingness to take such risks is undermined by fears of ‘failure.’ No one wants intelligence that is brash and wrong; pusillanimous intelligence is not any better.” Moreover, the CIA and other elements of the intelligence community were still smarting from the criticism they had received from the 1990s Rumsfeld Commission on missile development in rogue states, when intelligence predictions were judged to be too optimistic. As Tenet lamented, “the remedy for one so-called intelligence failure can help set the stage for another.”

If estimators were to act realistically and earn their pay, yet remain accurate, given what was known and knowable at the time, they should have posed three Key Judgments in the October 2002 NIE:

  • Iraq is probably hiding stocks of chemical and biological weapons and active programs to develop and produce chemical, biological, and nuclear weapons.
  • That conclusion is deduced primarily from obstruction of UNSCOM, failure to account for destruction of stocks known to exist in 1991, and some other circumstantial evidence.
  • There is very little direct evidence, and no highly reliable direct evidence, to back up the deduction.

This would have averted the irresponsibility of offering no judgment and could have fit within the one-page President’s Summary. It would also have been unwelcome to policymakers looking for the warrant for war, but would have been accurate, given what was known and knowable at the time. To most observers looking back after the invasion and the missing WMD, however, the difference between these and the actual Key Judgments would appear to be a matter of nuance rather than a correction large enough to make the analysis acceptable.

Was this failure a symptom of the system’s core weakness? The context in which the October 2002 NIE is seen determines how awful it looks. If measured by its relation to the justification for invading Iraq, it looks epochally awful. If measured as an entry in the set of assessments related to counterproliferation in general, however, it looks different. This is what the British post-mortem on London’s intelligence failure before the war did. The Butler Report investigated all proliferation-related intelligence projects, including those related to Libya, Iran, North Korea, and the A.Q. Khan network in Pakistan, as well as Iraq. These other intelligence projects were more or less successful, which made the Iraq case “one failure against four successes. Hence, it was viewed as a failure due to Iraq-specific factors that somehow tripped up an otherwise effective system,” not as evidence of thorough breakdown. The Silberman-Robb Commission also gave the U.S. system some credit for success on other proliferation cases. The Senate Intelligence Committee Report came out first in the United States, however, was unremitting in criticism, and set the tone for public understanding of the Iraq failure.

The fact of being wrong is not in itself evidence of mistakes that could have been avoided or that show dereliction. The Senate Intelligence Committee post-mortem tended to make that mistaken inference, and most lay critics did as well. Moreover, it is far from clear that the huge number of innovations in organization and procedure undertaken in the past several years, especially with the reform legislation of 2004, will cure more problems than they cause, or will endure under the perpetual time pressures and other constraints facing intelligence professionals from day to day. Perversely, then, measured against realistic standards, and awful as the failure of the Iraq WMD intelligence was, the system did not do as badly as common sense implies. But so what? It is good news to find “that the system was not horribly broken, but bad news in that there are few fixes that will produce more than marginal (but still significant) improvements.” Being wrong for the right reasons means little to citizens who must live with the result, but it does provide a caution against drawing too many lessons from a single failure.