Barbara J Grosz. AI Magazine. Volume 33, Issue 4. Winter 2012.
In 1950, when Turing proposed to replace the question “Can machines think?” with the question “Are there imaginable digital computers which would do well in the imitation game?” computer science was not yet a field of study, Shannon’s theory of information had just begun to change the way people thought about communication, and psychology was only starting to look beyond behaviorism. It is stunning that so many predictions in Turing’s 1950 Mind paper were right. In the decades since that paper appeared, with its inspiring challenges, research in computer science, neuroscience, and the behavioral sciences has radically changed thinking about mental processes and communication, and the ways in which people use computers have evolved even more dramatically. Turing, were he writing now, might still replace “Can machines think?” with an operational challenge, but it is likely he would propose a very different test. This paper considers what that might be in light of Turing’s paper and advances in the decades since it was written.
In his seminal paper, “Computing Machinery and Intelligence,” which appeared in the journal Mind in October 1950, Alan Turing proposes to replace the philosophical question “Can machines think?” with an operational, behavioral one. The question of whether machines could think was on many people’s minds even in the infancy of computing machines. To avoid the need to define intelligence or thinking, terms that he uses interchangeably, Turing asks whether there are “imaginable digital computers which would do well” in a game he defines and calls “the imitation game,” and which we know as the Turing test. The Turing test poses the challenge of constructing a computer system able to carry on a dialogue with a person, potentially ranging over any subject matter and many subject matters, well enough to be indistinguishable from a person.
Turing conjectures in the positive about the possibility of a computing machine succeeding at this game. In the Mind paper (Turing 1950, 442), he says, “I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.” He thus suggests that even partial success at a simpler test (only five minutes of interaction) will take a long time to achieve. In later remarks (Newman et al. 1952), Turing says it may be 100 years or more until a computer will succeed at the full imitation game.
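As an aside, Turing’s storage figures elsewhere in the paper are given in binary digits; assuming that unit here (an assumption of mine, for modern readers), a quick calculation puts his predicted capacity in present-day terms:

```python
# Turing's predicted storage capacity, expressed in binary digits (bits),
# as stated in the 1950 Mind paper. Treating the figure as bits is an
# assumption based on his usage elsewhere in the paper.
bits = 10**9

megabytes = bits / 8 / 10**6  # bits -> bytes -> megabytes (10^6 bytes)
print(f"{bits:.0e} bits = {megabytes:.0f} MB")  # 1e+09 bits = 125 MB
```

Roughly 125 megabytes: a capacity Turing thought it would take fifty years to exploit well, and one that is trivially available today.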
The Turing test challenge has not yet been met (Watson, Siri, and systems succeeding in “restricted Turing tests” notwithstanding), and the relevance of the Turing test to establishing the intelligence of computers has been widely, and interestingly, debated for decades. I shall not engage in that debate in this article, but instead, will look at Turing’s question in the context in which it was posed; consider advances in psychology, computer science, and the capabilities of digital computing devices over the last six decades; and then, rather than discard the test for its faults, consider the power and influence it exerted as we look to the future in this 100th year after Turing’s birth, asking “What question would Turing pose, were he with us today?” First, though, I want to pause to remark on this extraordinary paper.
The Mind Paper: A Paper for a Century
The 1950 Mind paper is astonishing in its breadth. It is noteworthy not only for posing a question that has inspired decades of research in AI but also for defending the very idea that computing machines could do more than simply compute with numbers and for suggesting avenues to pursue in building a thinking machine. Turing chooses language, widely considered the most uniquely human of behaviors, as the basis of his test, and “imitation” of people’s competent use of language as the criterial marker of thinking. By posing an operational question, Turing helped set the stage for the field of artificial intelligence to be an experimental field, with empirical studies a crucial component. The Turing test challenge has inspired many and, indirectly if not directly, generated untold numbers of important research questions and decades of fascinating research, leading to computer systems that are “smart” in ways he himself might not have imagined. In many ways, Turing’s paper has guided that research, and it is stunning that so many of Turing’s suggestions in the 1950 Mind paper have proved valuable and so many of his observations remain relevant.
Turing foresaw the importance of machine learning to the construction of intelligent machines, and his insights about the role of logical inference and the challenges of automating proof processes were also prescient. Machine learning has become a central element of many areas of AI including natural language processing and multiagent systems. Turing’s comments about rewards and punishments and the role of analogical thinking (Newman et al. 1952) prefigure reinforcement learning and learning-by-analogy approaches. The control of inference mechanisms, whether logical or probabilistic, has also proved crucial to AI systems.
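To make the reinforcement-learning connection concrete, here is a minimal sketch of learning from rewards and punishments: tabular Q-learning on an invented two-state world. Everything in it (the world, the parameters) is illustrative, not anything Turing or this article describes.

```python
import random

# A toy "rewards and punishments" learner: tabular Q-learning on an
# invented two-state world in which action 1 is rewarded and action 0
# is punished. Illustrative only.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration
states, actions = [0, 1], [0, 1]
Q = {(s, a): 0.0 for s in states for a in actions}

def step(state, action):
    """Reward action 1 (+1), punish action 0 (-1); move to the other state."""
    return 1 - state, (1.0 if action == 1 else -1.0)

state = 0
for _ in range(1000):
    # Epsilon-greedy: mostly exploit what has been learned, sometimes explore.
    if random.random() < EPSILON:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in actions)
    # Standard Q-learning update toward the observed reward plus lookahead.
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state

print({k: round(v, 2) for k, v in Q.items()})  # action 1 dominates in both states
```

The learner is never told which action is “correct”; it converges on action 1 purely from the pattern of rewards and punishments, the mechanism Turing’s remarks anticipate.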
In the paper, Turing also lays out and responds to various arguments against the possibility of machines thinking with arguments that resonate even today, when we have the advantage of seeing computers doing many of the things people thought would be impossible. Of particular interest even now are his comments (Turing 1950, 447-450) about “Arguments from Various Disabilities” and “Lady Lovelace’s Objection.” He refutes in various ways claims that machines would not make mistakes (were that only true!), learn from experience, or be creative. To Lady Lovelace’s claim that, as he quotes, “‘The Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform’ (her italics),” Turing agrees only that “The evidence available to Lady Lovelace did not encourage her to believe that computers could.” He then goes on to say (Turing 1950, 450), “Machines take me by surprise with great frequency.” In a BBC Third Programme broadcast from 1951 (Turing 1951), Turing amplifies his belief that machines could do something original, saying, “If we give a machine a programme which results in its doing something interesting which we had not anticipated I should be inclined to say that the machine had originated something, rather than to claim that its behaviour was implicit in the programme, and therefore that the originality lies entirely with us.”
There is more, much more, in the paper. It is truly a paper of the century, not only inspiring AI and helping initiate research in this area, but also influencing research in theoretical computer science and computer systems and foreshadowing important ideas in all these fields. For instance, Turing recognizes that randomness may be important to computing and to machine intelligence. Randomization has proved to play significant roles in theoretical computer science and in a wide range of security applications. It also has proved useful for machine learning algorithms (for example, for creating ensembles).
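As one concrete illustration of randomness in learning, a bootstrap ensemble trains each classifier on a random resample of the data so that their majority vote is more robust than any single classifier. The sketch below, with invented one-dimensional data and simple threshold “stumps,” is my own illustration, not an example from the article.

```python
import random
from statistics import mean

random.seed(0)

# Invented one-dimensional data: the label is 1 when x > 0.5, with 10%
# label noise to make any single classifier imperfect.
xs = [random.random() for _ in range(100)]
data = [(x, int(x > 0.5) if random.random() > 0.1 else int(x <= 0.5)) for x in xs]

def train_stump(sample):
    """Pick the threshold in [0, 1] that best separates the given sample."""
    return max((i / 20 for i in range(21)),
               key=lambda t: mean(int((x > t) == bool(y)) for x, y in sample))

# Bagging: each stump is trained on a random bootstrap resample of the data.
stumps = [train_stump(random.choices(data, k=len(data))) for _ in range(25)]

def predict(x):
    votes = sum(int(x > t) for t in stumps)  # majority vote of the ensemble
    return int(votes > len(stumps) / 2)

print(predict(0.9), predict(0.1))  # expect: 1 0
```

The randomness of the resampling is what makes the stumps diverse; without it, the ensemble would be twenty-five copies of the same classifier.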
Turing also had wise words about research paths. He says (Turing 1950, 460), “We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision. Many people think that a very abstract activity, like the playing of chess, would be best. It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English … Again I do not know what the right answer is, but I think both approaches should be tried.”
Finally, a bit later, Turing answers those who might be skeptical of the very pursuit of building a thinking machine and is again prescient. In the 1951 BBC broadcast (Turing 1951), Turing said, “The whole thinking process is still rather mysterious to us, but I believe that the attempt to make a thinking machine will help us greatly in finding out how we think ourselves.”
Acceptance of the Notion of “Thinking Machines”
Although Turing’s expectations for computers to succeed at the imitation game have not yet been realized, his accompanying prediction (Turing 1950, 442) that “at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted” has proved true. Computers now seem to behave intelligently in so many ways that speaking of them as “thinking” or being “smart” has become commonplace.
Enormous progress in hardware capabilities, both processor speed and storage capacity, has been essential to progress in AI. Techniques for reasoning probabilistically have enabled systems to deal successfully with uncertainty and provided new ways of modeling causality. The ability to handle massive quantities of data, along with machine learning and statistically based algorithms, has led to impressive performance by natural language processing and speech systems. There have been significant advances also in vision and robotics, and, thanks in part to demands from the computer game industry, it is possible to obtain at very low cost the infrastructure needed for research in these fields.
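To give a flavor of the probabilistic reasoning mentioned above, the sketch below (with invented numbers) applies Bayes’ rule to revise a belief as uncertain evidence accumulates:

```python
# A minimal illustration of reasoning under uncertainty with Bayes' rule.
# All numbers are invented for the example.

prior = 0.01            # P(h): a rare condition is present
p_e_given_h = 0.95      # P(evidence | h)
p_e_given_not_h = 0.05  # P(evidence | not h)

def update(belief, likelihood, false_positive_rate):
    """Posterior P(h | e) by Bayes' rule."""
    p_evidence = likelihood * belief + false_positive_rate * (1 - belief)
    return likelihood * belief / p_evidence

belief = prior
for observation in range(1, 4):  # three independent positive observations
    belief = update(belief, p_e_given_h, p_e_given_not_h)
    print(f"after observation {observation}: P(h) = {belief:.3f}")
# Each observation strengthens the belief; certainty is never assumed.
```

Rather than committing to a yes-or-no answer, the system carries a degree of belief forward, which is precisely what makes such techniques robust to noisy input.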
AI research has enabled such highly visible, intelligent-seeming systems as Deep Blue, Watson, and Siri as well as search engines, systems that automatically personalize music selections, and driverless cars. The performance of many of these systems, which is extraordinary, rests on decades of research in natural language processing, machine learning, and search. AI techniques also underlie a number of technologies that have changed credit-card fraud detection and many other day-to-day activities.
As consequential as many of these systems are, however, computer systems are also limited in the scope of their intelligent behavior. It is in the errors systems make that it is most evident they have not cleared Turing’s hurdle; they are not “thinking” or “intelligent” in the same sense in which people are. Typically, their capabilities are restricted along such dimensions as the types of questions they can handle, the domains they cover, and their ability to handle unexpected input. Their behavior when they “don’t get it” is frequently met by users asking “How could it be so stupid?” or “Why did it do that?” Widely noted recent examples include certain answers Siri provides when it does not know that it does not know, or has not heard a question correctly, and Watson’s response of “Toronto” (rather than Chicago) to the Jeopardy U.S. Cities clue, “Its largest airport was named for a World War II hero; its second largest, for a World War II battle.”
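One simple engineering response to a system “not knowing it does not know” is to let it abstain when no candidate answer is confident enough. The sketch below is illustrative only; the candidates and confidence scores are invented, and this is not how Siri or Watson actually work:

```python
# Sketch: a question answerer that declines to answer when no candidate
# is confident enough, rather than guessing. The candidates and scores
# below are invented; real systems derive such confidences from their models.

CONFIDENCE_THRESHOLD = 0.6

def answer(candidates):
    """candidates: a list of (answer, confidence) pairs."""
    best, score = max(candidates, key=lambda c: c[1])
    if score < CONFIDENCE_THRESHOLD:
        return "I don't know."  # knowing that it does not know
    return best

print(answer([("Chicago", 0.31), ("Toronto", 0.30)]))  # I don't know.
print(answer([("Chicago", 0.87), ("Toronto", 0.05)]))  # Chicago
```

A system that can say “I don’t know” avoids exactly the kind of baffling answer that prompts users to ask how it could be so stupid.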
New Perspectives in a New Century
Turing’s paper was a paper for the century, but his imitation game test was very much a challenge of the times in which the paper was written. In the early 1950s, computer science was not yet a field of study, computers were used by one person at a time to solve a particular (usually mathematical) problem, and they were limited in storage and computing power to an extent it is hard now to grasp. Shannon’s theory of information had just begun to change the way people thought about communication (Shannon 1948). Behaviorism was the dominant theme in psychology, and it was years, even decades, later that new models from information theory and computer science led to the development of cognitive models and the establishment of cognitive science. Turing conjectured that increased storage, and perhaps increased speed, along with some years of programming and testing would suffice for a machine to succeed at the imitation game. With hindsight we can easily see there was also much intellectual work to be done in many fields.
In the decades since the Mind paper appeared, advances in computer science, neuroscience, and the behavioral sciences have radically changed our understanding of mental processes and their development, of how people learn, and of communication. The cross-fertilization of research in these fields has been important to advances in all of them. Extraordinary progress in psychology and neuroscience has led not only to our knowing more, but also to the development of new tools and techniques for investigating mental processes and cognitive development. As our understanding of thinking has evolved and computer science has matured, we have arrived at goals more varied and deeper than asking simply whether machines can think. We have also come to appreciate the importance of evaluation metrics that can determine partial success, which the Turing test does not permit.
Although computational metaphors have proved helpful in modeling human cognitive processing, there is widespread appreciation that computer systems need not process information in the same way people do in order to exhibit intelligent behavior. People and computer systems have different strengths and weaknesses, and the mechanisms they deploy in processing information are different. Computers are far better at searching and handling large amounts of data, though (at least to date) far less adept at interpreting photographs or handwriting, as witnessed on the one hand by people’s use of search engines and programs for analyzing “big data,” and on the other by the engagement of “human computation” to help with tasks that are beyond current computational capabilities.
We have also come to appreciate that language use, which Turing chose as the essential element of his imitation game, is inherently social and intimately connected to communicative purpose and human cooperation. Language is not a separate cognitive phenomenon, but is closely tied to purposeful action and interpersonal connections. It is intentional behavior, not simple stimulus-response behavior. Austin, Grice, and Searle argued persuasively in the 1960s that intentions were essential to deriving meaning (Austin 1962, Grice 1969, Searle 1969). Vygotsky (1978) had argued similarly, with respect to child development, decades earlier (though in relative obscurity until the 1970s), and recent work in neuroscience and child development points to the centrality of cooperation in developing language competencies. Ordinary language use also presumes that participants have models of each other, models that influence what they say and how they say it.
The imitation game, considered most starkly as a question-and-answer series, misses these essential aspects of language behavior. The choice of Jeopardy as the domain for Watson was clever in avoiding the need to consider dialogue context, the purposeful use of language, and any model of other participants (though it did require a model of the Jeopardy game itself). Limited capabilities for some of these aspects of language seem to have been built into Siri. When they are sufficient, the results are impressive. When they do not suffice, though, the results are at best puzzling and at worst annoying. If you want to stump Siri, just ask a question that depends on some of these aspects of language use.
We also now know a great deal more about learning, both human and machine learning, than was known in Turing’s day. Turing’s image of the child machine was essentially that of an empty vessel into which facts would be poured. Infants, though, are not simply little adults whose brains need more facts, perhaps tempered by experience. Research in developmental psychology, education, and neuroscience has significantly changed our understanding of learning, moving from this empty-vessel view to one in which students are “active learners,” and it has shown the importance to learning of students working with new ideas and interacting with others as they do so.
Of equal importance to all these advances has been the qualitative change in the ways in which people use computer systems, resulting from the increased power of these systems and advances in all areas of computer science. For the first several decades in which they existed, computing systems were used by individuals. Although time sharing enabled multiple people to share the same machine, with few exceptions (of which the work by Engelbart [for example, Engelbart 1962] is among the most notable), most computing efforts were individual ones. This is no longer the case. The proliferation and increased capabilities of computer networks in the last two decades have led to the predominant use of computing devices being in “mixed networks” of people and machines, often working together or sharing information to do tasks that involve multiple participants.
Turing’s argument for the universality of digital computers holds, but the immense change in the power of actual machines and ways they are used would impel him to ask a different question today. The difference in use might be the greatest impetus for changing the question, even more than all we have learned about human cognition. The input-output interactions of Turing’s day have been supplanted by myriad possible ways of communicating with machines and with each other, often using those same computing machines. Turing could now imagine abstracting away from physical irrelevancies, without being limited to a series of question-answer pairs.
What Question Might Turing Pose Now?
Were he alive and active today, Turing might still argue to replace the philosophical “Can machines think?” with an operational challenge, but I expect he would propose a very different test given the transformations in computer use as well as the significant advances in computer science and the cognitive and brain sciences in the last six decades. Several properties of the test might (and should) remain the same, including its inherently interactive nature and lack of restriction on possible topics of discussion. Other things would or should change. I would hope that he would propose a challenge that involved language in real use, rather than a game in which the sole purpose of language is to succeed in the game, and one that reflected the extent to which people’s thinking and acting intelligently are rooted in their participation in (small) group activities of various sorts, formal and informal, with relatives, friends, colleagues, and even strangers. A test of competence in the ordinary use of language in situ would be a better measure of the kind of intelligence with which Turing was concerned than a test of skill in a game in which the use of language has no independent purpose.
One possibility for this new question arises naturally from considering the challenges of constructing a computer system able to collaborate with human team members on a nontrivial task extended in time. In my 1994 AAAI presidential address (Grosz 1996), I argued that “[d]esigning systems to collaborate with [people] will make a difference to AI; it will make a difference to computer science, and, by enabling qualitatively different kinds of systems to be built, it will make a difference to the general population.” I ended the talk saying that “working on collaboration is fun. I hope more of you will join in the game.” In the intervening two decades, many AI researchers have taken on the challenges posed not only by collaboration but also by the full range of group activities, whether collaborative, cooperative, or competitive, in which computer systems might participate alongside people.
The literature is too vast for me to summarize here, but I will mention a few results that have emerged from my research group. Our research aims to enable computer systems to participate as full partners with people in multiagent activities, including transforming the ways in which computer systems interact with people. Rather than the screen-deep interaction of many human-computer interfaces, we aim to give systems the ability to collaborate as team members or to coordinate with others, and to support people in collaborating and coordinating in their activities. This research treats the appropriate division of labor as an essential design choice and models of other agents as crucial. These capabilities are essential for good “digital assistants” and important if systems are to work with people to solve problems neither could (as easily) solve alone. We have built a test bed that enables experimentation with different agent designs in mixed networks of people and systems as well as the analysis of the ways people behave in such networks (Gal et al. 2010).
Our research on collaboration has included expanding formal models of collaboration to handle uncertainty about partners’ plans and, as illustrated in figure 6, deciding when to communicate with them (Kamar, Gal, and Grosz 2009a); developing models and algorithms for efficient decision making under uncertainty and using them to analyze people’s perceptions of the value of interruptions (Sarne and Grosz 2007; Kamar, Gal, and Grosz 2009b); and devising methods for learning about reputation (Hendrix, Gal, and Pfeffer 2009). We have applied some of these ideas to the design of a collaborative interface for educational software (Gal et al. 2012).
Our research on computer decision making informed by people’s behavior has examined the ways in which people’s decision making changes when they are in groups and when they are interacting with computer systems (Gal et al. 2010, Van Wissen et al. 2012), the use of emotion expressions to increase perceptions of trustworthiness (Antos et al. 2011), and the use of emotionlike operators to improve computer decision making (Antos and Pfeffer 2011).
With all this in mind, here’s my proposal for the question Turing might pose now:
Is it imaginable that a computer (agent) team member could behave, over the long term and in uncertain, dynamic environments, in such a way that people on the team will not notice it is not human?
Several properties of this test are worth noting. It shares with Turing’s original test that it does not ask a machine to appear like a human. It furthermore does not ask that the computer system act like a person or be mistaken for one. Instead it asks that the computer’s nonhumanness not hit one in the face, that it is not noticeable, and that the computer act intelligently enough that it does not baffle its teammates, leaving them wondering not about what it is thinking but whether it is. This test differs from Turing’s in allowing for incremental development and testing. Systems can start with simple group activities and advance to more complex ones. They can almost pass and adapt to do better. One can devise intermediate measures of accomplishment.
It is worth noting that advances beyond those in Siri, Watson, and Deep Blue (to take three of the most visible AI successes) are required to meet this challenge. Systems will need models of their teammates’ knowledge, abilities, preferences, and more. They will need to figure out how to share information appropriately (what, with whom, and when). They will need to manage plans and recover from errors. And much more.
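A hypothetical sketch of the kind of teammate model such systems would need follows; the class, its fields, and the sharing rule are all invented for illustration and are not taken from this article or from any published system:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a teammate model and an information-sharing
# decision. The class, fields, and rule are invented illustrations of
# the capabilities listed above, not a published design.

@dataclass
class TeammateModel:
    name: str
    knows: set = field(default_factory=set)         # facts we believe they know
    capabilities: set = field(default_factory=set)  # tasks they can perform
    interruption_cost: float = 0.1                  # cost of telling them something

def should_share(fact, estimated_value, teammate):
    """Share a fact only if it is new to the teammate and its estimated
    value for their part of the task outweighs the interruption cost."""
    if fact in teammate.knows:
        return False
    return estimated_value > teammate.interruption_cost

alice = TeammateModel("Alice", knows={"meeting moved"}, capabilities={"scheduling"})
print(should_share("meeting moved", 0.9, alice))    # False: she already knows
print(should_share("room changed", 0.9, alice))     # True: new and valuable to her
print(should_share("printer jammed", 0.05, alice))  # False: not worth interrupting
```

Even this toy version makes the point: deciding what to say, to whom, and when requires an explicit model of the other participants, exactly the capability the imitation game never tests.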
Turing ends his paper by saying (Turing 1950, 460), “We can only see a short distance ahead, but we can see plenty there that needs to be done.” That remains true today. I invite readers to conjecture and design their own new “Turing questions” to set new visions.