Computer Science

Ellis Horowitz. New Dictionary of the History of Ideas. Editor: Maryanne Cline Horowitz. Volume 2, Charles Scribner’s Sons, 2005.

Computer science is often defined as “the systematic study of algorithmic processes, their theory, design, analysis, implementation and application.” An algorithm is a precise method usable by a computer for the solution of a problem. The term algorithm comes from the last name of a Persian author, Abu Ja’far Mohammed ibn Musa al Khowarizmi (c. 825 C.E.), who wrote an early textbook on mathematics. Some computer scientists study broad classes of algorithms, while others study algorithms for a specific task. Algorithms must be written down in some notation. Often the notation used is a programming language, as algorithms written in a programming language can be transformed and executed on a digital computer. Such algorithms are called computer software. Computer science is also concerned with large software systems, collections of thousands of algorithms, whose combination produces a significantly complex application. For these systems new issues become prominent: reliability, security, dependability, scalability, and modifiability of both the computer software and hardware. Another aspect of computer science is the impact it has had on other disciplines. Computer science “thinking,” namely the modeling of processes by algorithms, has had a major impact on researchers in other fields.

Early History

The field called computer science was born in the 1940s, though its roots extend back to the nineteenth century and even earlier. One of the early founders of the field was Alan Turing (1912-1954), a citizen of Great Britain, who in 1937 published his famous paper entitled “On Computable Numbers with an Application to the Entscheidungsproblem.” In this paper he introduced the concept of an abstract computing device, later dubbed a Turing machine. It was precisely the simplicity of his model that permitted scientists to ask, and answer, fundamental questions about the nature of computation. Any computer can be simulated by a Turing machine, and the converse is also true. Moreover, the complexity of Turing machine computations yields insights into the efficiency of computations on real computers. Two other famous mathematical logicians who made early contributions to computer science were Alonzo Church (1903-1995) and Kurt Gödel (1906-1978). Church developed a system called the lambda calculus, which makes possible logically precise expressions of mathematical propositions. The lambda calculus proved to be a model for functional programming, and the popular LISP programming language used the lambda calculus as its theoretical base. Of equal import was the so-called Church thesis, which states that every effectively calculable number-theoretic function is lambda-definable. The importance of this result is that it ties numerical computation to string manipulation, an initially surprising result. Kurt Gödel is best known for his proof of “Gödel’s Incompleteness Theorems.” As early as 1931 he proved fundamental results about axiomatic systems, showing that in any axiomatic mathematical system there are propositions that cannot be proved or disproved within the axioms of the system. In particular the consistency of the axioms cannot be proved.

Computer science was not founded solely by mathematicians. An equally important group was electrical engineers, who focused on actually building a computing machine. World War II identified and spurred a need for computing devices, machines that could help carry on the mechanics of war. Enlisted into this cause were some of the greatest scientists of the day. One of these was Howard Aiken (1900-1973), who in 1944 built the Automatic Sequence Control Calculator (Mark I) at Harvard University. Another groundbreaking effort was the development of the ENIAC computer by John Mauchly and John Presper Eckert at the Moore School of Electrical Engineering of the University of Pennsylvania. Mauchly and Eckert were in turn influenced by John Vincent Atanasoff (1903-1995), who is now widely recognized as the inventor of the world’s first electronic computer, the so-called Atanasoff-Berry computer (ABC machine). The ABC machine employed all of the basic units of a modern digital computer, including binary arithmetic, a separate arithmetic unit, and vacuum tubes for emulating logical switching circuits such as adders. The mathematician-turned-computer-scientist John von Neumann (1903-1957) worked closely with Mauchly and Eckert, and among many results he is usually credited with the idea of the stored program computer, the idea that a computer would contain within it both a program for processing the data as well as the data itself.

Computer Science Chronology

During the early period (1950s-1960s), a great deal of computer science work focused on understanding and refining essential elements of a computer system. Operating-systems software was developed to facilitate the control of the functional units of a computer: the processor and the input and output devices. Programming languages were devised so programmers could more easily express the computations they wished to accomplish. FORTRAN (Formula Translation), developed by John Backus and a team at IBM around 1954, was the first popular, high-level programming language. Its focus was on efficient numerical calculation. LISP (LISt Processor), developed by John McCarthy at MIT around 1956, focused on symbolic programming. Both languages had major impacts and, though less popular, were still in use in the early twenty-first century.

The Study of Algorithms

As the definition of computer science is the systematic study of algorithms, it is not surprising that the decade of the 1970s was a period when the study of algorithms was dominant. One aspect of this research was the development of a theory of algorithms. Building on the theories of Church and Turing, computer scientists asked questions such as “Is there an algorithm for any Turing machine such that it decides whether or not the machine eventually stops if it is started in some initial state?” This is termed the Halting Problem. The Halting Problem has been shown to be unsolvable. Another aspect of the theory of algorithms has to do with problem reducibility. For example, it has been shown that if an algorithm did exist for the Halting Problem, then it would be possible to solve Hilbert’s “tenth problem”—namely, given a Diophantine equation, determine a procedure that decides in a finite number of steps if the equation has an integer solution. Computer scientists have shown that Hilbert’s problem is reducible to the Halting Problem and is therefore unsolvable.

A second aspect of algorithm studies concerns the development of new algorithmic solutions for specific problems. Topics such as sorting, searching, and graph theory were closely studied from an algorithmic point of view. Many new and efficient algorithms for these topics have been produced—for example, Hoare’s Quicksort algorithm for sorting, hashing as a technique for searching, and Tarjan’s algorithm for determining graph planarity, to name just a few. As the search for new algorithms proceeded, new techniques for analyzing the performance of these algorithms were developed. Methodologically, worst-case, best-case, and average-case analysis have become standard questions to address when presenting an algorithm. There are standard mathematical notations for presenting these results. Algorithm design techniques were identified, for example, divide-and-conquer, depth-first search, greedy method, and dynamic programming.

No discussion of computer science would be complete without a discussion of its most famous problem, “Does P NP?” P is the set of problems that can be solved in deterministic polynomial time. That means that for a problem with inputs of size N, there must be some way to solve the problem in F(N) steps for some polynomial F. F can be any polynomial, even N to a very high power. For example, sorting a set of N numbers can be done in polynomial time. This problem is in the set P. NP is the set of problems one can solve in there must be some way to solve the problem in F(N) steps for some polynomial F just as before. In NP, however, the program is allowed to make lucky guesses, though it must prove the solution is correct. Many problems are in NP—for example, the traveling salesman, finding a Hamiltonian cycle, satisfiability of propositional expressions, finding a maximum clique, integer knapsack problem, and the optimal scheduling problem.

All problems in P are also in NP. You do not need to use nondeterministic guesses in an NP program if you do not want to. But does P NP? This problem was first posed by Steven Cook (1971). No one has ever proven that they are equal and no one has ever proven they are not. But despite this failure, the “P NP?” problem has contributed a great deal to our understanding of algorithms. This has come about because computer scientists have been able to identify a large class of problems, all of which are reducible to each other, that is, solving one of these problems will immediately lead to a solution for the others. This class is called the NP-complete problems. The fact that NP-complete problems are (intuitively) the most difficult in NP follows from the fact that we may prove P equals NP if and only if some NP-complete problem has a polynomial-time algorithm. In his original formulation of the “P NP?” problem, Cook showed that the satisfiability problem (find an assignment of values to Boolean variables in a logical statement to make it true) was a member of the NP-complete problems (this is called Cook’s Theorem). However, despite many efforts, no one has ever shown that an NP-complete problem is also in P. Because no one has found such an example, many researchers believe that P is not equal to NP. And yet computer science is relatively new, and lots of other difficult problems have remained unsolved for centuries before someone came up with a solution, so perhaps this one just needs more time.

Artificial Intelligence

The 1980s saw a great deal of work in the field of artificial intelligence (AI). AI is the subfield of computer science that is concerned with the computational understanding of intelligent behavior and with the creation of systems that produce such behaviors. One approach has been to develop computer programs that mimic the way humans behave. A second approach is to produce a computational model of some intelligent behavior, regardless of whether the model imitates human processes or not. One of the earliest examples of the latter approach was James Slagle’s SAINT program for performing symbolic integration at the level of a college freshman. AI researchers early on emphasized heuristics as a problem-solving approach. Heuristics differ from algorithms in that they may not converge (or produce) the correct or exact answer, but experience shows that they often produce acceptable answers. For example, computers that play chess must evaluate the worth of their position at every step. Since it is computationally infeasible to work out the worth of all of the possible moves, researchers have developed heuristics that return a numerical assessment of the worth of a future move.

Returning to Alan Turing, in an article he published in 1950 he described a game in which a human interrogator attempts to determine, solely on the basis of a written interrogation, whether the identity of the “person” answering his questions is in fact a person or a computer. This has come to be known as the Turing test. This challenge inspired a great deal of AI research. Of special mention is the ELIZA program developed by Joseph Weizenbaum, which appears to emulate a nondirective therapist. If the person being interrogated says “I am very happy,” ELIZA might respond with “Why are you very happy?” and so it goes. Other researchers attempted to develop computer programs that would exhibit behaviors that were believed to be signs of human intelligence. Playing chess or proving mathematical theorems were two such areas of intense study. There are many subareas of artificial intelligence, including natural language understanding, general problem solving, knowledge representation and reasoning, learning, machine vision, robotics, and autonomous agents.

Personal Computers and the Internet

An important phenomenon of the 1980s was the success of industry in producing a low-cost, miniaturized computer processor. This brought about the personal computer. Initially these machines might include 640,000 bytes of memory, no disk (only a floppy drive), and a processor whose speed might be 1 KHz (kilo-hertz) or less. The cost of such a machine was approximately $5,000. As technology improved, prices steadily dropped while capabilities were enhanced, and computers moved from the exclusive domain of the government and large corporations into the home and small business arena. A personal computer in 2004 might have 1 billion bytes of memory, a disk with a capacity of 100 gigabytes, and a processor whose speed is 10 MHz (megahertz) and might cost less than $2,000. Word processing and accounting (spreadsheets) became dominant applications. But this had little effect on computer science per se.

In the 1990s research that had started back in 1969 with the U.S. Department of Defense, and that led to the digital network of computers called the ARPAnet, became accessible to everyone as the Internet. Research on packet switching and network protocols led to the development of TCP/IP (transmission control protocol/Internet protocol), the standard that now enables any pair of connected computers to communicate. As the number of Internet users grew, computer scientists began to study how best to exchange information between them. This culminated in the development of the World Wide Web by Tim Berners-Lee (c. 1990).

In the early twenty-first century new issues challenged computer researchers. People studied how digital sensors by the thousands can be coordinated to do things like predict the weather. Others developed methods for connecting thousands of idle personal computers together to work on a single problem (GRID computing). And simulations of physical processes down to the atomic level could be achieved, as digital computer speeds reached Teraflop level (a thousand billion instructions per second).

Basic Methodologies of the Field

The digital computer is the center of computer science. Abstract models are developed in the hope of capturing essential elements, but the models need sufficient accuracy so conclusions reflect what will actually occur on a real digital computer. Algorithmic thinking requires one to express solutions to problems as a sequence of steps, each one sufficiently precise that it could be translated into the elemental steps of a digital computer, and then to analyze the efficiency of these steps.

Another fundamental mode of thinking for a computer scientist is representation—namely, the way in which data is stored so that algorithms making use of the data will compute efficiently. For example, a phone book is organized alphabetically so we can easily locate a person’s phone number if we know his or her name. On the other hand, if we know a phone number and want to know the name of the person who has that number, the phone book is useless. Organizing the data so questions can be answered efficiently is the end goal of data representation. Representation does not only refer to ways to organize data but also ways to encode data. For example, how does one represent a mathematical expression that needs to be differentiated or integrated, or how should one encode speech, sound, or a movie so that it is compact yet is able to be faithfully rendered? Compression algorithms have succeeded in reducing the size of popular songs to approximately 3 megabytes (3 million characters), but a full-length feature movie, using the best encoding scheme of the day, requires approximately 1,000 megabytes. The former can be transferred across the Internet in a matter of minutes, while the latter requires several hours or more. Computer scientists are researching both sides of the problem, studying how to increase the bandwidth of the network while also improving the degree to which compression algorithms work.

Computer scientists have differing approaches to problems. Theoreticians aim to bring order to rapidly emerging subfields. They attempt to develop models or analytic methods to help understand what is going on. In some computer science areas formal models exist, such as automata theory, switching theory, graph theory, and formal languages. However, for some fields, such as operating systems, programming languages, and compilers, theory has had a limited impact. Experimenters build systems and then use them to test out a variety of questions. Performance analysis and comparisons of different architectures are often the results of such experimentation.

Relationships to Other Disciplines

Computer science originated within mathematics, mainly through mathematical logic, and through electrical engineering with the use of Boolean algebra and switching theory to describe electronic circuitry. Conversely, computer science has strongly influenced mathematics. In some cases computers have been used to help prove theorems. One example is the question of whether four colors are sufficient for coloring any planar map, called the Four Color problem. This problem remained unsolved for more than one hundred years until the Four Color Theorem was proven by Kenneth Appel and Wolfgang Haken in 1976. As part of the Appel-Haken proof that four colors are sufficient, they use a computer to investigate a large but finite number of potential counterexamples.

Computer science has an equally strong connection with engineering, and in many ways the connection is much stronger than with mathematics. Computers are now indispensable when it comes to designing and building any complex structure, from a skyscraper or submarine to a computer. CAD/CAM systems (computer-aided design/computer-aided manufacturing) rely on a combination of computer graphics, specialized algorithms, and a complex of supporting software to provide the engineer with a set of tools by which one can master the complexity involved.

In the late 1990s and early 2000s a new bond grew between the physical sciences and computer science. The fields of physics, chemistry, biology, geology, and astronomy posed grand challenge experiments, problems that require massive high-speed computations. Human-genome sequencing is one such problem. Biologists view DNA as an encoding of information needed to generate a unique organism. The international effort to sequence the 3 billion DNA letters in the human genome, accomplished on 14 April 2003, was considered by many to be one of the most ambitious scientific undertakings of all time. Computer science played a pivotal role. All of the sequence data generated by the Human Genome Project has been deposited into public databases and made freely available to scientists around the world. Assembling and interpreting this data has required new levels of coordination and collaboration of computer scientists and biologists to formulate the necessary computing algorithms, data-management approaches, and visualization systems. In short, high-performance computing has fundamentally changed the way biologists do science; parallel computing systems have enabled high-throughput genome analysis; and modern search engines are allowing access to unprecedented amounts of biological data.

Another grand challenge is the Human Brain Project. This is a broad-based effort involving neuroscientists and information scientists (computer scientists, engineers, physicists, and mathematicians). The goal is to produce new digital capabilities providing a World Wide Web (WWW)-based information management system in the form of interoperable databases and associated data management tools. Tools include graphical interfaces, information retrieval and data analysis, visualization and manipulation, and biological modeling and simulation. It is expected that the neuroscience databases will be interoperable with other databases, such as genomic and protein databases. From these two examples and many more one sees that researchers from many fields are now regarding computation as a third paradigm of scientific investigation, alongside theory and experimentation.