The Displacement of Human Judgment in Science: The Problems of Biomedical Research in an Age of Big Data

Paul Scherz. Social Research. Volume 86, Issue 4. Winter 2019.

In The Human Condition, Hannah Arendt lamented the state of the mathematical natural sciences. While recognizing their power to transform the world, she thought that they leave much to be desired in terms of providing an explanation for the world once they lose contact with human senses and intuitions. Surveying fields like quantum mechanics in which we still do not have a good picture of the world described by the equations, Arendt suggested that science was being reduced to mere technical intervention into nature rather than an understanding of it. “But the mathematization of physics … had in its last stages the unexpected and yet plausible consequence that every question man puts to nature is answered in terms of mathematical patterns to which no model can ever be adequate” (Arendt 1998, 287). Understanding is lost, leaving “us a universe of whose qualities we know no more than the way they affect our measuring instruments” (261). Of course, similar critiques were levelled throughout the mid-twentieth century by many other scholars—Edmund Husserl, Martin Heidegger, Hans Jonas, and Georges Canguilhem. As Husserl put it, “This arithmetization of geometry leads almost automatically, in a certain way, to the emptying of its meaning” (1970, 44). Physical science had lost hold of the original intuitions underlying the quantification of nature, driving it from the grasp of human understanding.

To many scholars who shared this critique, one natural science seemed like an exception: biology. Heidegger argued that “all sciences concerned with life must necessarily be inexact just in order to remain rigorous. A living thing can indeed also be grasped as a spatiotemporal magnitude of motion, but then it is no longer apprehended as living” (1977, 120). Because of their rich complexity, organisms were not thought to be intelligible in purely mathematical terms. Instead, one needed to understand their purposes, aims, and goals; that is, their teleology. Even those like Ernst Mayr, attempting to mathematize biology as part of the Modern Synthesis of evolution, understood this need, although he preferred the cybernetically derived term “teleonomy” to the older “teleology” (Mayr 1989). For example, a complete, detailed description of the chemical and signaling reactions that lead birds to migrate would still be a deficient explanation if it did not also include an account of why birds migrate. Because it deals with living things, biology requires richer forms of judgment and explanation than other natural sciences.

This exceptional feature of biological explanation had been recognized already by Immanuel Kant, who argued that science needed something beyond the mathematics of Newtonian physics. While he would have liked to submit all of nature to that mathematical form of description, it was impossible because living organisms demanded this richer teleological description (Kant 2000, 386-88). Most philosophers in prior generations shared Kant’s view. A science of purely mathematical relations between entities could not capture the complexity and goal-directedness of biological life.

Then, two decades ago, the Human Genome Project brought a sea change. Biology began to be inundated by massive amounts of genetic and other data, much of which was produced and interpreted by machines. Analogous to developments in other fields like medicine and education, human judgment began to look inadequate to the task of analyzing this data and using it to develop the therapies that had been promised in exchange for the massive public and private investment in the machine infrastructure. Scientists began suggesting that these vast accumulations of data, “mined” and interpreted by machines, could reveal knowledge. For instance, an editorial in Nature entitled “Can Biological Phenomena Be Understood by Humans?” quoted a biologist arguing that scientists “have to free [themselves] from the hypothesis-driven approach” and rely more on computer models (Anonymous 2000). After 2000, practicing scientists and philosophers of biology joined a broad debate over the relationship between the scientist, computer-analyzed data, and human reason in biological knowledge production (Allen 2001a, b; Gillies 2001; Smalheiser 2002; Kell and Oliver 2004).

The argument for the seeming promise of artificial intelligence in science was put in an extreme way in a 2008 Wired article by Chris Anderson, who prophesied an age of hypothesis-free research undertaken through machine learning. He foresaw “a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory.” The research establishment could forgo the creative role of human judgment in developing theoretical models from which one could derive hypotheses to be tested. Instead of such mental models of the world, algorithms would identify correlations and possible drug targets by combing through data. “Petabytes allow us to say: ‘Correlation is enough.’ We can stop looking for models” (Anderson 2008). Human teleological—or even theoretical—judgment would no longer be brought to bear. Anderson’s predictions have been criticized, and many commentators challenge his understanding of what is going on in big data science. But even if few researchers explicitly attempt hypothesis-free work in practice, it is worthwhile to engage with this ambition. It reflects deeper cultural currents as well as the increasingly practical turn to big data, in silico research, and artificial intelligence-aided drug design in biology. And it illuminates some of the underlying contemporary conceptions of human judgment in relation to data. There is reason to believe that the philosophers’ claim that biology requires a form of teleological judgment distinct from that of the more mathematical sciences is in the process of being undermined. Our biology is becoming data-driven.

To explore this change in more detail, I first consider the history of how biology has arrived at its current crossroads. Then I argue that the contemporary data-driven paradigm has a faulty conception of how scientific judgment occurs to the detriment of the practice of science. This misunderstanding leads to problems for contemporary biology. In practice, science needs a richer understanding of human judgment, one that makes space for teleology, creativity, and tacit, embodied knowledge. Lacking a respect for how judgment functions, biology will not deliver what we ask of it—neither in the form of new therapies nor, and more importantly, in a deeper understanding of the world.

The Post-genomic Condition

Sociologist Jenny Reardon is correct in saying that scientifically, and even in terms of broader social issues, we exist in a postgenomic condition. The story of how biology took its current shape is deeply tied to the Human Genome Project and its underlying motivations (Reardon 2017; Fortun 2008; Stevens 2013; Leonelli 2016). In the 1980s, geneticists began to decipher the specific DNA mutations that cause a limited number of hereditary diseases, like cystic fibrosis. At that time, this process involved a lengthy attempt to locate the suspected gene on a specific piece of the chromosome by using restriction enzymes to analyze the DNA of families affected by the disease. Once a particular part of the chromosome was isolated, the lab then had to engage in a tedious process of sequencing thousands of base pairs using nonautomated gel technology. This task was difficult and time-consuming. Researchers wanted to accelerate the process by automating aspects of it, especially the sequencing stage. Such technological development, however, would not come cheaply and was beyond the budget of any one lab; even for-profit companies balked at undertaking the task unless they could be assured of a sufficiently sizable market.

For these reasons, researchers decided that they needed to convince the government to commit significant resources to developing a new sequencing infrastructure. Given the estimated costs, big promises had to be made. After researchers sold the payoff to politicians on a grand scale, like the moon landing, the Human Genome Project was born. One scientist said,

From the scientific point of view objections could be raised against a sequencing project for the human genome at this time. On the other hand, if extra funds can be obtained to rally the public around a well-defined project … there may be advantages. It is hard to motivate governmental agencies, or the public at large, to support fundamental work in genetics. (Fortun 2008, 35)

As the project was promoted to the public, the rhetoric became even more elevated and the promises more expansive. Scientists, it was said, would learn to read the Book of Life and would revolutionize health care. Eric Lander, one of the chief organizers and influential policy-makers of contemporary biology, was concerned about this strategy: “I guess I found it … very troubling that people were cramming this down people’s throats as the holy grail. Yes, it’s very important, but it’s infrastructure” (36). The actual significance of the project was in creating the sequencing infrastructure. On that, it delivered.

The Human Genome Project has provided a wealth of new knowledge, especially by using the genomes of other model organisms, like flies and nematodes, in trial runs. The knowledge, however, shows us that most diseases are not caused by single mutations. They involve complex interactions of multiple gene products with environmental influences, making genomic information very difficult to interpret. The Project has not revolutionized health care as promised, but it has revolutionized biomedical research. As philosophers of technology never tire of reminding us, major new technologies are never neutral instruments. They tend to reshape preexisting practices and understandings. This is doubly so when a new technology reinforces a preexisting cultural proclivity, like the drive to mathematize scientific understandings of the world.

The infrastructure created by the Human Genome Project reshaped the practice of science in at least four ways. First, it centralized power in the few centers that could afford the machines. As predicted, the new sequencing machines were tremendously expensive, so few labs could afford one, let alone develop the expertise to use the equipment well. Policymakers decided to centralize the sequencing aspect of the project at institutions like Massachusetts Institute of Technology and Washington University, which bought many of the new sequencing machines and developed the computational tools to analyze the data. These places generated the data for the rest of the field. Their managers, newly influential, then worked to convince funders and corporations of the ongoing need for ever faster and cheaper sequencing machines.

Second, once these labs had the expensive sequencing infrastructures in place, the machines had to be kept busy. A draft of the human genome was declared complete in 2001, so the managers of the key centers then had to find something for the machines to do. One could not leave tens of millions of dollars of capital investment standing idle. Therefore, the genomes of more people and species were sequenced. Biology “adapted itself to the computer,” reshaping itself to fit the capabilities of the machines (Stevens 2013, 41).

Third, it became clear that ordinary biologists could not deal with the huge amounts of sequencing data in the way they had previously dealt with genetic data. It was just too much, beyond human reckoning. Scientists turned toward computer analysis and sought better statistical methods merely to keep up with the billions of base pairs of sequencing data.

Finally, the requisite statistical analysis of the new data and the needs of the newly developed information infrastructure changed the kind of people doing biology. Early in the sequencing project, the architects realized that they would need people with skills in quantification and algorithm development to analyze the emerging sequences. They began hiring computer scientists and physicists to staff their labs. These researchers had a fundamentally different background from other biologists because they were never trained in benchwork with model organisms or cell culture. They experienced living things from the perspective of quantitative analysis rather than from the actual messiness of “wet” work. As Hallam Stevens notes, computer scientists “were interested in elegant, discrete, neat, tractable problems, while [biologists] constantly had to deal with contingency, exceptions, messiness, and disorder” (Stevens 2013, 52). Over time, these quantitative analysts moved from postdoctoral fellowships at genome centers to labs of their own, further changing the questions asked by the field as a whole. In these ways, the mere existence of the sequencing infrastructure has progressively shifted biology away from hands-on work with organisms toward algorithmic analysis of data.

This shift in the structure of genetics has led to speculation about hypothesis-free research—AI working by itself to find patterns and generate new insights from the ever-growing flow of raw data, a flow that is too much for the limitations and biases of human intelligence to handle. The dream of unbiased hypothesis-free research, however, is built on significant misunderstandings, and these illustrate more fundamental issues with the structural changes taking place in biological research. Contrary to what was hoped at the outset, humans and their theoretical paradigms, which impose ontological structures on the world, are not absent from data-driven research. Nor could they be. A practice of purely data-driven, theory-free research is impossible. All data are theory-laden, as philosophers of science have long taught us, even data that seem purely quantitative and neutral (Duhem 1954; Quine 1980).

Further, data-driven research needs people to create databases and interpret the literature. For example, Sabina Leonelli, codirector of the Exeter Centre for the Study of the Life Sciences, shows the importance of curators for the new data-centric biology (2016). The post-genomic infrastructure is dependent on extensive databases that coordinate multiple sources of data—genomic data, proteomic data, gene expression data, and such. These databases require specialists to input and organize the data. In so doing, they impose categories upon the data, categories that are not given in nature but are matters of judgment. One of the major types of database entry is the individual gene. For these to be useful for further research and analysis, the specialists have to make determinations about these genes: Where do they function—on the cell surface, in the nucleus, in the cytoplasm? What role do they play—signaling, transcription? With what other proteins and genes do they interact? Interestingly, these database descriptions are called ontologies, suggesting that they can capture the essential nature of the cataloged genes. In fact, the descriptions are contested, theory-dependent decisions. While bench scientists argue over these issues, the major disputes are between “wet” biologists and the specialist curators. The curators, bench scientists argue, lack the expertise and contextual knowledge to make the best distinctions. Of itself, data science cannot resolve these ontological questions.
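
As a rough illustration of the kind of judgment involved, consider the sketch below: a hypothetical, simplified curated record, not the schema of any actual database. It shows how an entry encodes decisions about localization, function, and interaction partners that are nowhere “in” the sequence data themselves but are supplied by the curator.

```python
from dataclasses import dataclass, field

@dataclass
class GeneAnnotation:
    """A hypothetical, simplified curated record; the fields mirror the kinds of
    determinations described above, not any actual database schema."""
    symbol: str
    cellular_component: str          # where the product functions: the curator's call
    molecular_function: str          # what role it plays: the curator's call
    interacts_with: list[str] = field(default_factory=list)  # partners distilled from the literature
    evidence: str = "inferred from publication"              # why the curator trusts the assignment

# The curator, not the raw sequence, decides how to classify the gene.
example = GeneAnnotation(
    symbol="TP53",
    cellular_component="nucleus",
    molecular_function="transcription factor",
    interacts_with=["MDM2"],
)
```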

Human judgment cannot be dispensed with. Another illustration is in the increasingly popular practice of direct-to-consumer genetic testing (Reardon 2017, 120-44). A surprising feature of these commercial services, such as Gen by Gen, Counsyl, and 23andMe, is that if you send your DNA off to more than one of them, you will likely get a different set of results from each—differing sets of health risks to worry about (136-39). The reason is not so much a problem with sequencing inaccuracies as a difference in which genes each company chooses to focus on and how each company rates the risks of each gene. These are necessary judgments made by each company’s analysts, who must read through the often contradictory genetics literature and decide upon the important genes and their related risk factors.

The fact that many genetics studies are too small and statistically underpowered to give firm results only makes this task more difficult. Analysts must use their judgment to decide which studies to trust. Even for the limited list of genes that the American College of Medical Genetics recommends for medical testing, there is great controversy over which genes to test and how to interpret results. The direct-to-consumer companies claim to examine a far greater number of genes, so the complications and uncertainties are vastly increased. Human judgments are required at each step of the genotyping process, a critical role that is obscured when results are presented as the precise calculations of algorithms.
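
A toy sketch can make the source of the divergence visible. The companies, genes, and risk weights below are invented for illustration, but the logic mirrors the judgments just described: each service chooses its own gene panel and its own reading of the literature, so the same genome yields different reports.

```python
# All names and numbers below are hypothetical, chosen only to illustrate the point.
customer_variants = {"GENE_A": 1, "GENE_B": 0, "GENE_C": 1}   # copies of each risk variant carried

# Each company's analysts pick which genes to report and how to weight the literature.
company_one_panel = {"GENE_A": 1.3, "GENE_B": 0.9}
company_two_panel = {"GENE_A": 1.1, "GENE_C": 1.6}

def risk_report(panel: dict[str, float], genome: dict[str, int]) -> dict[str, float]:
    """Return a relative-risk estimate only for panel genes the customer actually carries."""
    return {gene: weight for gene, weight in panel.items() if genome.get(gene, 0) > 0}

print(risk_report(company_one_panel, customer_variants))  # {'GENE_A': 1.3}
print(risk_report(company_two_panel, customer_variants))  # {'GENE_A': 1.1, 'GENE_C': 1.6}
```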

The machines themselves introduce judgments. Because they are expensive to develop, there are only a few suppliers of the machines that gather the data, like DNA chips, DNA sequencers, or mass spectrometers (Leonelli 2016, 168). While many different labs are collecting data points, they are frequently collecting them only on a few kinds of machines made by an even smaller number of suppliers. Frequently, these machines come with software packages that already provide a first layer of analysis. Information is prepackaged in a way that reflects prior judgments of relevance and appropriate statistical modeling by people uninvolved in the research. Since each machine and its associated algorithms may introduce its own bias into the data, this creates problems for subsequent analysis and comparison.

Beyond these interpretive concerns, there are substantive theoretical issues with generating valid findings from big data correlations. Statisticians have long been warning scientists that one can always find some sort of statistical correlation given a large enough sample size. Big samples have led to the common practice known as “p-hacking,” which occurs when researchers experiment with various statistical analyses and/or manipulate data eligibility specifications and then report only the analyses that produce significant results (Head et al. 2015). Geneticists have grown more cautious since the 1990s, when major papers announced the discovery of single genes for all sorts of behaviors, from same-sex attraction to violence, findings that were later disproven. But spurious correlation and the inflation of true effect sizes are an ever-present danger with large data sets. It is especially concerning when significant correlations are reported without any model for how the variables might be related. Further, given that these correlations only emerge when samples are taken from large numbers of people, it is also unclear how they can be experimentally verified.
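
A minimal simulation, offered here only as an illustration and not drawn from any of the studies cited, makes the statisticians’ warning concrete: test a purely random trait against a thousand purely random genetic markers, and dozens of markers will nonetheless clear the conventional significance threshold by chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_markers = 10_000, 1_000

# A trait with no genetic signal at all, and genotypes drawn at random.
trait = rng.normal(size=n_subjects)
genotypes = rng.integers(0, 3, size=(n_subjects, n_markers))  # 0, 1, or 2 copies of each variant

# Test every marker against the trait and count nominally "significant" hits.
false_hits = 0
for j in range(n_markers):
    r, p = stats.pearsonr(genotypes[:, j], trait)
    if p < 0.05:
        false_hits += 1

# With 1,000 independent tests at p < 0.05, roughly 50 markers will appear
# "associated" with the trait even though none of them is.
print(f"{false_hits} spurious associations out of {n_markers} markers")
```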

Finally, as I have discussed elsewhere, despite the big promises made on behalf of the Genome Project and similar technical initiatives, these endeavors have largely been therapeutic failures so far (Scherz 2019). By standard metrics, the rate at which new therapies emerge actually appears to have decreased since the 1980s, despite the dramatic increase in expenditures. There are, of course, some areas where genomics has yielded a rich harvest, such as with single gene disorders and the sequencing of tumors. Most of these results, however, are dependent on pre-genome project work and older forms of non-big data research. More importantly, if we look to the goal of the practice of science—providing greater understanding of the world—then the data-driven paradigm has not provided any major theoretical advances. As Reardon and other commentators lament, researchers have amassed huge troves of data, but no meaning has emerged from them (Reardon 2017, 39). Researchers report and correlate, but they do not understand.

While many of the contemporary problems of science arise elsewhere—from disturbing shifts in the structure and incentives of scientific research that aim to make it more entrepreneurially productive—some of the blame also lies with this algorithm-focused research and its misconception of the role of qualitative judgment in the research process. To develop a better way forward, it is therefore important that we have a more accurate picture of scientific judgment.

Scientific Judgment

The model of scientific judgment explicitly celebrated by Wired’s Anderson, and used implicitly by many who are developing the big data infrastructure, works from the principle of induction. Induction is the form of scientific reasoning that begins with systematic observations of many instances of a phenomenon, detects regular patterns in those phenomena, often through experiments, and infers generalizations from the patterns. The generalizations in turn serve as the basis for scientific hypotheses and laws. To give a stock example: if one visits the ponds and lakes of Europe and looks at the swans, one observes that every swan is white, and one can then formulate the generalization that all swans are white. This was the foundation of the Baconian ideal of science: by sticking closely to the data, the researcher can avoid being ensnared by common errors of reasoning, what Bacon called the “idols of the mind.” The logical positivists, beginning with the Vienna Circle in the 1920s, also emphasized the inductive method for establishing and verifying empirical generalizations. The stress on induction was part of their program to differentiate true science from metaphysical speculation about the ontological state of the world, a speculation they regarded as meaningless. Science begins with observation and works upward, accumulating more and more data, making inductive inferences, and developing theoretical concepts. The program of the logical positivists was essentially an attempt to shape an algorithmic approach to research, one well suited to computers, which can sift through data as well as, indeed better than, human scientists.

Philosophers of science have criticized the inductive model ever since David Hume, both in terms of its description of how scientific practice actually operates and its logical adequacy. One of the great achievements of analytic philosophy has been its sharp articulation of the logical problem of induction. The fact is that, in most cases, no amount of observation can guarantee that a disconfirming instance of an inductively derived generalization will not arise in the future. To take the swan example, zoologists were fairly certain that all swans were white until they went to Australia and saw black swans. This is a problem for big data sets because they frequently generate difficult-to-test hypotheses. If an effect can only be seen in a data set derived from a million people, then it is unlikely that one will gather another million people for a prospective experiment to confirm that correlation. Induction cannot give sure knowledge. At a more theoretical level, even in the midst of debates within the Vienna Circle of logical positivists, Otto Neurath recognized that the language used for observation is highly theory-laden (Neurath 1983). To classify a creature as a swan already assumes quite a bit of theory.

The problems with the algorithmic model of science come into sharper focus when we consider the older model of scientific judgment that it attempts to replace. As philosophers of science like Pierre Duhem and Karl Popper have argued, scientific practice relies on deductive reasoning (Duhem 1954; Popper 2002). The researcher forms a generalization, a theory, and then deduces testable hypotheses from that theoretical statement. These hypotheses are then tested in experiments that can possibly falsify the theory. If the theory does not withstand the experiment, then the scientist is forced to develop a different one.

Importantly, Duhem and Popper both recognized that neither the creation nor the choice of a theory is a process that proceeds by a straightforward method like an algorithm. The formulation of a new theory is a creative act, the development of a new insight into the reality of the world, one that accounts for the data. The same need for judgment is also apparent in choosing between theories. As Duhem and, later, Thomas Kuhn and Imre Lakatos argued, it is not always clear what to do when an experiment seems to falsify a theory (Duhem 1954; Kuhn 1996; Lakatos 1974). Scientific theories are not simple propositional statements. Every theoretical statement depends on a complex whole that includes many equations, definitions, and interlocking concepts. Therefore, if an experiment fails, it might not invalidate the theory, for many possible reasons. The experimenter could decide to tweak some auxiliary aspect of the theory or introduce some new variable. For example, cosmologists discovered that the accepted equations of motion do not accurately predict the large-scale movements of galaxies or the paths of distant light. Instead of rejecting current theory, however, they postulated mysterious entities like dark matter and dark energy to explain the discrepancy. Though these entities have so far not been observed directly, the theoretical equations predict that they must make up the majority of the universe. The cosmologists are making a judgment call.

Further, in practical terms, every test of a theory depends upon instrumentation whose efficacy depends on other kinds of scientific theories. If an experiment seems to disprove a theory, a scientist might justifiably argue that there are problems with the instrumentation used in the experiment. These kinds of contested claims over instrumentation become even greater when they involve the opacity of many machine-learning systems. In the face of failed predictions, researchers must choose between overturning a theory, modifying a theory, and rejecting the experiment on technical grounds. The choice depends on scientists’ judgments of the solidity of the theory, the availability of nearly equally attractive alternative theories, which theories are supported by dominant labs, and so on.

The theoretical aspects of scientific research depend on certain features of embodied human judgment. The ability to create a theory or to choose between theories requires a kind of feel for and immersion in the data, especially in biology. Michael Polanyi highlighted the importance of tacit knowledge in scientific reasoning, as in much of human life (Polanyi 1962). Not everything we know can be put into clear statements, or even thematized propositionally at all. Much scientific judgment depends on daily experiences in the lab dealing with organisms and experimental results. These kinds of experiences become sedimented in consciousness, shaping how a scientist theorizes and judges other researchers’ results. A scientist might doubt the findings presented in a paper, for instance, not because of any obvious problem with the data or its description, but because her tacit experience with the same kinds of manipulations makes her skeptical that such results are likely. Being in the lab opens her up to an intimacy with the experimental system. As Albert Einstein remarked, “There is no logical path leading to … [scientific] laws. They can only be reached by intuition, based upon something like an intellectual love (Einfühlung) of the objects of experience” (Popper 2002, 8-9). Similarly, Evelyn Fox Keller has described a “feeling for the organism” as critical for biological judgment (Keller 1983). Science requires a foundation of practical experience that undergirds its practitioners’ prudential judgments.

The need for familiarity with research tools holds even for mathematical results. Max Weber recommended that sociologists hoping to make a theoretical breakthrough calculate their own statistics in order to gain a firsthand familiarity with the primary data sets. On a deeper level, though, purely mathematical theory will always hamper contact with reality. As Husserl (1970), the philosopher Jacques Maritain (1995), and Duhem (1954) warned, a mathematical description of the world is always an abstraction from reality. It requires projecting upon the world a geometric regularity, detaching from the messiness of materiality to transform objects into the symbolic form of numbers that can then be mathematically manipulated. While these numbers can give greater precision, prediction, and control, they do not necessarily give a better grasp of reality. Computational biology is a case in point. As the anthropologist Hallam Stevens notes, “[F]or the computer scientist, the point of his or her work is the elegant manipulation of data; for the biologist, the wet stuff of biology takes precedence over data” (2013, 54). The former prefers simulations of organisms, the latter experiments on actual organisms. They are not the same thing, which is why philosophers in the Aristotelian tradition have always put the truth value of predictive mathematical descriptions on a different plane from the description of essential qualities of beings found in natural philosophy (Maritain 1995). Mathematics is powerful because it projects a vision of reality upon the world that allows for predictive control. But it can stand in the way of accessing the layers of reality that reveal themselves through engagement with the world.

Current Problems

Although the foregoing description suggests many problems with a scientific practice based on the inductive power of big data, many historians of science dismiss both the fears and the hype surrounding big data. Historians like Lorraine Daston (2017) have shown that other eras of science have also had to organize and use large data sets. Big data is not a new thing, and scientists have always found a way to use these data well. Thus, these scholars suggest, we should temper both our hopes and our fears.

There is much truth to this observation, but the historian Bruno Strasser (2012), while recognizing similarities, has argued that there are three major differences between our current era of big data and previous ones. The first regards ownership. Current data sets tend to be kept proprietary by corporations or large labs, leading to a centralization of power in biology and restrictions on the essential flow of knowledge (Scherz 2019). Second, “the analysis of the data is carried out by researchers with different disciplinary backgrounds than those who produce it” (Strasser 2012, 86). This means that those doing the analysis typically have little sense for the quality of the kinds of data they are manipulating and lack the ability to contextualize the knowledge. As we have seen, the curators who impose theoretical structure on the data in genetics typically have little grasp of the contextual knowledge of the bench. Programmers designing the algorithms may have no experience at the bench either, leading to questions of the validity of their designs.

Finally, current big data are handled statistically. As Strasser notes, “Although data, anything from numbers to images, have generally been thought to refer to physical objects, recently they have increasingly come to stand for the physical objects themselves” (2012, 87). Older data sets tended to be actual organisms or descriptions of organisms. These new data sets tend to be purely digital archives abstracted from real organisms. A sign of this abstraction is the difficulty that these new data frameworks have in integrating images, which are essential for much of biological research and judgment. More and better images are collected now than ever before. The difficulty is that the qualitative nature of the images is hard to use in the quantitative framework now ascendant. Compared to the past, the quantitative frameworks lead to greater abstraction and thus distancing from the reality of the organism.

The danger in all this is that by misunderstanding scientific judgment, we will create policy and research frameworks that undermine it. If we believe that scientific judgment is merely the same as algorithmic method, then less and less emphasis will be placed on deep engagement with organisms and on developing expertise in bench techniques during training. Moreover, the built-up expertise of those who have worked with an experimental system will be undervalued. We already see this happening. One of the complaints raised about current systems of science funding is that there is no mechanism to allow a person to have a long-term career at the bench (NIH 2012; Alberts et al. 2014; Scherz 2019). The scientific career track in biology tends to advance from graduate student to postdoctoral fellow to principal investigator (PI). Once a PI, the scientist is basically a manager and grant writer. Few successful PIs spend any time at the bench, so their embodied expertise is not put to use. Researchers are spending more years as postdoctoral fellows, but these are temporary and relatively low-paying positions, so people who cannot get a full-time appointment eventually drop out of the system.

As many commentators have noted, this situation is a waste of educational investment and a human tragedy, but it is also a waste of valuable bench experience, a loss of embodied abilities to judge a research system. As soon as someone becomes a master of experimental technique, they are forced to leave their domain of mastery for management or another field. This was not the case in prior generations of scientific organization in which faculty would continue to engage in laboratory work and career trajectories were far more stable. Further, fewer people are likely to gain the feeling for the organism in the first place, as government grants continue to value technological spectacle over older forms of benchwork. Experts in quantitative analysis continue to enter the field, and their training in organismal biology is often limited. Many of the big labs that train most of the graduate students are now specializing in combing through data. The turn to data-driven biology is already affecting the capacity for theoretical judgment of future generations of researchers.

Conclusion

While the turn to quantification in biology has yielded few results, it is extracting a high and self-reinforcing price. Despite the dreams of science managers, policymakers, and technology enthusiasts, data cannot be made to speak for itself, even with current advances in machine intelligence. The data-driven research paradigm represents a dramatic loss of faith in the human ability to describe and understand the world. Yet it is precisely this confidence in the possible correspondence between human reason and the rationality of the universe that has always driven natural philosophy and science. Scientific progress requires human cognitive capacities that artificial intelligence lacks, like creativity and judgment. In biology, these capacities are supported by a long period of embodied work at the bench with organisms. The most important kind of knowledge for scientific discovery in biology may in fact be tacit. While machines can support some aspects of scientific analysis and big data may be able to provide some raw material to support new hypotheses, they will only ever be adjuncts to human creativity. The danger is that reliance on machines will undermine proper training and shift decisions to curators and data scientists who lack the crucial tacit knowledge.

We are losing the accumulated knowledge and mature scientific judgment that come with the embodied experience of working at the bench. They are lost when the postdoctoral fellow completes her string of fellowships and goes no further. They never develop when newcomers to the field are not initiated into the tradition of bench science and are trained at a desk rather than in the lab. Organizational changes can address some of these problems—many commentators, for example, have suggested funding more lab scientist positions so that researchers can pursue long-term, stable careers. However, these problems will not be solved until we have addressed the foundational misunderstanding of scientific judgment.

The former paradigm of doing science requires an infrastructure and a culture of the laboratory that allow for the requisite apprenticeship and engagement. This infrastructure is disappearing as more and more resources go to the now-ascendant focus on big data. Data-driven biology is not just a conglomeration of new tools. It is an entirely different paradigm for what counts as scientific knowledge and how research is done. Policymakers have difficulty supporting two paradigms of scientific discovery at the same time. Choices must be made over which laboratories and projects should receive government grant funding. What we risk losing through the current embrace of data-driven biology is not merely time and resources. We risk losing the infrastructure necessary to support the older paradigm of research. As more investment and grant funding flows to the machine infrastructure, less is available to support the career trajectories of bench scientists. This trend is especially troubling because the development of scientific judgment depends on a stable tradition of apprenticeship in the laboratory. Once the institutional structures supporting this older form of embodied judgment are lost, this form of judgment will not be easy to regain.