Chromatin Evolving

Gregory A Babbitt. American Scientist. Volume 99, Issue 1. Jan/Feb 2011.

In 1897, the American cytologist Edmund Beecher Wilson of Columbia University published his great work, The Cell in Development and Inheritance. Wilson synthesized much of what had been observed and deduced about the hereditary role of the cell nucleus in the nearly 40 years prior to the rediscovery of Gregor Mendel’s work, which in turn would lay the foundations of modern genetics. After the independent rediscovery of Mendel’s laws of inheritance in 1900, Wilson’s colleague, Nettie Stevens, along with his students, Walter Sutton and Thomas Hunt Morgan, would go on to empirically confirm the role of the chromosome in both sex determination and the transmission of heritable information in living organisms. In later years, Morgan’s own students, Alfred Sturtevant and Hermann Müller, would become the first to map genetic mutations to chromosomes and to artificially induce mutations in the lab.

Wilson’s masterful book combines meticulous observations of the microscopic activity of cells with logical insights derived from simple manipulations of cell specimens. It thereby reflects a unique period when biology was in transition from a science based largely on descriptive observation to a science grounded in carefully planned experiments. One of the most remarkable insights that Wilson derived from the reports of early experimentation was the conclusion that the “idioplasm,” now called chromatin, is the physical seat of heredity. Chromatin is the complex of DNA and protein molecules that comprises chromosomes. Equally visionary was Wilson’s portrayal of chromatin as an active, dynamic substance in the cell nucleus. Cytologists of Wilson’s era observed clearly that chromatin moved about in the nucleus and that prior to cell division, chromatin changed radically, condensing from a diffuse mass into easily visualized compact threads. Wilson reproduced diagrams by the Italian cytologist Galeotti that documented further profound changes in chromatin in response to disease and environmental toxins.

The dynamism of chromatin is sometimes forgotten in light of textbook portrayals of the chromosome as a static library in which the all-important genes are fastidiously shelved. In fact, acknowledging the dynamic nature of chromatin seems, in all eras, strangely at odds with chromatin’s potential role in heredity. How can something so apparently mutable on a cellular timescale also exist relatively unchanged over vast tracts of evolutionary time?

Interpretation of the role of chromosomal dynamics in routine cellular functioning was largely set aside by biologists in the mid-20th century as they fixed their attention on the molecular basis of heredity, the new field laid open by the discovery of the chemical structure of DNA in 1953. A year earlier, Alfred Hershey and Martha Chase had demonstrated that nuclear protein was not the hereditary material and that DNA was the carrier of heritable information. Many of the experiments that elucidated the role of DNA were performed on bacteriophage, or phage – viruses that affect bacteria. The DNA of phage is not associated with proteins that package and control access to the DNA, and this contributed to the tendency to overlook chromatin structure as an integral element of genetic control. There was continuing interest in discovering the higher orders of structure of eukaryotic nuclear DNA – how was it wrapped, or stacked, or coiled at the highest levels – but for many years there were no answers, because the structural motifs of chromatin are far too small to image in detail by microscopy, yet they are not amenable to visualization by techniques such as x-ray crystallography, the workhorse in the era of visualizing macromolecules.

Even the definitive work of François Jacob and Jacques Monod on the regulation of genes in bacteria helped propagate a textbook view of gene regulation in which chromatin structure made no contribution. According to the model of Jacob and Monod, genes are turned on and off by protein factors that bind directly to DNA sequences, thereby controlling adjacent genes. The picture seemed simple and complete without searching out a role for chromatin structure. Attention moved instead to the search for gene-controlling protein factors.

Finally, the deceptively simple structure of the DNA molecule – a regular double helix, with all of its variability apparently confined to the sequence of its nitrogenous bases – seemed to imply, as Francis Crick noted in a 1953 letter to his son Michael, that the principal role of the cell nucleus was to store a large amount of digital information in the form of a four-letter cipher, and that it was the coding feature of the genetic material and not higher levels of molecular structure that determined how “life comes from life.”

A basic problem with this view, especially regarding cells that package their DNA into chromatin, is that the structural and biophysical features of DNA and DNA-protein complexes could hardly fail to have some effect on the expression of the information in DNA. The question is how much. Finding the answer requires that we look not just at the gross levels of chromosomal superstructure, but at the finest degrees of chemical and spatial distinction conferred, for example, by bends and twists in DNA, which pinch and widen the spiral grooves on the surface of the molecule and modulate the strengths of interacting charges along the molecule. Both effects change the binding properties of DNA.

Modern biologists are particularly intrigued by the dynamic molecular functioning of the nucleosome, the basic building block of chromatin. Nucleosomes are formed when DNA wraps around a core particle of histone proteins. Most DNA in the cell is packaged in nucleosomes. Yet we still do not fully understand some of the most fundamental properties of the nucleosome, including the way its position on DNA is determined, its roles in gene regulation and its potential impact on the evolution of regulatory systems in organisms. What is becoming clear is that, as first surmised over 100 years ago, a more complete picture of chromatin function at the largest and smallest scales will be critical in advancing our understanding of current questions in biology.

Looking Closer

Appreciation of the roles of DNA shape, structure and binding properties in the control of gene activity is growing. Recent work by Remo Rohs and his colleagues in 2009, focusing in acute detail on the biophysics of DNA-protein interactions, demonstrated that DNA sequences containing short repeated runs of adenine (A) or its complementary partner thymine (T) actually have a slightly narrower and therefore stiffer helical structure. When DNA containing these short A tracts is deformed by bending, as required by most DNA-protein interactions, including those that occur in the formation of nucleosomes, the narrowing of the DNA’s minor groove focuses electrostatic potential in a way that attracts the positively charged amino acid arginine. This attraction seems to explain how many transcription factors, which are often arginine-rich, can find their proper binding sites without reading base by base through the DNA sequence. Rohs and his colleagues also show that the same mechanism is active in the positioning of nucleosomes on DNA sequences. It is important that biophysical investigations like these on the fine structure of DNA receive attention alongside the recently ascendant bioinformatics revolution that stresses statistical analysis of DNA sequences.

Until very recently the nucleosome itself, formed by the tight winding of DNA around the highly evolutionarily conserved histone proteins of the nucleosome core, was regarded as simply passive “packaging” in which the information encoded in DNA could be comfortably warehoused until needed. Transcription factors, it was assumed, simply displaced the nucleosomes when it became time to access the critical regulatory sites on DNA. The study of transcription-factor binding has itself, until recently, been largely limited to this perspective, focused primarily on identifying base sequences that define particular binding sites instead of reflecting a more complete biophysical perspective that examines the propensity of a protein to interact with particular DNA sequences in terms of the molecular dynamics and binding energetics of both binding partners. In the ever-expanding ocean of DNA sequence data that has come to characterize our post-genomic era, we are relearning that DNA is a biophysical structure, with important implications for the nanometer-scale interactions between DNA and protein that underlie the regulatory networks of the genome.

Nucleosome Positioning

DNA is among the stiffest of biological molecules. The spiral winding of the double helix confers some rigidity, and the like-charged groups arrayed along the spine of DNA repel each other, contributing to the tendency of the DNA molecule to straighten. Biophysical calculations indicate that under physiological conditions of pH and salt concentration, a genome-sized length of unspooled DNA would not collapse into a minimal heap like thin sewing thread dropped on a tabletop; it would instead form a diffuse volume, more like a bushy mass of nylon fishing line. The volume of the mass would be many thousands of times the volume of the cell in which it must reside. The biological solution to this problem in most larger genomes is the spooling of DNA in snugly wound nucleosomes.

The nucleosome consists of two copies of each of four types of histone protein (H2A, H2B, H3 and H4) forming an octamer, or eight-unit histone core, that is wrapped almost twice by a 147-base length of DNA. The molecular structure of the nucleosome was determined in 1997 by Karolin Luger at the Institute for Molecular Biology and Biophysics, ETH Zurich. Since then, slightly higher-order structures of chromatin consisting of multiple histones have been determined by others in the field, although we still cannot confidently describe the precise geometry of the cables of DNA known to exist in the cell, formed by millions of histones in a tantalizingly as-yet-unseen array.

Whereas primary chromatin structure is defined by the sequence-dependent positioning of nucleosomes on DNA, the next level of chromatin structure is likely determined by the interactions of adjacent nucleosomes – interactions that are controlled through chemical modification of the histone molecule itself. In particular, the H3 and H4 histones have long tails that interact with the DNA sequences on the outside of adjacent nucleosomes. The acetylation of specific amino acids on the H3 and H4 histone tails tends to promote dissociation of tight nucleosome assemblies, resulting in open chromatin that is conducive to gene activation (by allowing transcription factors access to the DNA). Methylation of other sites is associated with closed chromatin structure and repressed gene activity. Histone regions within the nucleosome core are also subject to specific modifications, which produce histone variants known to influence gene regulation.

It has been recognized for several decades that the interaction of DNA with the nucleosome core seems to favor periodic sequence patterns of 10 to 11 bases; these patterns seem to facilitate the pronounced bending of DNA around the rim of the nucleosome. A series of elegant analyses on experimentally induced nucleosome-bound DNA fragments from the laboratories of Jonathan Widom of Northwestern University and Eran Segal of the Weizmann Institute in 2006 helped to experimentally confirm and further define patterns of nucleosome positioning on DNA. In this work, the main periodic pattern appeared to consist of certain adjacent base pairings of dinucleotides (AA, TT and TA) in the DNA sequence that were favored where DNA sequences contact the surface of the histone core.

It is now clear that the natural 10.4-base period in the helical structure of DNA helps impart these 10-11 base pair periodicities, which somehow favor the deformation of DNA to the histone surface. Biophysicists are currently exploring the molecular forces in action during nucleosome formation and have even developed models that predict with considerable precision the energies required for any given sequence to deform to the histone core. What is truly remarkable in all this work is that nucleosome formation and positioning appear to be highly dependent on these biophysically favored patterns in DNA sequence, suggesting that the sequence of DNA actually encodes its own packaging into chromatin.
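To make the idea of a sequence periodicity concrete, here is a minimal sketch in Python of one way such a signal could be looked for: tally how often two occurrences of the favored dinucleotides (AA, TT or TA) sit a given number of bases apart. The function names and the toy sequence are illustrative assumptions, and this is not the method used in the studies described above, which worked from large libraries of nucleosome-bound fragments.

```python
from collections import Counter


def dinucleotide_positions(seq, motifs=("AA", "TT", "TA")):
    """Return the start index of every occurrence of the given dinucleotides."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in motifs]


def spacing_histogram(seq, max_lag=30, motifs=("AA", "TT", "TA")):
    """Count pairs of motif occurrences separated by each distance up to max_lag."""
    positions = dinucleotide_positions(seq, motifs)
    occupied = set(positions)
    counts = Counter()
    for p in positions:
        for lag in range(1, max_lag + 1):
            if p + lag in occupied:
                counts[lag] += 1
    return counts


# Toy sequence (hypothetical): one AA placed every 10 bases.
toy = "AACGCGCGCG" * 14
hist = spacing_histogram(toy)
print(hist[10], hist[5])  # prints: 13 0
```

In real nucleosome-bound DNA the enrichment at spacings near 10 bases would be a modest statistical excess over background rather than the all-or-nothing pattern of this toy sequence.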

Inferring the Evolution of Chromatin

As molecular evolutionists, my postdoctoral advisor, Yuseob Kim, and I, working at the Biodesign Institute of Arizona State University in collaboration with biophysicist Michael Tolstorukov at Harvard Medical School, were particularly excited about this finding. It suggested that we might be able to pinpoint natural selection acting directly on DNA sequences related to chromatin organization in the genome. The fact that the biophysical aspects of chromatin structure appeared to be sequence dependent strongly suggested that chromatin structure itself is directly subject to the forces of evolution. We felt that this represented a new calling for those of us working in the field of molecular evolution – we would devise novel statistical methods for inferring the action of natural selection on DNA sequence patterns in terms of how these patterns affected chromatin structure. In 2008, we published our first analysis of the molecular evolution of chromatin organization in a simple unicellular eukaryote, the yeast Saccharomyces.

Evolutionary inferences about chromatin should generate predictions about the formation of nucleosomes on given DNA sequences. Our original methods relied on correlating given sequences with patterns of nucleosome formation as defined by a large library of nucleosome-bound fragments of DNA. In our more recent work, we use computational modeling of the deformation energies required for specific sequences to conform to the molecular structure of the nucleosome core. We have demonstrated that natural selection appears to have acted at the molecular level to conserve sequences that affect nucleosome positioning in the regulatory regions of genes. Subsequent work has demonstrated that the binding sites of many transcription factors in yeast appear to have evolutionarily conserved chromatin contexts that may play significant roles in the subsequent spatial and temporal dynamics of gene activity.

Higher Levels of Structure

The difference in size between the fiber formed of bunched nucleosomes and the giant condensed chromosome that becomes visible before cell division is huge – three to four orders of magnitude. Yet we know very little about the tiers of structure that are surely maintained within the condensed chromosome or in the uncondensed chromatin. Are the fibers arranged in loops, or coiled like rope? We simply don’t have a clear picture. However, we have recently learned a great deal through a series of linked experiments in several labs. We don’t yet have a comprehensive three-dimensional structure of every coil of the genome, but in the last year we have acquired a compelling overall picture of the rules the genome follows as it folds.

The original technique of chromosome conformation capture – “3C” – was devised in 2002 in the lab of Job Dekker at the University of Massachusetts. Addition of formaldehyde forms crosslinks between regions of genomic DNA that are close to each other in space, even if they come from remote parts of the same chromosome or from different chromosomes. After digestion of the DNA into fragments, the linked pieces are recovered and analyzed, which generates information about how local regions of the genome were folded.

The next innovation was to circularize and add primers to the recovered fragments of DNA, greatly facilitating amplification and subsequent analysis. Circularized 3C, called 4C, led the way to 5C from the Dekker lab, or carbon-copy 3C, which added massive parallel analysis to amplification. These computationally enhanced methods can identify all regions of the genome that are close to each other in space, providing a pointillist snapshot of global genome folding.

To incorporate next-generation DNA sequencing, the Dekker lab in partnership with Eric Lander of the Broad Institute of Harvard and MIT devised the Hi-C technique, in which markers added before purification of the crosslinked DNA permit the creation of a contact map showing which sequences are near other sequences over the entire genome. Analysis of the whole-genome map of interactions, using polymer theory and computer simulations, reveals much about how the 2 meters of DNA in every human cell is folded to fit within a 5-micron nucleus. Probability analysis revealed that gene-rich regions on different chromosomes interact preferentially. Hi-C data also suggested two distinct “compartments,” one more compact, the other more open, with the open compartment likely corresponding to actively transcribed chromatin.

Further analysis, performed in collaboration with the biophysics lab of Leonid Mirny at the Harvard-MIT Division of Health Sciences and Technology, assessed the probabilities of contacts between pairs of loci separated by varying genomic distances. From the emerging statistical model, a structural model developed by computer simulations in the Mirny lab took on detail. This was capped by a cunning deduction about the overall state of packing. Different predictions are expected about the content of the contact map if the packing of chromatin is as random as spaghetti – referred to as an equilibrium globule – or if it is packaged in a series of tight space-filling curves, with small stretches forming globules that abut other globules to form globules of globules, and so on at higher levels of organization, a polymeric state called a fractal globule. Such a structure was first proposed about 20 years ago by Alexander Grosberg and colleagues. Computer simulations based on Hi-C data indicate the presence of fractal globule organization in chromatin. Simulations further indicate that the lack of tangling and the hierarchy of globules would facilitate a more dynamic chromatin, easily unfolding to permit access to interior regions.
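One way to see how a contact map can discriminate between the two packing states is through the rate at which contact probability falls off with genomic distance. In the polymer-physics literature on Hi-C (the exponents are not given in the text above and are brought in here as an assumption), an idealized equilibrium globule is expected to show contact probability declining roughly as distance to the power -3/2, and an idealized fractal globule roughly as distance to the power -1. The Python sketch below fits that exponent to a set of distances and contact probabilities and reports which idealized model is nearer. The function names and the synthetic data are hypothetical; real analyses involve binning, normalization and restricted distance ranges that are omitted here.

```python
import math


def fit_power_law_exponent(distances, contact_probs):
    """Least-squares slope of log(contact probability) against log(distance)."""
    xs = [math.log(s) for s in distances]
    ys = [math.log(p) for p in contact_probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def closer_model(exponent):
    """Name the idealized model whose assumed exponent is nearer the fitted one."""
    # Assumed idealized exponents, taken from the polymer-physics literature.
    models = {"fractal globule": -1.0, "equilibrium globule": -1.5}
    return min(models, key=lambda name: abs(models[name] - exponent))


# Synthetic data (hypothetical): probabilities falling exactly as 1 / distance.
distances = [1e5 * 2 ** k for k in range(6)]
probs = [1.0 / s for s in distances]
exponent = fit_power_law_exponent(distances, probs)
print(round(exponent, 3), closer_model(exponent))
```

A fitted exponent near -1 would be consistent with the fractal globule picture; it would not by itself prove that organization, which is why the simulations described above matter.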

The Evolution of Gene Regulation

Evolution on the molecular or sequence level can be classified as either structural or regulatory in its effects. Structural evolution refers to DNA mutations that ultimately affect protein structure. By definition these changes are restricted to the coding regions of DNA. Mutations on the third base positions of codons (the three-base words that code for specific amino acids) generally do not alter the amino acid that is placed in the protein sequence – these mutations are generally silent; they have no effect on protein structure. Mutations at other positions in the codon are generally not silent. This consistent feature of the genetic code has allowed molecular evolutionists to infer that natural selection has affected protein structural evolution simply by examining the ratio of silent to non-silent mutations and then comparing this ratio to the result we would expect if evolution were neutral, that is, in the absence of selection. However, inferring the action of selection on noncoding regulatory regions of DNA represents a much more challenging task, as noncoding DNA lacks a convenient marker like the promiscuous third base position of codons.
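The counting behind that ratio can be sketched in a few lines of Python. The example below builds the standard genetic code, walks through two aligned coding sequences codon by codon, and classifies each single-base difference as silent or non-silent. The function name and the toy sequences are hypothetical, and this is only a raw tally under simplifying assumptions: the sequences are taken to be ungapped, in frame and composed of A, C, G and T, and codons differing at more than one base are skipped. Published estimators also correct for the number of silent and non-silent sites available, which this sketch does not attempt.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_silent_and_nonsilent(seq_a, seq_b):
    """Tally single-base codon differences between two aligned coding sequences.

    Assumes ungapped, in-frame sequences containing only A, C, G and T.
    Returns a (silent, nonsilent) pair of counts.
    """
    silent = nonsilent = 0
    usable = min(len(seq_a), len(seq_b))
    for i in range(0, usable - 2, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        differing = sum(x != y for x, y in zip(codon_a, codon_b))
        if differing != 1:
            continue  # identical codons, or multiple changes that are ambiguous
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1
        else:
            nonsilent += 1
    return silent, nonsilent


# Toy example (hypothetical): GCT -> GCC keeps alanine; AAA -> AGA swaps lysine for arginine.
print(count_silent_and_nonsilent("ATGGCTAAA", "ATGGCCAGA"))  # prints: (1, 1)
```

No comparable tally exists for noncoding DNA, which is exactly the difficulty the paragraph above describes.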

The functional suitability of regulatory DNA sequences is ultimately determined by their ability to physically conform to the binding sites of transcription factors that modulate gene expression. The biophysics of these interactions must compete with biophysical forces driving nucleosome formation – transcription factors compete with nucleosomes for access to these regions of DNA. While this implies a generally repressive role for chromatin, some strongly positioned nucleosomes have been shown to configure binding sites in ways that make them more recognizable to transcription factors. Recent studies have also shown that all genes are flanked with nucleosome-free regions of DNA that are stiffer than average and not as easily wound around nucleosome cores. Located just up- and downstream of coding regions, these open segments of DNA allow easier access to transcription factors. In fact, according to a recent model that has received much attention, they may contribute to long-lived assemblies called transcription factories. These either develop on activated genes or activated genes migrate to them to be transcribed.

Future methods for inferring the molecular evolution of regulatory sites in the genome will eventually need to address all of the complex molecular interactions involved. This can be done by utilizing statistical and biophysical modeling. We will then be better able to explore the vast areas of noncoding regulatory DNA, dubbed the “dark matter” of the human genome, using the same currency that quantifies its basic functional organization – the biophysical properties that ultimately determine whether the DNA polymer can deform to the structure of the nucleosome core, thereby controlling access to any information the DNA might contain.

Where Gene Meets Environment

In most multicellular organisms, the process of development is utterly dependent on the precise timing and control of gene expression. Many key developmental genes are only activated in the growing embryo at temporally and spatially precise moments. Until now, most of the research on the evolution of gene regulation has focused on the molecular control of basic patterns in the development of overall body plans, the field known as evo-devo. Once a developmental stage is complete, the most important function of gene regulation is to enable the induction of particular classes of genes in response to changes in environmental conditions.

Much of early animal development occurs in a buffered, protected environment such as a uterus or an egg. Later growth and development is typically much more sensitive to external conditions, such as the health and nutrition status of the mother for nursing infants or the various stressors of the environment. It is a common misconception that our phenotypes – our measurable morphological or physiological traits – are primarily the products of either our genes or our environment, the so-called nature-versus-nurture dichotomy. Like so many dichotomies, this one is false. Complex phenotypes are always the developmental product of the many inducible genes acting during development in a given environment, a gene-environment interaction that is combined with a sometimes surprising level of random biomolecular noise.

Organismal biologists mapping genotype to phenotype must remain aware that nature, nurture and chance are never completely separable in the development of a given phenotype. If they were to ask where these interactions take place in the cell, the answer would be in chromatin. At its most primary level, nucleosome formation is sequence dependent and so, by its very definition, should be heritable. In a study published in Science magazine in April 2010, Ryan McDaniell and colleagues reported the discovery of heritable individual and allele-specific chromatin signatures in humans. In the same journal the following month, Shahaf Peleg and coworkers reported the involvement of chromatin modification (altered histone acetylation) in age-related memory loss in mice.

Studies such as these have demonstrated that chromatin is both highly stable (thus heritable), and therefore relatively permanent on evolutionary timescales, and yet also highly dynamic, subject to environmentally induced change in the individual. Because chromatin structure is at once stable and dynamic, or “quasistable,” it serves as the primary interface for the meeting of gene and environment, and it is the most obvious place to look for evidence of the evolution of gene regulation.

But how exactly is this dual role of the chromatin achieved at a molecular scale? Because fine-scale chromatin organization, defined by nucleosome positioning on DNA, is so dependent on the composition and spatial patterns of nucleotide sequences, heritable and thereby evolvable chromatin signatures are probably also largely confined to the scale of direct physical interaction with DNA. However, because chromatin is also reversibly modifiable at higher levels of structural organization, such as through chemical modifications of the histone tails, the nonheritable and dynamic component of chromatin is probably a characteristic of its higher-level organization, defined at molecular scales more distant from direct biophysical influences of DNA sequences. Ultimately, it is this hierarchical, layered and quasistable quality of chromatin that allows for the complex duality of its behavior. Thus, while research in recent years has demonstrated that at the primary level of the nucleosome, DNA can actually heritably encode for its own packaging, there has also been an increasing understanding that the higher-order structure of chromatin is more plastic, subject to experiential and age-related remodeling during the course of a lifetime.

The simple fact that the incidence of most human diseases increases with age suggests that changes in chromatin-mediated gene regulation are likely to be a common component of disease. An initial step in investigating chromatin’s role in disease will be determining how the organization of chromatin actually affects gene regulation. This will require novel tests of molecular evolutionary inference, which can show us where chromatin function has been evolutionarily conserved between closely related species. Given the complexity of chromatin dynamics, such tests will probably need to be grounded in biophysical modeling of the molecular interactions that define the bending and binding of DNA to both transcription factors and histones, the two protein classes that govern gene regulation most directly. As more human genomes are sequenced, the disease associations of functionally important regions of evolutionarily conserved chromatin will be further investigated, not simply in terms of DNA sequence polymorphism, as in present-day genome-wide association studies, but ultimately in terms of the age-related chromatin modifications that accumulate during a lifetime. Hope lies in the fact that, unlike damage to one’s DNA, changes to one’s chromatin are often quite reversible, a property imparted by the very nature – stable yet dynamic – of chromatin’s complex structure.

Research in the area of chromatin biology has seen a massive upsurge in the last few years. The new transdisciplinary field of epigenetics, encompassing any heritable change in phenotype or gene expression that is not directly caused by a change in DNA sequence, is now a major funding directive at the National Institutes of Health. The physical mapping of nucleosome positions in the human epigenome is rapidly becoming a reality. But this modern task has its roots long ago, in a time when most biologists had neither rediscovered Mendel’s laws of inheritance nor established a vocabulary for our modern concept of the gene. Edmund Beecher Wilson wrote about the importance of chromatin function and evolution in defining the boundary of genotype and phenotype in organisms:

The idioplasm [chromatin] of every living species has been derived, as we must believe, by the modification of a preexisting idioplasm through variation, and the survival of the fittest. Whether these variations first arise in the idioplasm of the germ-cells, as Weismann maintains, or whether they may arise in the body-cells and then be reflected back upon the idioplasm, is a question on which, as far as I can see, the study of the cell has not thus far thrown a ray of light. Whatever position we take on this question, the same difficulty is encountered; namely the origin of that coordinated fitness, that power of active adjustment between internal and external relations, which, as so many eminent biological thinkers have insisted, overshadows every manifestation of life. The nature and origin of this power is the fundamental problem of biology.