Combing Chromosomes

John Herrick & Aaron Bensimon. American Scientist. Volume 89, Issue 3. May/Jun 2001.

Beneath the surface of a stream, the hair on a swimmer’s head floats freely, sometimes in turbulent tangles. But as the swimmer emerges above the surface, her hair is perfectly arranged against her head, each strand guided into place by the combined pull of the water’s surface and gravity.

It is hard to imagine applying the same principle in the quiet of the laboratory, but in fact a similar process, implemented on a much smaller scale, arranges chromosomes in straight lines on pieces of glass. Usually coiled and twisted, DNA can also be “combed,” giving scientists a clear view of important details usually hidden.

In the 1950s, scientists could observe DNA at the level of an organism’s entire genome, its complete complement of chromosomes. Stains create banding patterns on the chromosomes, which can then be identified by relative size and individual features. This technique, known as karyotyping, was capable of revealing some genomic abnormalities, especially those involving the number of chromosomes. For example, a person affected by Down syndrome has three copies of chromosome 21 rather than the usual two. Nevertheless, the size and banding of chromosomes cannot expose finer details, such as the precise location of a specific gene.

Greater resolution emerged with the development of a technique called fluorescence in situ hybridization. In effect, this technique “paints” an entire genome with DNA probes, which are sequences of nucleotides (the chemical units that make up DNA and the gene-transcription molecule RNA) that are complementary to a target. The probes can be detected and visualized with fluorescent antibodies. A recent approach called spectral karyotyping can coat a genome in as many as 24 different colors to simultaneously distinguish multiple regions of a genome. Finally, one can detect the amplification or deletion of a given sequence, as indicated by changes in fluorescent intensity, using still another approach called comparative genomic hybridization. Although powerful, these methods reveal only large anomalies, because they provide resolutions of just 1 million to 10 million nucleotide bases. Recent advances in fluorescence in situ hybridization, however, can increase the resolution to about 5,000 bases.

Each human cell includes about 30,000 to 35,000 genes, arranged on 46 chromosomes. All told, those chromosomes consist of about three billion nucleotides. Laid end to end, they would form a polymer roughly two meters long. That two meters’ worth of chromosomes, however, is packed inside a cell’s nucleus, which is only a micrometer or so across. Once a chromosome gets sufficiently wadded up to fit in such a small space, getting a view of most sections verges on the impossible. Finding genes is even more difficult, because they lie camouflaged among noncoding sequences that might make up as much as 97 percent of the DNA. Even if scientists have nearly “read” the base-by-base sequence of the entire human genome, it will take a great deal of work to distinguish functional elements and understand individual differences in the genome. For example, distinguishing the coding from noncoding DNA alone will be a formidable challenge. In the meantime, scientists construct detailed physical maps of specific subregions of the genome. These maps typically cover sequences of a few million bases, and they are usually devised to precisely locate a specific gene, usually one associated with a disease or other characteristic.

Accordingly, scientists continually seek a better view of DNA, one that provides a relatively wide perspective with a reasonable resolution. We shall describe an approach called molecular combing, which neatly arranges chromosomes in straight lines along a surface. Specific sequences of a stretched chromosome can then be marked with fluorescent labels to provide landmarks, so to speak.

This combination of techniques is already being applied to many problems. Beyond its applications to genomic studies and genetic diseases, it creates new experimental possibilities for cancer research. We have demonstrated that this is an attractive approach to the study of the phenomena underlying the mechanisms of carcinogenesis, which include microdeletions, inversions and amplifications of specific genetic loci. We intend to develop a more general procedure that may contribute to an understanding of the genetic reasons for tumor development and allow scientists to follow its evolution in time.

Foreseeable applications in other domains are equally promising. Indeed, as a tool, molecular combing is a versatile approach to a wide range of subjects and questions of fundamental interest. This is especially true for the multifaceted domain of DNA replication in eukaryotes, and higher eukaryotes (including human beings) in particular. In these cellular systems, DNA replication is a complex and highly ordered process involving the sequential duplication of alternating regions of the genome. How this process is organized and regulated is one of the outstanding questions in biology.

Stretching the Strands

Visualizing individual molecules of DNA for both genetic and physical studies involves two basic steps that often raise technical difficulties. First, an investigator must attach the molecule to a solid surface. Second, the molecule must be extended so that its characteristics can be studied. This turns out to be far from a simple task.

To attach a chromosome, early investigators started by modifying one end to make it “stick” to a solid surface. Some of the earliest efforts relied on a probe composed of an oligonucleotide, or chain of nucleotides, and a nonfluorescent label called digoxigenin. Digoxigenin is a steroid that comes from the blossoms and leaves of Digitalis purpurea and D. lanata, which gardeners might know better as foxglove. To capture the digoxigenin-labeled DNA, investigators treated a surface with anti-digoxigenin, which is an antibody that binds to digoxigenin. As a result, a digoxigenin-anti-digoxigenin interaction bound one end of the DNA to the surface. A magnetic bead attached to the other end of the DNA allowed it to be extended by applying a magnetic field. Investigators studied individual molecules with this method. To extend large numbers of attached molecules, some investigators used the force of a DNA solution flowing down a cover slip. These methods suffered from a crucial drawback: The DNA could not be extended in predictable ways because the force on it could not be controlled consistently. The resulting irregular stretching made labeling studies less effective, as less of the molecule was available for genetic analysis.

In 1994, one of the authors (Bensimon) and his colleagues found a new method that uniformly extends and aligns large numbers of molecules. The technique, molecular combing, relies on the action of a receding interface between air and water, or meniscus. This process operates like water straightening your hair as you raise your head above the surface while swimming. Molecular combing came from a search for a way to bind DNA alone to a chemically modified glass surface. Such a physico-chemical attachment would obviate the need to modify the DNA itself, for instance by attaching an oligonucleotide. Bensimon’s group found that the ability to bind DNA to a glass surface depends strongly on the pH of the DNA solution, with optimum binding at a pH of 5.5. Under that condition, one end of a molecule of DNA (and sometimes both ends) binds to a cover slip, and the rest of it remains randomly coiled. Lifting the cover slip vertically out of a DNA solution, however, “combs” the molecules as the meniscus passes them, leaving the DNA in linear form on the cover slip.

The mechanics of the stretching process remain somewhat mysterious. We do know that the force of surface tension at the meniscus is two orders of magnitude stronger than the forces of entropy that act to keep DNA in a random coil in solution. So the passing meniscus converts the random coil to a linear form. At the same time, the forces that anchor the DNA to the surface exceed the force acting at the meniscus, so the DNA remains attached. In fact, the attaching bond is strong enough to resist the force that develops when the DNA is stretched up to 1.5 times its natural length.

Altering the Alignment

As we soon found out, however, aligning lots of DNA all at once takes practice. We started by placing a small drop of DNA solution, only about 5 microliters, on a silanized-glass (glass coated with silane, a silicon compound) cover slip. Then, we placed an untreated cover slip on top to spread the drop to a thickness of about 20 micrometers. Looking at this solution with a microscope, we saw that some of the DNA bound to the treated cover slip. Then, we waited, simply letting the DNA solution evaporate. As the solution evaporated, a meniscus crept between the two cover slips, leaving fully stretched DNA in its wake. Although this technique revealed many secrets about stretching DNA, it didn’t stretch enough of it.

We did learn that the experimental conditions affect the force on DNA as the meniscus passes. This must be controlled carefully, because the application of too little force will fail to extend the DNA, and too much force will break it. If the force exceeds about 160 piconewtons, or 160 trillionths of a newton, it breaks a molecule, but that is more force than is necessary to extend a molecule to a significant fraction of its length. Although it’s hard to imagine just how small a force that is, here’s a comparison: You can hardly feel a housefly resting on your arm, but it produces a force of about one-hundredth newton, which is about 62.5 million times more force than what can be applied when combing DNA.

Many experiments revealed to us that the magnitude of the force created by the meniscus depends on the surface on which the DNA is being stretched. For example, with a digoxigenin-coated surface, the force exerted by the recession of the meniscus is about 54 piconewtons. In general, hydrophilic surfaces, including glass and polylysine, produce little stretching, less than 0.25 times the DNA’s normal length. Hydrophobic surfaces, on the other hand, create optimum stretching. Indeed, DNA can be combed on a variety of hydrophobic surfaces, including graphite, polystyrene, silanized glass, Teflon and several others. Different surface treatments produce considerable variation in the stretching constant. On hydrophobic polystyrene surfaces, for example, the meniscus stretches molecules up to twice their crystallographic length.

With practice, we learned to stretch DNA just right and in large amounts. Bensimon and his colleagues constructed an apparatus that mechanically extracts a cover slip from a well that contains a DNA solution. A well contains from 0.5 to 20 milliliters of solution and up to 10 micrograms of DNA. We incubate a cover slip in the solution for about 5 minutes to allow spontaneous binding between the molecules and the surface. Then, an elevator extracts the cover slip from the solution at a constant rate of 100 to 200 millimeters per second, and the bound DNA is reeled out over the surface. Typically, this approach combs several hundred haploid genomes on a single 18-by-18-millimeter cover slip.

Adding Markers

Sometimes new technologies can breathe new life into old techniques. We hoped that combing might allow scientists to get much better information from relatively old standby techniques such as fluorescence in situ hybridization. Recall that probes of complementary DNA can be attached to a region of interest in an immobilized DNA molecule. To make this process work, the immobilized DNA must be denatured, which unravels some of its double helix into single strands. Then, in a process called hybridization, the nucleotides of the probe bind to their complementary bases (adenine binds to thymine, and cytosine binds to guanine) in an exposed single strand of the immobilized DNA. These probes include a tag, such as digoxigenin or biotin, that can be detected subsequently with specific antibodies that include a fluorescing compound. Then the region of interest can be visualized directly under a fluorescent microscope.

In the combing process, DNA fluctuates freely in solution on one side of the interface and is stretched and bound irreversibly to a dry surface on the other side. At first, the irreversible bond between DNA and a surface raised questions about its utility for enzymatic and hybridization studies. Now it appears that DNA does not attach along the entire length of the double helix, but rather at sporadic sites where the helix is partially open because of denaturing.

Bensimon and his colleagues first applied fluorescence in situ hybridization to combed DNA from the yeast artificial chromosome, which is an engineered version of portions of the yeast genome. The labels provide a measurement of the physical distances between the hybridization signals to a precision of a few thousand bases. The accuracy of any map of a chromosome depends on the precision of measuring distances between landmarks. For example, if you need to find a specific location in New York City, a fine-detail map that shows streets will serve you better than, say, a low-resolution one that simply marks the city’s five boroughs. Likewise, higher-resolution measurements from chromosomal probes will make it easier for investigators to locate specific genes.

The yeast experiments showed that combing can reveal the genetic structure of large regions of the genome. Moreover, the identical stretching of all of the DNA on a single surface allowed for accurate and reproducible measurements of distances between genetic loci. Indeed, this property makes high-resolution physical mapping on combed DNA possible, because it provides an exact correlation between the length of a combed molecule and its natural size. We showed that the measured length of a combed molecule correlates with its size in kilobases in experiments on DNA from the lambda bacterial virus, which was one of the first genomes to be completely sequenced. The length of DNA from lambda bacterial virus after combing was 25 micrometers, which converts to approximately 2,000 bases per micrometer. Moreover, the crystallographic length of this DNA is about 16 micrometers, which indicates that combing stretched this DNA by a factor of 1.56 times its normal length.

In principle, a microscope’s optical power limits the precision of the measurements made on hybridization signals from combed DNA. For example, if the theoretical resolution of a microscope is 0.25 micrometer, then the precision of a measurement made on combed DNA is about twice that, or 0.50 micrometer. Nevertheless, actual experiments fall short of that resolution. To assess the actual resolution of fluorescence in situ hybridization on combed DNA, we used probes of known length that hybridized to adjacent sections of combed DNA and then measured the distance between hybridization signals. The results indicated that variability in hybridization efficiency reduced the overall resolution to about 2 micrometers, or roughly 4,000 bases.
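The conversion between measured length and sequence length can be checked with a few lines of arithmetic. The sketch below is only illustrative: the 25- and 16-micrometer figures come from the lambda measurements described above, while the genome size of about 48,500 bases is an assumption brought in from outside this article, and the function names are hypothetical.

```python
# Figures quoted in the text for combed lambda DNA.
COMBED_LENGTH_UM = 25.0            # length after combing, in micrometers
CRYSTALLOGRAPHIC_LENGTH_UM = 16.0  # unstretched length, in micrometers

# Assumption (not stated in the text): the lambda genome is about 48,500 bases.
LAMBDA_GENOME_BASES = 48_500


def stretch_factor(combed_um, natural_um):
    """Ratio of the combed length to the unstretched length."""
    return combed_um / natural_um


def bases_per_micrometer(genome_bases, combed_um):
    """Calibration constant: how many bases one micrometer of combed DNA spans."""
    return genome_bases / combed_um


def micrometers_to_bases(distance_um, calibration):
    """Convert a distance measured on the cover slip into bases."""
    return distance_um * calibration


stretch = stretch_factor(COMBED_LENGTH_UM, CRYSTALLOGRAPHIC_LENGTH_UM)
calibration = bases_per_micrometer(LAMBDA_GENOME_BASES, COMBED_LENGTH_UM)
resolution_bases = micrometers_to_bases(2.0, calibration)

print(f"stretch factor: {stretch:.2f}")
print(f"calibration: {calibration:.0f} bases per micrometer")
print(f"a 2-micrometer resolution spans about {resolution_bases:.0f} bases")
```

Under these assumptions a micrometer of combed DNA spans a little under 2,000 bases, so an effective resolution of 2 micrometers corresponds to roughly 4,000 bases.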

Moving from small to large, Bensimon and his colleagues used DNA from the bacterium Escherichia coli to show that molecules of one million bases or more can be combed easily, making high-resolution optical mapping over long distances possible. In fact, distances from 4,000 to 200,000 bases can be measured between hybridization signals on combed DNA without any counterstaining. The effective range extends beyond this higher value and is limited only by the size of the field of view and the number of intact hybridized molecules that can be detected.

Combing for Clinical Anomalies

In many cases, subtle genetic alterations set apart genetic diseases. These anomalies often involve amplifications or deletions as small as a few thousand bases and frequently up to a few hundred thousand bases, none of which can be revealed by conventional cytogenetic techniques. Nevertheless, applying fluorescence in situ hybridization to combed DNA exposes alterations on this scale. In work done in collaboration with Sue Povey’s laboratory at the Medical Research Council/University College London, we first showed this in a genetic disorder called tuberous sclerosis, which triggers benign tumors throughout various organs, including the brain, eye and heart. Estimates suggest that it strikes about 1 in 6,000 people, causing symptoms ranging from skin problems to mental retardation. This disease arises from microdeletions in DNA.

We studied three patients with known deletions in the so-called TSC2 gene, which is located on chromosome 16. We combed the DNA and attached two probes that flanked the deleted region. The distance between the probes was 147,000 ± 4,400 bases in people who are not affected by tuberous sclerosis. In unaffected individuals, one expects to observe a homogeneous population of signals; in individuals with tuberous sclerosis, one expects to find two populations of signals, one corresponding to the normal allele, or variable form of a gene, and the other corresponding to the allele that contains the deletion. The deletions in the DNA from these three patients were 69,400 ± 4,600, 38,400 ± 10,700 and 135,800 ± 5,400 bases long. Those deletions are too small to be detected using conventional cytogenetic techniques. This result established the fundamental and clinical usefulness of our technique to detect and measure microdeletions in well-characterized genes. The quantification proves especially valuable, because the size of the deletion often correlates with the severity of the disease.
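The measurement logic can be sketched in a few lines: the size of a deletion is the difference between the inter-probe distance on the normal allele and the shorter distance on the deleted allele. The sketch below is hypothetical in its details. It assumes the two distances carry independent uncertainties that combine in quadrature, which the text does not state, and the shortened distance of 80,000 bases with an uncertainty of 3,000 is an invented illustration, not a value from the study.

```python
import math


def deletion_size(normal_bases, normal_err, shortened_bases, shortened_err):
    """Estimate a deletion from two inter-probe distances.

    Returns (size, uncertainty) in bases. The uncertainty assumes the two
    measurements are independent, so their errors add in quadrature.
    """
    size = normal_bases - shortened_bases
    uncertainty = math.hypot(normal_err, shortened_err)
    return size, uncertainty


# Normal-allele distance from the text: 147,000 bases, uncertainty 4,400.
# Shortened-allele distance: illustrative numbers only, not from the study.
size, uncertainty = deletion_size(147_000, 4_400, 80_000, 3_000)
print(f"estimated deletion: {size} bases, uncertainty about {uncertainty:.0f}")
```

In practice the two populations of signals would first have to be separated, for example by inspecting a histogram of the measured distances; that step is left out here.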

Other studies bring to light the broad potential of molecular combing in clinical applications. For instance, this technique exposed deletions in the dystrophin gene, which participates in the development of muscular dystrophy. It also helped investigators map breakpoints and rearrangements on chromosome 5 in a region implicated in a variety of malignant myeloid diseases. Deletions as small as 3,000 bases were detected in the breast cancer genes BRCA1 and BRCA2 in research done in collaboration with Dominique Stoppa-Lyonnet’s laboratory at the Institut Curie in Paris. Such discoveries push the current limits of applying fluorescence in situ hybridization to combed DNA, because the resolution cannot reveal deletions smaller than 3,000 bases. On the other hand, more indirect approaches indicate that sequences as small as 500 bases or less can be detected and mapped using molecular combing. Such high resolution suggests that exons, the regions of DNA that code for proteins, can be mapped in a region containing a gene, and hence the structure of the gene itself can be elucidated.

Fluorescence in situ hybridization on combed DNA also detects and quantifies amplified sequences ranging in size from 50,000 to more than 50,000,000 bases. In such cases, you simply measure the total length of a linear fluorescent signal combed on a surface, and compare normal and amplified samples of DNA. Although that sounds straightforward, complications arise because DNA often breaks during its preparation. Consequently, most of the signals will not be full length, and investigators must measure each individual signal and sum their lengths to ascertain the total length combed on a surface. This approach can reveal a variety of anomalies, including amplifications involving known oncogenes, mutated genes that promote a cell’s transformation to a malignant state.
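The summing procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' protocol. In particular, dividing by the total signal of a control probe, to correct for differing amounts of DNA on each cover slip, is an assumption added here, and all function names and fragment lengths are hypothetical.

```python
def total_signal_length(fragment_lengths_um):
    """Sum the lengths of all (possibly broken) fluorescent signals."""
    return sum(fragment_lengths_um)


def relative_copy_number(target_test, control_test, target_ref, control_ref):
    """Compare a test sample with a normal reference sample.

    Each argument is a list of signal lengths in micrometers. The control
    probe is an assumed normalization for the amount of DNA combed.
    """
    control_test_total = total_signal_length(control_test)
    control_ref_total = total_signal_length(control_ref)
    if control_test_total == 0 or control_ref_total == 0:
        raise ValueError("control probe produced no signal")
    test_ratio = total_signal_length(target_test) / control_test_total
    ref_ratio = total_signal_length(target_ref) / control_ref_total
    return test_ratio / ref_ratio


# Illustrative lengths only: the target probe gives four times more signal
# per unit of control signal in the test sample than in the reference.
fold = relative_copy_number(
    target_test=[12.0, 8.0, 20.0],
    control_test=[5.0, 5.0],
    target_ref=[6.0, 4.0],
    control_ref=[4.0, 6.0],
)
print(f"estimated amplification: {fold:.1f}-fold")
```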

Variation in Replication

In order to transmit genetic information from one generation of cells to the next, the cells’ DNA must be replicated. To start this process, DNA’s helix unwinds at specific sites known as origins of replication. Once initiated, DNA synthesis usually proceeds bidirectionally away from an origin. DNA polymerase, however, can synthesize DNA in only one direction along the double helix. Consequently, one replicated strand in the helix is synthesized continuously, and the other strand is synthesized in short fragments of about 1,000 bases. These fragments, called Okazaki fragments, are joined by DNA ligase to complete the replication process. After duplicating the haploid genome, the chromosomes are then distributed evenly between two newly forming daughter cells, giving each new cell a complete set of parental chromosomes.

If a single origin of replication duplicated the entire two meters of DNA in a human cell, it would take a cell almost five years to finish replicating its genome. In fact, the task takes only a matter of minutes in embryos and just hours in somatic cells. Clearly, multiple origins of replication duplicate a genome, and they must be spaced with respect to each other and activated for replication in a manner that ensures the complete and accurate duplication of a cell’s full complement of genetic material before it divides. To duplicate each chromosome once and only once before the cell exits the so-called S, or synthesis, phase and enters mitosis, a cell must establish the location and genomic organization of origins of replication. In human cells, thousands of origins of replication are required for cell division.
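The five-year figure can be reproduced with a back-of-the-envelope calculation. The sketch below takes the roughly three billion nucleotides quoted earlier and assumes two forks per origin moving at about 10 bases per second, the rate reported later in this article for frog embryos; applying that rate to human cells, and the eight-hour S phase used in the second calculation, are assumptions made only for illustration.

```python
import math

GENOME_BASES = 3_000_000_000   # "about three billion nucleotides" (from the text)
FORK_RATE = 10.0               # bases per second; assumed to apply to human cells
FORKS_PER_ORIGIN = 2           # bidirectional synthesis from each origin
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def replication_time_seconds(genome_bases, n_origins, fork_rate):
    """Time to copy the genome if all origins fire at once and are evenly spaced."""
    return genome_bases / (n_origins * FORKS_PER_ORIGIN * fork_rate)


def minimum_origins(genome_bases, duration_seconds, fork_rate):
    """Fewest evenly spaced, simultaneously fired origins that finish in time."""
    return math.ceil(genome_bases / (FORKS_PER_ORIGIN * fork_rate * duration_seconds))


single_origin_years = (
    replication_time_seconds(GENOME_BASES, 1, FORK_RATE) / SECONDS_PER_YEAR
)
# Assumption: an S phase of eight hours, chosen only as an example of "hours".
origins_for_8_hours = minimum_origins(GENOME_BASES, 8 * 3600, FORK_RATE)

print(f"one origin: about {single_origin_years:.2f} years")
print(f"origins needed for an 8-hour S phase: at least {origins_for_8_hours}")
```

Under these assumptions a single origin would need a little under five years, and finishing within hours requires origins numbering in the thousands, in line with the figures in the text. Because real origins are neither evenly spaced nor fired simultaneously, the second number is a lower bound.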

In early experiments, investigators searched for origins of replication in plasmids, small rings of DNA that exist independent of the chromosomes in bacteria. A cloned segment of DNA was considered an origin of replication if it triggered replication when inserted into plasmids. Using this approach, molecular geneticists identified a few dozen origins of replication in higher eukaryotes. Likewise, they identified about 10 percent of the estimated 400 or so origins of replication in yeast. Such studies revealed that origins of replication in yeast always include a specific sequence of 11 bases, but this sequence itself is not sufficient to trigger replication. In other words, replication requires this sequence and other things, too. Moreover, origins of replication in yeast vary in activity. In some cases, a given origin of replication is activated during each cell cycle, but other origins of replication might or might not be activated. No one knows what controls that selection.

What we do know is that replication of a genome develops at 100 to 400 discrete foci distributed throughout a cell nucleus. At the level of an individual chromosome, replication takes place in chromosomal bands of a couple of million bases, and some replicate early and others follow later. Within one such band, replication starts in broad initiation zones of 10,000 to 300,000 bases. These zones, known as replicons, consist of sequences of DNA that contain a single origin of replication. To better understand DNA replication, we must map multiple origins of replication and simultaneously analyze their activities on a chromosome- and genome-wide basis.

Organizing the Origins

In recent work on combed DNA that corresponds to the early embryo stage of development in the South African clawed frog, Xenopus laevis, we employed (in collaboration with Olivier Hyrien’s laboratory at the École Normale Supérieure in Paris) a technique for distinctly labeling early and later bands of replication. We incubated DNA in a solution that contained biotin-dUTP, biotin attached to a nucleotide called deoxyuridine triphosphate. The early replication bands incorporate this biotin-dUTP complex into newly synthesized DNA, and the locations of the modified nucleotides can be found with red fluorescing antibodies that attach to biotin. At later time points, we also added digoxigenin-dUTP, which can be labeled with green fluorescing antibodies. Consequently, later bands of replication get labeled red and green. The locations of the newly synthesized DNA correspond to origins of replication. As a result, this approach reveals the spatial and temporal pattern of DNA replication. Moreover, these experiments revealed a novel mechanism that regulates the replication program in this cell system.

In the 1970s, Alan Blumenthal, then at the University of California, San Francisco, and his colleagues applied electron microscopy and autoradiography to replicating structures in genomic DNA from Drosophila early embryos. Their findings were consistent with the observation that seemingly any sequence could serve as an origin of replication, which is also the case for embryos of Xenopus laevis. This sequence-independent activation of DNA replication suggests that origins of replication do not correspond to fixed genetic sites at this stage of development, but instead are apparently selected at random during the S phase. This creates a problem if origins of replication are fired synchronously during the first half of the S phase. A simple calculation shows that a random spatial distribution and simultaneous activation of replication origins would result in a significant fraction of the genome remaining unreplicated before entry into mitosis, because some replicons would be too large to replicate fully in the allotted time. In Drosophila embryos, origins of replication are activated synchronously, but they also appear to follow a regular distribution throughout the genome. That posed an apparent paradox: How do you get regularly spaced origins of replication when virtually any sequence can be one? It seems that loops in the chromosomal structure determine the location of origins of replication. In fact, the size of chromatin loops in embryos coincides with the average distance between replication origins.
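The “simple calculation” mentioned above can be illustrated with a small sketch, which is not necessarily the calculation the authors had in mind. It assumes that origins are scattered at random along the DNA (so the gaps between neighbors follow an exponential distribution), that all origins fire at the start of the S phase, and that each gap is closed by two converging forks. The fork rate of 10 bases per second and the spacing of 10,000 bases are in the range quoted elsewhere in this article; the 20-minute S phase is an assumed value. Under these assumptions the unreplicated fraction has the closed form exp(-2vT/d), which the simulation checks.

```python
import math
import random


def unreplicated_fraction_analytic(mean_spacing, fork_rate, duration):
    """Closed form for exponentially distributed gaps between origins."""
    max_gap = 2.0 * fork_rate * duration  # longest gap two forks can close
    return math.exp(-max_gap / mean_spacing)


def unreplicated_fraction_simulated(mean_spacing, fork_rate, duration,
                                    n_gaps=200_000, seed=0):
    """Monte Carlo estimate: draw random gaps and add up what is left over."""
    rng = random.Random(seed)
    max_gap = 2.0 * fork_rate * duration
    total = 0.0
    leftover = 0.0
    for _ in range(n_gaps):
        gap = rng.expovariate(1.0 / mean_spacing)
        total += gap
        leftover += max(0.0, gap - max_gap)
    return leftover / total


MEAN_SPACING = 10_000.0   # bases between origins (within the range in the text)
FORK_RATE = 10.0          # bases per second (value quoted in the text)
S_PHASE = 20 * 60.0       # seconds; an assumed embryonic S phase of 20 minutes

analytic = unreplicated_fraction_analytic(MEAN_SPACING, FORK_RATE, S_PHASE)
simulated = unreplicated_fraction_simulated(MEAN_SPACING, FORK_RATE, S_PHASE)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

With these illustrative numbers, several percent of the genome would remain uncopied, because a random scattering of origins always leaves some gaps that are too long to close in the time available.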

In contrast, origins of replication in Xenopus are continuously and, therefore, asynchronously activated during the entire period of DNA synthesis. In addition, origins of replication in this organism are activated at irregular intervals every 5,000 to 15,000 bases throughout the genome. Finally, we found that the number of origins per thousand bases increased significantly as DNA synthesis advanced. Accordingly, it is the increasing frequency of origin activation, rather than some regular feature of the chromatin, that guarantees the rapid and complete duplication of a Xenopus genome. These results demonstrated the utility of molecular combing for investigating the kinetics and spatio-temporal organization of DNA replication in a higher eukaryote on a genome-wide basis.

Crystals and Chromosomes

The authors and John Bechhoefer of Simon Fraser University have developed a general model of eukaryotic genome duplication. The model exploits an analogy between crystal growth and DNA replication in higher organisms. When you place a glass of water in a freezer, the liquid freezes by continuously and randomly forming small crystals throughout its volume. The crystals grow until the entire volume solidifies and turns to ice. The so-called Kolmogorov-Johnson-Mehl-Avrami model, developed in the 1930s, describes how a liquid freezes to form a solid.

In the Kolmogorov-Johnson-Mehl-Avrami model, crystal growth arises from three simultaneous processes: nucleation, or the formation of solid domains in the liquid; growth of the domains; and coalescence, which comes from neighboring domains impinging on one another. Each of these processes has a counterpart in eukaryotic DNA replication. The activation of an origin of replication is analogous to the nucleation of solid domains during the growth of a crystal. Symmetric bidirectional DNA synthesis initiated at an origin corresponds to growth of the solid domain. And replication stops when adjacent replicating regions of DNA merge, which is analogous to the coalescence of crystal domains. Although based on a qualitative analogy between two very different and unrelated processes, the Kolmogorov-Johnson-Mehl-Avrami model provides quantitative information about each of the processes underlying DNA replication. This allows for a comprehensive picture of the dynamics of genome duplication, which is essential for a complete understanding of the mechanisms that control the S phase of the cell cycle.
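To make the analogy concrete, the sketch below evaluates the textbook one-dimensional form of the Kolmogorov-Johnson-Mehl-Avrami result, in which the replicated fraction at time t is f(t) = 1 - exp(-2v ∫ I(τ)(t - τ) dτ), with the integral running from 0 to t. Here I(τ) is the rate of origin activation per base per second and v is the fork speed. Whether this matches the authors' exact formulation is an assumption, and the initiation rate used in the example is an invented value chosen only so that about half the genome is copied after ten minutes.

```python
import math


def replicated_fraction(t, initiation_rate, fork_rate, steps=1000):
    """Replicated fraction at time t for a time-dependent initiation rate.

    initiation_rate is a function of time returning activations per base
    per second; the integral is evaluated with the trapezoid rule.
    """
    if t <= 0:
        return 0.0
    dt = t / steps
    integral = 0.0
    for k in range(steps):
        a = k * dt
        b = a + dt
        f_a = initiation_rate(a) * (t - a)
        f_b = initiation_rate(b) * (t - b)
        integral += 0.5 * (f_a + f_b) * dt
    return 1.0 - math.exp(-2.0 * fork_rate * integral)


def replicated_fraction_constant(t, rate, fork_rate):
    """Closed form when the initiation rate does not change with time."""
    return 1.0 - math.exp(-rate * fork_rate * t * t)


FORK_RATE = 10.0     # bases per second
RATE = 2e-7          # activations per base per second; illustrative assumption

numeric = replicated_fraction(600.0, lambda tau: RATE, FORK_RATE)
closed_form = replicated_fraction_constant(600.0, RATE, FORK_RATE)
print(f"replicated after 10 minutes: {numeric:.3f} (closed form {closed_form:.3f})")
```

Passing a function of time rather than a constant lets the same routine explore initiation rates that rise as the S phase proceeds.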

Initially, we tested this model against combing data from the Xenopus experiments, and it produced accurate values for the average rate at which replication advances from the origins. The deduced value was 10 bases per second, in excellent agreement with previous values obtained by more-direct biochemical methods. This model also predicted approximately one origin every 5,000 bases in the embryonic genome, also in good agreement with previous estimates based on other methods. Most interestingly, this model also indicated that genome duplication includes two distinct kinetic regimes. During the first half of the S phase, origins are activated at a fairly steady rate. After about half of the genome is replicated, the frequency of activation increases dramatically. The modeling of the combing data, therefore, allows origin initiation rates to be determined with respect to both space and time, for the first time in a eukaryotic organism.

We are currently applying this model to combing data obtained from experiments on yeast and mouse embryonic stem cells, as well as the cancer cell line known as HeLa. Ultimately, the model will allow us to develop a comprehensive understanding of the dynamics of genome duplication in any cell system for which sufficient combing data are available.

More Combing to Come

The recently announced sequencing of most of the human genome marks the beginning of a new phase in genomic studies. This phase will focus on identifying and elucidating the functional elements that regulate the expression and duplication of a genome. This will require the application of a variety of technologies, including DNA arrays (currently employed in studying gene-expression profiles), molecular combing and established biochemical methods. The results will reveal many new characteristics of chromosomes and their function.

The ability of molecular combing to identify and analyze individual molecules of DNA allows unique genetic features of a region to be studied in detail. For instance, gene expression and origin activation might differ in a given region of a genome from generation to generation. These differences might affect the developmental program of an organism, and molecular combing is uniquely suited for revealing and studying such variations and their biological significance. Perhaps most important, we can deduce these aspects of genomes and gene structure because of the reliable quantitative information from combing.

Molecular combing also has a variety of clinical applications. As we have shown, it pinpoints disease-related genes with improved accuracy. In addition, an enhanced knowledge of replication will also help battle some diseases. For example, a comprehensive analysis of replication profiles under chosen genetic and physiological conditions will facilitate the identification of promising drugs for use in cancer therapy. Likewise, tumor-specific origins of replication in a variety of malignancies might also be identified and characterized. This in turn could facilitate the development of new compounds and methods that specifically knock out anomalous origins of replication or those factors that stimulate DNA replication and, hence, cellular proliferation. In the end, combing, a simple process from nature, could become a useful tool in many areas of research.