CNRS international magazine

Cover story 1/3

Genomics: still terra incognita

© P. Stroppa/CEA

A robot that dispenses droplets of cell culture takes samples of cells from a multi-well plate and deposits them, in 50-nanoliter drops, on a 4 cm² slide.

Genetics is perhaps the most rapidly advancing scientific field when it comes to discoveries. The molecule that carries and transmits genetic information, deoxyribonucleic acid (DNA), was first identified as such in 1944. Nine years later, in 1953, another historic discovery was made by Francis Crick and James Watson, when they found DNA to have a double helical structure. The announcement caused quite a stir: “This discovery, which is one of the most important of the twentieth century, revealed the mechanisms underlying the stability and variability of inherited traits, how they are transmitted, and to a certain extent, how they are genetically expressed. It revolutionized the life sciences as a whole, including the fields of health, medicine and agriculture,” explains Alain Bucheton, assistant director of the Human Genetics Institute in Montpellier.1

Other major discoveries soon followed in the 1960s and 1970s, including those of reverse transcriptase and restriction enzymes (see glossary). These discoveries led to the first generation of genetic vectors for cloning foreign genes and to DNA sequencing, two techniques that form the basis of modern genetic engineering. The advent of bioinformatics has since accelerated the pace of progress, resulting in an explosion of information. “The core of what we teach in genetics today was unknown 30 years ago. We have covered a lot of ground…” says Michel Morange, head of the Molecular Biology of Stress department at ENS in Paris.2

Among the many achievements of molecular biology, the Human Genome Project—the complete sequencing of the human genome (see glossary)—has been one of the most impressive. The titanic undertaking of deciphering letter by letter the 3 billion nucleotide bases of DNA bundled up inside the nucleus of every single cell “is by no means the end of the story, but rather the beginning,” claims Lawrence Aggerbeck, director of the Molecular Genetics Center in Gif-sur-Yvette.3 “For the first time, we are now able to study in their entirety the parts of 'the machine', the total set of components making up the motor, even if, as it often turns out, the nature of those parts remains enigmatic and their variants unknown,” adds Bernard Dujon from the Genome Structure and Dynamics department of the Institut Pasteur.4

Jean Weissenbach, director of the Genoscope (National Center for Sequencing), shares this enthusiasm, tempered by humility given the scale of the task.5 According to Weissenbach, completing the sequence of the human genome, as well as those of dozens of other model organisms, such as bacteria, yeast, the nematode C. elegans, drosophila, and mice, “marks an undeniable turning point in the history of biology. It brings to a close a period dominated by the identification of individual biological processes, and inaugurates a new era focused on understanding the biological organization of the cell as a whole and even of the entire organism.” It goes without saying that making the genome “speak” is a formidable task. Indeed, the sequencing of the human genome revealed a first big surprise: the number of genes coding for proteins was substantially smaller than expected. At the beginning of the 1990s, the number of genes in the human genome was estimated to be about 100,000. By the end of the decade, that number was revised downward to around 30,000 genes. It now stands at somewhere between 20,000 and 25,000 genes. In other words, the number of genes needed to make a human being is about the same as the number needed to make a mouse and only slightly higher than the number needed to make a fruit fly. However, that humbling observation need not trouble us, since genes are to the genetic program what words are to a text: in different combinations, they can give rise to both tabloids and literary masterpieces.

The reassuring view of DNA as simply a collection of genes working together to synthesize proteins is now being challenged. “DNA sequences which encode proteins represent only 1.2% of a cell's genetic material,” says Weissenbach. “What is the remaining 98% of the genome doing?” “In fact, the genome includes a string of pseudogenes, the genetic debris of inactivated genes that form a sort of junkyard,” jokes Dujon. “These genes, which once served a purpose and no longer do (except in certain rare instances), are rusting away where they sit without causing any harm…” They are what computer programmers would call “dead code.” “Most non-functional pseudogenes come from a copy of DNA derived from the messenger RNA (see glossary) of a functioning gene. The DNA copy is then inserted into the genome thanks to the enzyme reverse transcriptase, which is expressed from transposons (see glossary),” explains Bucheton. This category of DNA, ironically termed “junk DNA”, also includes transposable elements, which make up half of the genome. These enigmatic molecular machines behave like genetic squatters and exist simply to replicate themselves. And unlike more functional elements like genes, they jump at will from one place in the genome to another. In a sense, they can be seen as homeless genes. One example is the transposon L1, a DNA sequence that is repeated up to 500,000 times in the human genome. Almost all of its copies, which have accumulated over time and represent 20% of the genome, produce no proteins at all, just like pseudogenes. Only a tiny fraction (80 to 100) remains active. “Those are the ones that can cause genetic diseases,” says Bucheton. “But we would be wrong to characterize transposable elements only in terms of the harm they do. In fact, they are responsible for generating a large proportion of the genetic diversity in the genome, and are consequently a major driving force in evolution. They are also at the source of genetic innovation since they create new genes and new functions that are useful to the cell.”

The so-called “dark matter” of the genome also produces RNA that is not translated into proteins, but is of the highest interest to biologists: non-coding RNA. “Only now are we understanding the role some of these RNAs play in regulating gene expression, RNAs such as micro-RNA and interference RNA (RNAi),” says Weissenbach. “They may exist in all eukaryotes, since they have been found in plants, animals, and fungi. Nonetheless, there are a number of other types of RNA whose function, if they have one, is yet to be elucidated. On the other hand, they may have no function at all and may represent nothing more than the cell's background noise.” That amounts to another field of investigation whose extent and importance remain, as yet, unknown.
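
The proportions quoted above for coding sequences and for the transposon L1 can be turned into absolute numbers. The following is only a back-of-the-envelope sketch in Python, assuming the article's rounded figures (3 billion bases, 1.2% coding, 20% L1, up to 500,000 L1 copies); the variable names are illustrative and not taken from any genomics tool.

```python
# Rounded figures quoted in the article (assumptions, not measured data).
GENOME_BASES = 3_000_000_000   # nucleotide bases in the human genome
CODING_FRACTION = 0.012        # share of DNA that encodes proteins
L1_FRACTION = 0.20             # share of the genome made of L1 copies
L1_COPIES = 500_000            # upper bound on the number of L1 copies

# Absolute amount of protein-coding sequence, in bases.
coding_bases = round(GENOME_BASES * CODING_FRACTION)

# Absolute amount of L1-derived sequence, in bases.
l1_bases = round(GENOME_BASES * L1_FRACTION)

# Average length of one L1 copy if all 500,000 copies are counted.
mean_l1_copy_length = l1_bases / L1_COPIES

print(f"coding sequence: {coding_bases:,} bases")
print(f"L1 sequence:     {l1_bases:,} bases")
print(f"mean L1 copy:    {mean_l1_copy_length:,.0f} bases")
```

On these figures, protein-coding sequences add up to about 36 million bases, while L1 copies alone account for about 600 million. Because 500,000 is given as an upper bound on the number of copies, the average of roughly 1,200 bases per copy is a lower bound.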

We are a long way from understanding all the mysterious workings of DNA. As for those genes that do code for proteins, two excellent tools are available to clarify their roles in the cell: DNA chips and proteomics. Both techniques grew out of genome sequencing projects. Claude Jacq, of the Molecular Genetics Laboratory at ENS,6 is enthusiastic, comparing them to the invention of the microscope in the seventeenth century: “Ten years ago, in characterizing relationships between the thousands of different genes, biologists were limited to studying only a handful at a time. This technical barrier was overcome in 1995 when two researchers at Stanford University arrayed all 6,000 genes of baker's yeast, separated and in order, on a simple microscope slide, to act as a trap for the corresponding messenger RNA. The first pan-genomic DNA chip was born.” Since then, the technique has become widely used. It takes advantage of a feature unique to RNA and DNA, namely the hybridization between the complementary base pairs that make up the respective nucleotide chains. “The two separated strands of a molecule of DNA recognize each other and interact in an orderly fashion, hybridizing to reconstitute the original double helix,” explains Jacq. The same goes for messenger RNA, which likewise hybridizes to the complementary strand of the DNA from which it was transcribed. Although every cell in an organism has the same genetic inheritance, the transcriptome, or complete set of expressed genes, varies considerably from one type of cell to another (skin, bone, brain…) and also varies according to the state the cell is in (normal, cancerous, infected…).
By arraying micro-droplets of DNA on a surface less than 1 cm² in size (hence the name “chip”), with each droplet containing a fragment of a different gene in the genome, and then hybridizing the resulting DNA chip with a sample of RNA from a given cell type, researchers can determine which genes are expressed in that cell, be it, for example, a normal cell or its cancerous counterpart. Comparing the two in turn makes it possible to identify the genes whose activity is altered in the cancer cell, which is of obvious clinical importance. “When examined under a special microscope, the chip will light up in spots where the RNA of the cancerous cell binds to it,” says Jacq. “In one single step, we can monitor the activity of thousands of genes and observe how some of them react to the presence of a drug, for example.”
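
The principle described above (a probe traps only the messenger RNA complementary to it, and two samples are compared spot by spot) can be illustrated with a toy model. This is a minimal sketch in Python, not how real chips are analyzed: the gene names and sequences are invented, hybridization is reduced to a perfect complementary match, and each spot is simply on or off, whereas real experiments measure graded fluorescence intensities.

```python
# Toy model of a DNA chip. All sequences and gene names are invented.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the DNA strand that pairs with `dna`, read in the usual direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def hybridizes(probe, mrna):
    """True if the DNA probe is perfectly complementary to a stretch of the mRNA."""
    target = mrna.replace("U", "T")  # compare in the DNA alphabet
    return reverse_complement(probe) in target


def lit_spots(chip, sample):
    """Return the genes whose spot traps at least one mRNA from the sample."""
    return {
        gene
        for gene, probe in chip.items()
        if any(hybridizes(probe, mrna) for mrna in sample)
    }


# One short probe per gene, as deposited on the hypothetical chip.
CHIP = {"geneA": "AATGGC", "geneB": "TCCAAG", "geneC": "GCGTTT"}

# Messenger RNAs that each hypothetical gene would produce when expressed.
MRNA = {
    "geneA": "AUGGCCAUUGUA",
    "geneB": "AUGCUUGGAUCC",
    "geneC": "AUGAAACGCUGA",
}

# The normal cell expresses genes A and B; the cancerous one, genes B and C.
NORMAL_SAMPLE = [MRNA["geneA"], MRNA["geneB"]]
CANCER_SAMPLE = [MRNA["geneB"], MRNA["geneC"]]

normal_on = lit_spots(CHIP, NORMAL_SAMPLE)
cancer_on = lit_spots(CHIP, CANCER_SAMPLE)

print("lit only with the normal sample:", sorted(normal_on - cancer_on))
print("lit only with the cancer sample:", sorted(cancer_on - normal_on))
```

In this toy run, the spot for geneA lights up only with the normal sample and the spot for geneC only with the cancerous one, while geneB lights up with both. Differences of this kind are what researchers look for when comparing two cell states.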

Proteomics is another area of research attracting a lot of attention. The proteome is the complete set of proteins present in a cell. The 100,000 proteins in the human body are at the heart of practically every known cellular function. Examples include insulin, which regulates blood sugar, the immunoglobulins, which protect against infection, and keratin, a major constituent of the epidermis, fingernails and hair. “Several different proteins may be synthesized from a single gene, and can then be further modified according to their functions,” states François Amalric, director of the Pharmacology and Structural Biology Institute in Toulouse.7 Work on the proteome is progressing slowly but surely. “The main problem we face today is analyzing all the proteins in a given cell in order to identify them one by one by mass spectrometry. The more we know about these fundamental components of the cell, the better we will understand how their functions combine to produce life.”

The quickening pace of discovery in molecular genetics is shaking the edifice of biology to its foundations. And the profession still has many questions to explore: How is the genome organized? What are the signals, immersed in the hodgepodge of non-coding DNA, that intervene in its regulation? What are the functions of thousands of proteins? How do genes interact with each other and with other components in living organisms? What is the importance of the epigenetic mechanisms of regulation (see glossary), which were once believed to be minor, and what are the underlying molecular mechanisms? Why is it that the more we understand about genes, the less they seem capable of acting alone and of explaining life? “Even if we have accumulated an enormous amount of data, we are swimming in an ocean of unknowns,” says Bucheton. “The genome, and a fortiori the cell, is still terra incognita. We will probably be at it for a while just taking inventory of its components before we are in any position to understand the logic behind its workings…” Weissenbach adds. That prediction notwithstanding, the methods and concepts that will allow us to situate the function of DNA in the larger contexts of the cell and the organism will be at the heart of the post-genomic era in biology. “We are at the beginning of a new adventure, and the way to proceed is not as clear as it was in the past,” says Pierre Sonigo, head of the Virus Genetics Laboratory at the Cochin Institute in Paris.8 “Scientific organizations around the world are pushing their labs towards a biology of systems.” This is based on the central idea that the behavior of a system is determined by the ever-changing relations between its components rather than by the components themselves.
In other words, “no single component can explain the functioning of the whole, as had been previously assumed under the classical paradigm of genetics.” The global behavior of a system emerges from the synchronous behaviors of a large number of local factors. Each obeys its own logic, without being a slave to the collective goal. “Dynamic networks, complex phenomena, emergent properties, self-organization: these are buzzwords that reflect the influence of the Internet and ecosystems in this new era of biology, as opposed to previous influences such as clocks, thermostats, and the first robots.” This explains why the current approach relies heavily on physics, mathematics, and computers, and hints at many more wonderful surprises in the future.

Philippe Testard-Vaillant


Decrypthon: accelerating genomic and post-genomic research

On March 15, 2005, the French Association against Myopathies (AFM)1, CNRS, and IBM launched the Decrypthon program. Grid computing, or sharing CPU resources across a network to create a virtual supercomputer, is the platform and principal innovation of this program, which relies on two specific grids: a “university grid” composed of the supercomputers of the universities of Bordeaux-I, Lille-I, and Paris-VI, and an Internet users' grid, activated on demand. The grids will generate the computational power needed to handle complex research projects in genomics and proteomics, a major boost to the understanding of genetic diseases and muscular dystrophies.

Stéphanie Bia

1. Association Française contre les Myopathies

Notes:

1. Institut de génétique humaine. CNRS-only unit.
2. Laboratoire de la régulation de l'expression génétique de l'École normale supérieure. Joint lab: CNRS / ENS.
3. Centre de génétique moléculaire. CNRS-only unit.
4. Laboratoire de structure et dynamique des génomes, Institut Pasteur.
5. Joint lab: CNRS / Genoscope / Institut d'Évry
6. Laboratoire de génétique moléculaire. Joint lab: CNRS / ENS.
7. Institut de pharmacologie et de biologie structurale. Joint lab: CNRS / Université de Toulouse-III.
8. Laboratoire de la génétique des virus. Joint lab: CNRS / Inserm / Université Paris-V.

Contacts:

Lawrence Aggerbeck

François Amalric

Alain Bucheton

Bernard Dujon

Claude Jacq

Michel Morange

Pierre Sonigo

Jean Weissenbach

