Imagine a written script for a play, or film, or television programme. It is perfectly possible for someone to read a script just as they would a book. But the script becomes so much more powerful when it is used to produce something. It becomes more than just a string of words on a page when it is spoken aloud, or better yet, acted.
DNA is rather similar. It is the most extraordinary script. Using a tiny alphabet of just four letters it carries the code for organisms from bacteria to elephants, and from brewer’s yeast to blue whales. But DNA in a test tube is pretty boring. It does nothing. DNA becomes far more exciting when a cell or an organism uses it to stage a production. The DNA is used as the code for creating proteins and these proteins are vital for breathing, feeding, getting rid of waste, reproducing and all the other activities that characterise living organisms.
Proteins are so important that in the twentieth century scientists used them to define what they meant by a gene. A gene was described as a sequence of DNA that codes for a protein.
Let’s think about the most famous scriptwriter in history, William Shakespeare. It can take a while for us to tune in to Shakespeare’s writings because of the way the English language has changed in the centuries since his death. But even so, we are always confident that the Bard only wrote the words he needed his actors to speak.
Shakespeare did not, for example, write the following:
vjeqriugfrhbvruewhqoerahcxnqowhvgbutyunyhewqicxhjafvurytnpemxoqp[etjhnuvrwwwebcxewmoipzowqmroseuiednrcvtycuxmqpzjmoimxdcnibyrwvytebanyhcuxqimokzqoxkmdcifwrvjhentbubygdecftywer
ftxunihzxqwemiuqwjiqpodqeotherpowhdymrxnamehnfeicvbrgytrchguthhhhhhhgcwouldupaizmjdpqsmellmjzufernnvgbyunasec
huxhrtgcnionytuiongdjsioniodefnionihyhoniosdreniokikiniourvjcxoiqweopapqsweetwxmocviknoitrbiobeierrrrrrruorytnihgfiwos
wakxdcjdrfuhrqplwjkdhvmogmrfbvhncdjiwemxsklowe
Instead, he just wrote the words which are shown here in capitals:
vjeqriugfrhbvruewhqoerAhcxnqowhvgbutyunyhewqicxhjafvurytnpemxoqp[etjhnuvrwwwebcxewmoipzowqmROSEuiednrcvtycuxmqpzjmoimxdcniBYrwvytebANYhcuxqimokzqoxkmdcifwrvjhentbubygdecftywer
ftxunihzxqwemiuqwjiqpodqeOTHERpowhdymrxNAMEhnfeicvbrgytrchguthhhhhhhgcWOULDupaizmjdpqSMELLmjzufernnvgbyunASec
huxhrtgcnionytuiongdjsioniodefnionihyhoniosdreniokikiniourvjcxoiqweopapqSWEETwxmocviknoitrbiobeierrrrrrruorytnihgfiwos
wakxdcjdrfuhrqplwjkdhvmogmrfbvhncdjiwemxsklowe
That is, ‘A rose by any other name would smell as sweet’.
But if we look at our DNA script it is not sensible and compact, like Shakespeare’s line. Instead, each protein-coding region is like a single word adrift in a sea of gibberish.
For years, scientists had no explanation for why so much of our DNA doesn’t code for proteins. These non-coding parts were dismissed with the term ‘junk DNA’. But gradually this position has begun to look less tenable, for a whole host of reasons.
Perhaps the most fundamental reason for the shift in emphasis is the sheer volume of junk DNA that our cells contain. One of the biggest shocks when the draft human genome sequence was published in 2001 was the discovery that over 98 per cent of the DNA in a human cell is junk. It doesn’t code for any proteins. The Shakespeare analogy used above is in fact a simplification. In genome terms, the ratio of gibberish to text is about four times as high as shown. There are over 50 letters of junk for every one letter of sense.
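The arithmetic behind that last figure can be sketched in a few lines of Python. This is only an illustration: the function name is invented here, and the percentages other than 98 are assumed values, chosen simply to show how quickly the ratio climbs once the junk fraction creeps above 98 per cent.

```python
def junk_letters_per_coding_letter(junk_percent: float) -> float:
    """Return how many non-coding letters there are per protein-coding letter.

    junk_percent is the share of the genome (0 to just under 100) that does
    not code for protein. The name and interface are hypothetical.
    """
    coding_percent = 100.0 - junk_percent
    if coding_percent <= 0:
        raise ValueError("junk_percent must be below 100")
    return junk_percent / coding_percent


# 98 is the figure quoted in the text; 98.1 and 98.5 are assumed values
# used only to illustrate what "over 98 per cent" can mean.
for junk_percent in (98.0, 98.1, 98.5):
    ratio = junk_letters_per_coding_letter(junk_percent)
    print(f"{junk_percent}% junk: about {ratio:.0f} letters of junk per letter of sense")
```

At exactly 98 per cent the ratio is 49 to 1, so it takes only a shade more than 98 per cent junk to push it past 50.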
There are other ways of envisaging this. Let’s imagine we visit a car factory, perhaps for something high-end like a Ferrari. We would be pretty surprised if for every two people who were building a shiny red sports car, there were another 98 who were sitting around doing nothing. This would be ridiculous, so why would it be reasonable in our genomes? While it’s a very fair point that it’s the imperfections in organisms that are often the strongest evidence for descent from common ancestors — we humans really don’t need an appendix — this seems like taking imperfection rather too far.
A much more likely scenario in our car factory would be that for every two people assembling a car, there are 98 others doing all the things that keep a business moving. Raising finance, keeping accounts, publicising the product, processing the pensions, cleaning the toilets, selling the cars etc. This is probably a much better model for the role of junk in our genome. We can think of proteins as the final end points required for life, but they will never be properly produced and coordinated without the junk. Two people can build a car, but they can’t maintain a company selling it, and certainly can’t turn it into a powerful and financially successful brand. Similarly, there’s no point having 98 people mopping the floors and staffing the showrooms if there’s nothing to sell. The whole organisation only works when all the components are in place. And so it is with our genomes.
The other shock from the sequencing of the human genome was the realisation that the extraordinary complexities of human anatomy, physiology, intelligence and behaviour cannot be explained by referring to the classical model of genes. In terms of numbers of genes that code for proteins, humans contain pretty much the same quantity (around 20,000) as simple microscopic worms. Even more remarkably, most of the genes in the worms have directly equivalent genes in humans.
As researchers deepened their analyses of what differentiates humans from other organisms at the DNA level, it became apparent that genes could not provide the explanation. In fact, only one genetic factor generally scaled with complexity. The only genomic features that increased in number as animals became more complicated were the regions of junk DNA. The more sophisticated an organism, the higher the percentage of junk DNA it contains. Only now are scientists really exploring the controversial idea that junk DNA may hold the key to evolutionary complexity.
In some ways, the question raised by these data is pretty obvious. If junk DNA is so important, what is it actually doing? What is its role in a cell, if it isn’t coding for proteins? It’s becoming apparent that junk DNA actually has a multiplicity of different functions, perhaps unsurprisingly given how much of it there is.
Some of it forms specific structures in the chromosomes, the enormous molecules into which our DNA is packaged. This junk prevents our DNA from unravelling and becoming damaged. As we age, these regions decrease in size, finally declining below a critical minimum. After that, our genetic material becomes susceptible to potentially catastrophic rearrangements that can lead to cell death or cancers. Other structural regions of junk DNA act as anchor points when chromosomes are shared equally between different daughter cells during cell division. (The term ‘daughter cell’ means any cell created by division of a parental cell. It doesn’t imply that the cell is female.) Yet others act as insulation regions, restricting gene expression to specific regions of chromosomes.
But a great deal of our junk DNA is not simply structural. It doesn’t code for proteins, but it does code for a different type of molecule, called RNA. A large class of these RNA molecules forms factories in the cell, helping to produce proteins. Other types of RNA molecules transport the raw material for protein production to the factory sites.
Other regions of junk DNA are genetic interlopers, derived from the genomes of viruses and other microorganisms that have integrated into human chromosomes, like genetic sleeper agents. These remnants of long-dead organisms carry potential dangers to the cell, the individual and sometimes even to wider populations. Mammalian cells have developed multiple mechanisms to keep these viral elements silent, but these systems can break down. When they do, the effects can range from relatively benign — changing the coat colour of a particular strain of mice — to much more dramatic, such as an increased risk of cancer.
A major role of junk DNA, recognised for the most part only in the last few years, is to regulate gene expression. Sometimes this can have a huge and noticeable effect in an individual. One particular piece of junk DNA is absolutely vital for ensuring healthy gene expression patterns in female animals. Its effects are seen in a whole range of situations. A mundane example is the control of the colour patterns of tortoiseshell cats. At its most extreme, the same mechanism also explains why female identical twins may present with different symptoms of a genetically inherited disease. In some cases, this can be so extreme that one twin is severely affected with a life-threatening disorder while the other is completely healthy.
Thousands and thousands of regions of junk DNA are suspected to regulate networks of gene expression. They act like the stage directions for the genetic script, but directions of a complexity we could never envisage in the theatre. Forget about ‘Exit, pursued by a bear’. These would be more along the lines of ‘If performing Hamlet in Vancouver and The Tempest in Perth, then put the stress on the fourth syllable of this line of Macbeth. Unless there’s an amateur production of Richard III in Mombasa and it’s raining in Quito.’
Researchers are only just beginning to unravel the subtleties and interconnections in the vast networks of junk DNA. The field is controversial. At one extreme we have scientists claiming experimental proof is lacking to support sometimes sweeping claims. At the other are those who feel there is a whole generation of scientists (if not more) trapped in an outdated model and unable to see or understand the new world order.
Part of the problem is that the systems we can use to probe the functions of junk DNA are still relatively underdeveloped. This can sometimes make it hard for researchers to use experimental approaches to test their hypotheses. We have only been working on this for a relatively short space of time. But sometimes we need to remember to step back from the lab bench and the machines that go ping. Experiments surround us every day, because nature and evolution have had billions of years to try out all sorts of changes. Even the brief geological moment that represents the emergence and spread of our own species has been sufficient time to create a greater range of experiments than those of us who wear lab coats could ever dream of testing. Consequently, throughout much of this book we will explore the darkness by using the torch of human genetics.
There are many ways to begin shining a light on the dark matter of our genome, so let’s start with an odd but unassailable fact to anchor us. Some genetic diseases are caused by mutations in junk DNA, and there is probably no better starting point for our journey into the hidden genomic universe than this.