In the footsteps of our past
Palaeoproteomics, the beginning of a new era?
In 1935, the German anthropologist Gustav von Koenigswald acquired some mysterious teeth from apothecaries in southern China. The teeth, measuring about 2.5 centimetres, were used in traditional Chinese medicine, where they were known as «dragon teeth». Von Koenigswald was puzzled by their enormous size. The teeth showed primate-like features but were larger than the molars of gorillas, the largest primates alive today. No wonder von Koenigswald named the new genus Gigantopithecus. In the mid-twentieth century, new fossil finds established that it was a lineage of primates that inhabited Southeast Asia from about two million to 100,000 years ago, although it remained uncertain how closely it was related to our own origins. To date, four jaws and nearly 1,300 teeth have been found. Although the giant's size can only be extrapolated from its jaws and teeth, experts estimate that Gigantopithecus could reach three metres in height and weigh around 300 kilograms (Zhang & Harrison, 2017), which would make it the largest primate ever discovered.
The scientific community could not learn much more about Gigantopithecus because fossils such as complete skulls and skeletons were lacking. Without molecular data, such as protein or DNA sequences, fossil organisms can only be studied through morphological comparisons. In fact, until a few decades ago our knowledge of the history of hominids was based on comparing the structure of fossil bones. Although palaeontological studies have been, and still are, essential for understanding the past of our species, they have certain limitations when it comes to resolving phylogenetic positions or characterising demographic events in detail.
From palaeogenomics to palaeoproteomics
Today, thanks to major technological advances, we can obtain molecular data from organisms that lived tens of thousands of years ago by sequencing their DNA. This «new ability» to obtain molecular data from fossil remains has opened up a wide range of possibilities and made it possible to tackle questions that were previously impossible to answer. For example, thanks to the study of DNA from Neanderthals and Denisovans, our closest relatives, we know that their lineage diverged from ours some 500,000 years ago (Prüfer et al., 2014). We also know that all human populations outside Africa carry about 3 % Neanderthal DNA and that some populations in Oceania carry, in addition, up to 5 % genetic material from Denisovans (Reich et al., 2010), most likely as a result of interbreeding between our ancestors and these archaic groups.
So, if we can obtain DNA from fossilised organisms, why does the scientific community not sequence everything it finds? The answer is quite intuitive. Imagine the Earth about 1.9 million years ago, when the remains of a Gigantopithecus blacki ended up in Chuifeng cave in southern China. Perhaps the animal died there, or a scavenger carried part of its remains into the cave. Within a short time, the flesh disappeared and the skeleton began to deteriorate. Almost two million years passed, with countless torrential rains and droughts, and even multiple glaciations. All that remains today is a mineralised tooth, in which any surviving DNA has been broken down into countless tiny fragments.
The answer to the previous question, therefore, is that DNA does not last forever: it gradually deteriorates, breaks down and is transformed into other molecules, until it becomes undetectable or uninterpretable to us. This deterioration is accentuated in equatorial and tropical climates, because high temperatures accelerate the process. In fact, this is one of the reasons why the vast majority of ancient DNA studies have focused on samples from regions far from the tropics. The «world record» for ancient DNA sequencing is a 560,000–780,000-year-old horse preserved in the permafrost of Yukon Territory, in the far north of Canada (Orlando et al., 2013). Outside such exceptional conditions of preservation, DNA can usually be obtained only from organisms that died a few tens of thousands of years ago. Therefore, despite its great potential, ancient DNA can only open windows onto the relatively recent past.
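The link between temperature and DNA decay is often modelled as first-order chemical decay whose rate follows the Arrhenius equation. The Python sketch below illustrates that idea under stated assumptions: the reference rate `K_REF`, the activation energy `EA` and the site temperatures are hypothetical values chosen only for illustration, not measurements from the studies cited in this article.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

# Hypothetical, illustrative parameters (NOT measured values):
K_REF = 1e-5    # bond breaks per site per year at the 10 degree C reference temperature
EA = 125_000.0  # activation energy of the breaking reaction, J/mol


def decay_rate(temp_c: float, k_ref: float, ea: float, t_ref_c: float = 10.0) -> float:
    """Arrhenius scaling of a reference rate constant k_ref (per year) measured at t_ref_c."""
    t = temp_c + 273.15
    t_ref = t_ref_c + 273.15
    return k_ref * math.exp(-ea / R * (1.0 / t - 1.0 / t_ref))


def fraction_intact(years: float, temp_c: float, k_ref: float, ea: float) -> float:
    """Probability that a given backbone bond is still unbroken (first-order kinetics)."""
    return math.exp(-decay_rate(temp_c, k_ref, ea) * years)


for site, temp in [("permafrost", -5.0), ("temperate cave", 10.0), ("subtropical cave", 20.0)]:
    share = fraction_intact(50_000, temp, K_REF, EA)
    print(f"{site:>16}: {share:.3g} of bonds intact after 50,000 years")
```

With these made-up numbers, a ten-degree rise multiplies the breakage rate several times over, which is the qualitative pattern described above: cold sites such as permafrost preserve DNA far better than warm ones.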
Is there any hope of obtaining molecular data from the Gigantopithecus of Chuifeng cave? Its antiquity suggests not. Any trace of its DNA would have deteriorated over time, a process greatly accelerated by the hot, humid climate of southern China. Nevertheless, cells contain other types of molecules besides DNA. Most biology textbooks devote a few pages to the so-called «central dogma of molecular biology»: DNA is transcribed into RNA, which is then translated into proteins. Although this model greatly simplifies the actual flow of genetic information within a cell, it is useful for our purpose: would it be possible to sequence the RNA or proteins of a 1.9-million-year-old tooth? RNA is very similar to DNA, a sequence of nucleotides in which thymine (T) is replaced by uracil (U), but it is chemically even less stable, so it offers no advantage over DNA for reaching further back in time. But what about proteins?
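As a simplified illustration of the central dogma, the Python sketch below transcribes a DNA coding strand into RNA by swapping T for U and then translates it, three nucleotides at a time, using the standard genetic code. The example sequence is invented, and real gene expression (template versus coding strand, splicing and so on) is considerably more complex.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: amino acids (one-letter code, '*' = stop) for the 64 codons,
# listed with each codon position enumerated in U, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(dna: str) -> str:
    """DNA coding strand -> mRNA: thymine (T) is replaced by uracil (U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """mRNA -> protein, one codon per amino acid, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


rna = transcribe("ATGAGTGGCTAA")
print(rna, "->", translate(rna))
```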
Proteins are more stable than DNA. Both molecules are chains of small units (nucleotides and amino acids, respectively), but the bonds between the units of a protein, the peptide bonds, are more resistant. In addition, proteins fold into three-dimensional structures that can protect them from deterioration (to a certain extent, parts of the chain are folded inwards, out of reach of external agents). Last but not least, proteins are more abundant: excluding water, approximately 3 % of the mass of a cell is DNA, very little compared with the roughly 50 % that is protein. Proteins therefore seem to meet all the conditions required to study the more distant past, which is inaccessible using DNA. Although this may sound like a revolutionary idea, some researchers had already documented the detection of amino acids in fossils in the 1950s. Even so, the technology for sequencing ancient proteins did not exist until the 2000s, when the scientific community adopted mass spectrometry, a technique that originated in physics and is widely used in chemistry. A mass spectrometer detects the different peptides (short fragments of a protein) present in a sample by measuring their mass (strictly speaking, their mass-to-charge ratio); in other words, it «weighs» them. The information from the peptides can then be assembled into longer protein sequences.
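To make the idea of «weighing» peptides concrete, the Python sketch below adds up standard monoisotopic residue masses to obtain the mass of a peptide, and converts it into the mass-to-charge ratio an instrument would record for a given charge. The two peptide sequences are arbitrary examples, and the functions ignore chemical modifications, which are common in ancient proteins; real identification software does much more than this.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once, for the free N- and C-termini of the chain
PROTON = 1.00728  # mass of each proton that gives the ion its positive charge


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence: str, charge: int) -> float:
    """Mass-to-charge ratio of the peptide when it carries `charge` extra protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


for pep in ["GLPGER", "GIPGER"]:
    print(pep, round(peptide_mass(pep), 4), round(mz(pep, 2), 4))
```

Because leucine (L) and isoleucine (I) have exactly the same mass, the two example peptides cannot be told apart by mass alone, one small illustration of why peptide evidence has to be interpreted with care.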
What a tooth can tell us
At the end of 2019, an international team succeeded in obtaining proteins from the enamel of the Chuifeng Gigantopithecus by means of mass spectrometry (Welker et al., 2019). Taking into account the «thermal age» of the sample (its 1.9 million years adjusted for the annual temperature of the cave), these are the oldest molecular data ever obtained from a mammal. The Gigantopithecus enamel contained six different proteins. By comparing the protein sequences obtained with those of present-day great apes (chimpanzees, bonobos, orangutans and humans, among others), it was possible to establish that Gigantopithecus was a close relative of the orangutans. Furthermore, using the molecular clock (which uses the rate at which changes accumulate in a protein sequence to date evolutionary splits), the researchers estimated that orangutans and Gigantopithecus shared a common ancestor some ten to twelve million years ago. In addition, the detection of a protein that is uncommon in the enamel of other apes allowed experts to hypothesise about the reasons behind the distinctive dental morphology of Gigantopithecus. If Gustav von Koenigswald were alive, he would surely be amazed at how technology has transformed biology in less than a hundred years.
Beyond extending our knowledge of Gigantopithecus, the study is important for reasons not directly related to the giant primate. It shows that it is possible to obtain molecular data from organisms millions of years old, even in subtropical climates, and this opens up an enormous range of new opportunities to understand our past. Another good example of these applications also comes from 2019. The Denisovans are a mysterious group of hominids. Until recently, they had only been identified through five fossils found in Denisova cave (hence their name) in the Altai Mountains in Russia. In 2010, a draft sequence of their genome (the entire DNA sequence of an organism) was obtained from the finger phalanx of a Denisovan girl (Reich et al., 2010). Even so, no Denisovan fossil had ever been found outside the Altai Mountains. In May 2019, a team of researchers used palaeoproteomics to identify a 160,000-year-old jawbone from the Tibetan Plateau as Denisovan (Chen et al., 2019). The age and state of preservation of the fossil made ancient DNA extraction impossible, and without palaeoproteomics we would probably never have learned that the jawbone belonged to a Denisovan.
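The molecular clock reasoning can be sketched in Python as follows, assuming a strict clock (both lineages accumulate changes at the same constant rate since they split) and a simple Poisson correction for repeated changes at the same site. The two short sequences and the rate `RATE` are invented for illustration; published studies of this kind rely on more elaborate phylogenetic models and calibrations.

```python
import math


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Estimated substitutions per site between two aligned, equal-length protein sequences.

    The observed proportion of differing sites, p, is corrected as d = -ln(1 - p)
    to account for several changes having hit the same site.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    return -math.log(1.0 - p)


def divergence_time(seq_a: str, seq_b: str, rate_per_site_per_year: float) -> float:
    """Strict clock: each lineage changes at `rate` since the split, so d = 2 * rate * t."""
    return poisson_distance(seq_a, seq_b) / (2.0 * rate_per_site_per_year)


# Hypothetical aligned fragments differing at 2 of 20 sites, and a hypothetical rate.
SEQ_A = "MKTAYIAKQRQISFVKSHFS"
SEQ_B = "MKTAFIAKQRQISFVRSHFS"
RATE = 5e-9  # substitutions per site per year (illustrative assumption)

print(f"{divergence_time(SEQ_A, SEQ_B, RATE) / 1e6:.1f} million years")
```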
On the other hand, the most interesting cases may be those for which we have no molecular data at all. Who were the mysterious «hobbits» of the island of Flores (Homo floresiensis)? Was Homo heidelbergensis our ancestor, or rather the ancestor of the Neanderthals? What about Homo erectus? What can we learn from another recently discovered lineage, Homo naledi? The list could go on and on. The application of palaeoproteomics has the potential to fill the gaps that still exist in the phylogenetic tree of humans and their ancestors from more than 500,000 years ago.
Despite this ode to progress, the new field of palaeoproteomics has certain limitations. Although proteins contain encoded genetic information (as the «central dogma» describes), they are less informative than DNA, for two main reasons. Firstly, some of the genetic information is lost when RNA is translated into protein. The genetic code is redundant, that is, several RNA triplets (codons) can encode the same amino acid. For example, the amino acid serine is encoded by six different triplets, including AGU and AGC. So, if we find a serine in the sequence of a protein, we have no way of telling which triplet was present in the original DNA. Secondly, proteins are fundamental to the functioning of a cell, so the DNA regions that code for them are under strong selective pressure; that is, changes in their sequence are very restricted, because any modification could lead to a loss of function. In practice, strong selective pressure results in low sequence diversity. For example, collagen has virtually the same sequence in a human as in a macaque. This is explained by the great importance of collagen: over millions of years, evolution has produced an optimal sequence for the protein, and most mutations are eliminated by natural selection. As a result, protein data can yield extremely similar sequences in different organisms, making it difficult to reconstruct phylogenies and establish evolutionary relationships. Imagine, for example, that you obtain the collagen sequence from a sample of Homo floresiensis. After investing a great deal of time and resources, you realise that its protein and ours (Homo sapiens) are identical and, therefore, that you have learned almost nothing about Homo floresiensis. In short, some proteins are so conserved over the course of evolution that obtaining their sequence may turn out to be of little use.
Palaeoproteomics also faces other challenges, this time related to people and the way the scientific community operates. The field must be careful not to fall into the same trap as ancient DNA research did twenty years ago. During the 1990s and early 2000s, some publications claimed to have obtained DNA from dinosaurs or from insects trapped in amber (in the purest Jurassic Park fashion), claims that in retrospect are hard to believe. These results were later shown to stem from contamination with modern DNA or other methodological errors. Palaeoproteomics is a fascinating field that will receive a lot of attention in the coming years. The scientific community will therefore have to establish good practices and standards of reproducibility and transparency if it wants to avoid repeating the mistakes of the past.
This is undoubtedly an encouraging time for evolutionary biology. Palaeoproteomics has the potential to explore time intervals that were completely inaccessible until now. It is worth remembering that the technique is not restricted to the study of primates or humans and can also benefit research on other animals, plants, fungi and so on. In 2019 (an extremely fruitful year for palaeoproteomics and possibly an omen for the near future), researchers established the phylogenetic position of a rhinoceros that lived 1.7 million years ago (Cappellini et al., 2019). A few years earlier, another research project had obtained proteins from the shell of a 3.8-million-year-old ostrich egg at Laetoli (Demarchi et al., 2016), the site of the famous Australopithecus afarensis footprints. We are therefore most likely on the verge of a wave of discoveries, driven by new generations of scientists, that may change our understanding of evolutionary biology forever.
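The information lost through the redundancy of the genetic code can be quantified with a short Python sketch. Using the standard genetic code, it lists the codons that encode a given amino acid and counts how many different RNA sequences could have produced a short peptide; the peptide `GLPGER` is an arbitrary example, not a sequence from any of the studies cited.

```python
from itertools import product
from math import prod

BASES = "UCAG"
# Standard genetic code ('*' = stop), codon positions enumerated in U, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(amino_acid: str) -> list[str]:
    """All RNA codons that the standard genetic code translates into `amino_acid`."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


def possible_coding_sequences(peptide: str) -> int:
    """Number of distinct RNA sequences (stop codon excluded) that encode `peptide`."""
    return prod(len(codons_for(aa)) for aa in peptide)


print(codons_for("S"))
print(possible_coding_sequences("GLPGER"))
```

Even for this six-residue peptide there are 4,608 candidate coding sequences, and the protein alone cannot tell us which one the organism actually carried.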
Cappellini, E., Welker, F., Pandolfi, L., Ramos-Madrigal, J., Samodova, D., Rüther, P. L., ... Willerslev, E. (2019). Early Pleistocene enamel proteome from Dmanisi resolves Stephanorhinus phylogeny. Nature, 574, 103–107. doi: 10.1038/s41586-019-1555-y
Chen, F., Welker, F., Shen, C.-C., Bailey, S. E., Bergmann, I., Davis, S., ... Hublin, J.-J. (2019). A late Middle Pleistocene Denisovan mandible from the Tibetan Plateau. Nature, 569, 409–412. doi: 10.1038/s41586-019-1139-x
Demarchi, B., Hall, S., Roncal-Herrero, T., Freeman, C. L., Woolley, J., Crisp, M. K., ... Collins, M. J. (2016). Protein sequences bound to mineral surfaces persist into deep time. eLife, 5, e17092. doi: 10.7554/eLife.17092
Orlando, L., Ginolhac, A., Zhang, G., Froese, D., Albrechtsen, A., Stiller, M., ... Willerslev, E. (2013). Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature, 499, 74–78. doi: 10.1038/nature12323
Prüfer, K., Racimo, F., Patterson, N., Jay, F., Sankararaman, S., Sawyer, S., ... Pääbo, S. (2014). The complete genome sequence of a Neanderthal from the Altai Mountains. Nature, 505, 43–49. doi: 10.1038/nature12886
Reich, D., Green, R. E., Kircher, M., Krause, J., Patterson, N., Durand, E. Y., ... Pääbo, S. (2010). Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature, 468, 1053–1060. doi: 10.1038/nature09710
Welker, F., Ramos-Madrigal, J., Kuhlwilm, M., Liao, W., Gutenbrunner, P., De Manuel, M., ... Cappellini, E. (2019). Enamel proteome shows that Gigantopithecus was an early diverging pongine. Nature, 576, 262–265. doi: 10.1038/s41586-019-1728-8
Zhang, Y., & Harrison, T. (2017). Gigantopithecus blacki: A giant ape from the Pleistocene of Asia revisited. American Journal of Physical Anthropology, 162(S63), 153–177. doi: 10.1002/ajpa.23150