Jean-Marc Aury: Genomes in the age of Big data
Jean-Marc Aury, a computer engineer who accidentally fell into genomics twenty years ago, manages the R&D Bio-informatique et séquençage (R&D Bioinformatics and sequencing) team at the Laboratoire d’informatique scientifique du Genoscope (Genoscope scientific computing laboratory) (Université Paris-Saclay, CEA). Today, he assembles the complete genomes of plants and animals using bioinformatics, a field which he has played a major role in developing in recent years.
For the past two years, Jean-Marc Aury has been one of the most cited researchers in the world. “It's a great recognition of the work done and demonstrates that ours is a cutting-edge field.” He manages a team of fifteen people, made up of researchers and engineers. All of the data produced by Genoscope's high-tech sequencers pass through the team. The data are secured, saved and then organised for use by researchers working on all Genoscope projects. “Sometimes we analyse these data ourselves, right up until they are published in collaboration with researchers,” Jean-Marc Aury says.
The task of the laboratory director and his team is to assemble the genomes of hitherto unknown plants or animals. The goal is to create the most complete reference genome possible. The team's researchers do not focus on a particular species. Depending on the projects that come in, some of them work on assembling the genomes, while others annotate them. “We annotate the 'coding' areas of the genome, so that we can then look for genes which are of interest, such as those involved in resistance to certain diseases.” For each project, they decide on the strategy, tools and IT methods to be adopted to reconstruct the sequence. Sometimes they even go as far as characterising it themselves. “That's the nice thing about research work, we're dealing with data that no one has ever had their hands on before.”
Reading DNA using Nanopore technology
Jean-Marc Aury and his colleagues make extensive use of “nanopore” DNA sequencing technology, which they were among the first to use a few years ago. “The Nanopore sequencer differs from all the others, partly because it is tiny (the size of a harmonica) and therefore allows sequencing to take place in the field, and partly because it produces very different types of data from other sequencers,” explains the researcher. “Unlike the ‘Illumina’ technology, which sequences DNA fragments of several tens of kilobases, the nanopore is capable of going as far as sequencing megabase-sized molecules, although the error rate remains very high (of the order of 5% to date).” Finally, whereas other sequencers infer the sequence by synthesising the second strand of DNA, the nanopore reads the strand directly (the DNA strand passes through a pore anchored in a membrane).
Fish, coral, oaks and ticks
Today, the laboratory is managing around twenty projects at different stages, including sequencing the genes of corals and two fish from the Tara Pacific mission, as well as the sequencing of certain trees. “We published the genome for oak trees two years ago. Thanks to a very smooth assembly, we discovered that there were a lot of hyper-abundant families, or genes duplicated in tandem, which were essentially resistance genes,” remembers Jean-Marc Aury. “This could explain the trees’ longevity. Not being able to move, they need an arsenal of resistance genes to fight against pathogens and environmental stresses, etc.” Plant and animal genomes are complex to assemble because of their size. For example, the tick genome is about the same size as the human genome. “We have just finished assembling it and are moving on to the analysis stage with a view to publishing our results.”
An engineer in the world of genomics
After studying IT at Caen University, Jean-Marc Aury joined ENSIIE in 2001. During his final year at the computer engineering school in Évry, he also studied for the Master’s in “Application des mathématiques et de l’informatique à la biologie (Application of Mathematics and Computer Science in Biology)” at Évry University and enjoyed a work placement at Genoscope. “I was fascinated by the sequencing of Tetraodon nigroviridis, a fish with a compact genome (8 times smaller than the human genome but with the same number of genes). I participated in the annotation of its genes by developing software.” The dawn of modern genomics (which needs computer scientists) had arrived, so the researcher was taken on by Genoscope straight after his work placement had finished. “At the time, bioinformatics did not yet exist. People mostly had a background in biology and not IT.” As sequencing technology evolves, the amount of data produced increases. Jean-Marc Aury is increasingly approached to develop ever more efficient processing software.
“In future, we will be able to read DNA in a single step and will no longer need to assemble genomes,” asserts the researcher. So, will he still be needed in five to ten years’ time? “Yes,” he replies. “Every day we produce more and more information which needs to be processed. Bioinformatics is just getting started. We will need big data and artificial intelligence when we start producing vast amounts of data on the human genome. It is a very stimulating multidisciplinary sector. We will have to collaborate with people from other disciplines in order to take analysis and interpretation further.”