Extracting knowledge from data.
Using massive quantities of data to advance the sciences
Creating synergies and interactions between data analysts and data “producers”
The CDS (Centre for Data Science) project aims to develop methods and tools capable of analysing gigantic amounts of data and extracting useful information from them for physics, biology, medicine, chemistry, environmental science and the human sciences.
This project is multidisciplinary. It requires research on analytical methodologies (statistics, machine learning, knowledge extraction, data visualization) as well as on software design. Expertise in each research field from which the data originate is also indispensable.
The objective goes well beyond current “big data” initiatives, which focus on the retrieval, transfer, storage, archiving and security of data.
The Centre for Data Science will endeavour, above all, to extract knowledge from data.
Making the data speak
Predicting the links between genes and the physical characteristics of plants and animals, understanding interactions among proteins, piercing the mysteries of dark matter using gigantic telescopes, producing highly accurate images of the brain at work, discovering new particles in accelerators or by detecting cosmic rays, organizing billions of pieces of music posted on the web, modelling the environment, or even understanding the growth of cities and the depopulation of rural areas: all of this research requires processing staggering amounts of data and, even more importantly, making sense of them. That is why the Centre for Data Science exists.
The advantages of the CDS
Above all, the CDS is designed to bring together the multidisciplinary skills required to analyse huge amounts of data, skills that are rarely found in a single laboratory. The Centre for Data Science will bring together:
- Scientists and engineers who collect data using sensors and detectors and analyse them to discover the laws of nature;
- Data specialists who develop algorithms and propose new data processing methods;
- Software engineers who design and implement tools;
- Systems engineers who build computing infrastructure and keep it running.
The CDS will serve as a point of contact for both multinationals and small and medium-sized enterprises. Links will also be established with higher education institutions and with existing data centres.
“Big data” refers to the enormous volumes of highly varied data that we continuously produce, notably through social networks and mobile phones. Collecting such volumes of data has never been easier. In science, we also produce huge amounts of data using sensors, detectors, telescopes, imaging devices, etc.
This unprecedented growth has revolutionized science and industry over the last decade. In particle physics, for example, automated analysis of data, coupled with simulation, is now the norm. Likewise, a new discipline, bioinformatics, has emerged at the interface of biology and computer science. Such massive datasets now concern almost every discipline, so extracting useful information from them has become a crucial scientific challenge. Learning to control and protect these data is also a societal challenge.