AVE-TAL and ProfessorBob.ai: towards a more sophisticated virtual teaching assistant
As part of the AVE-TAL project, scientists from the Interdisciplinary Laboratory of Digital Sciences (LISN - Univ. Paris-Saclay, CNRS, CentraleSupélec, Inria) are collaborating with the start-up ProfessorBob.ai, which is shaking up the education sector with its virtual teaching assistant. The start-up draws on the LISN team's expertise in language processing, and on its latest research advances in the field, to develop its solution further.
With 20% of students struggling at school, i.e. one million in France*, the virtual tutor ProfessorBob.ai has been working to prevent students from dropping out since July 2020. It is aimed at students and teachers in middle schools, high schools, universities and professional training, so that "they learn twice as fast and retain three times more", according to its founder François-Xavier Hussher. The rationale is that a teacher with 35 students in a class cannot personalise their teaching. To achieve this level of personalised learning, the solution uses artificial intelligence and, more specifically, deep learning (a family of machine learning methods that model data with multi-layer artificial neural networks), which adapts to each learner's profile and level.
To provide its clients with a more sophisticated virtual tool, ProfessorBob.ai turned three years ago to the LISN scientists and, more specifically, to the Information, Written and Signed Language (ILES) team. The start-up wanted to draw on the latest research in language processing with deep learning. It approached Anne Vilnat, a lecturer at LISN and member of the ILES team, to develop a system that automatically generates questions and answers from course content. "In the ILES team, which specialises in written language processing, we work on the comprehension, extraction and generation of information, both for specialised fields such as medical informatics and for education," explains Anne Vilnat. Together with her colleague Gabriel Illouz, she is developing tools that automatically generate multiple-choice questions to assist teachers. These skills perfectly complement those of the start-up: "ProfessorBob.ai is very good at searching for information in texts via deep learning, but not at processing the written word," explains Gabriel Illouz.
AVE-TAL: automating questions and answers
Their project involves designing a system that can ask questions and evaluate the quality of the answers given, to support student learning. "Students often think they have understood and learned the course after rereading it once, but in fact this is not the case. The tool must let students practise on their own, which in turn frees up the time teachers spend on marking," explains Anne Vilnat. In 2020, LISN scientists submitted their AVE-TAL (Virtual Teaching Assistant - Automatic Language Processing) research project to SATT (Technology Transfer Acceleration Company) Paris-Saclay. SATT Paris-Saclay signed the technology transfer contract and funded the recruitment of two engineers and a post-doc to bridge the gap between the academic expertise and the start-up's entrepreneurial ambitions.
Data collection and system evaluation
Because the start-up's own data was not sufficient to carry out the project, the two LISN academics contacted their colleague Patrick Paroubek, who specialises in data collection and in the evaluation of natural language processing systems within the ILES team. The task is complex: only some aspects of natural language can be formalised, and it is not always possible to prove that an information extraction or text generation program is effective. Moreover, there is no objective standard for judging the form of an output, as "deciding that a summary is good also depends on the assessment of the person reading it," explains Patrick Paroubek. The approach was therefore to collect a corpus of texts, annotate it by hand with syntactic or semantic information, and compare these reference annotations with those produced by the machine. "This statistical comparison indicates the system's ability to solve the task at hand," summarises Patrick Paroubek.
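To make the comparison step more concrete, here is a minimal Python sketch of how hand-made reference annotations can be scored against a system's output. It assumes annotations are labelled character spans (a hypothetical format, not the project's actual one) and uses precision, recall and F1, which are common measures in language processing evaluation but not necessarily the ones used by the ILES team. The named-entity labels and the toy data are purely illustrative.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A hypothetical annotation: a labelled character span in a text."""
    start: int
    end: int
    label: str


def evaluate(reference: set[Span], predicted: set[Span]) -> dict[str, float]:
    """Compare machine annotations with hand-made reference annotations.

    A predicted span counts as correct only if it matches a reference span
    exactly (same boundaries and same label).
    """
    correct = len(reference & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example: three hand annotations vs. two system annotations.
gold = {Span(0, 7, "PER"), Span(20, 29, "DATE"), Span(35, 44, "LOC")}
system = {Span(0, 7, "PER"), Span(20, 29, "LOC")}

# Only the PER span matches exactly (the second span has the wrong label),
# so precision is 1/2, recall is 1/3 and F1 is 0.4.
print(evaluate(gold, system))
```

Exact matching is the strictest choice; real evaluation campaigns often also report partial-overlap scores, which would only change how `correct` is computed.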
Contextual semantic processing
Once collected, the data must be processed. Major advances in neural methods since 2018 have made contextual semantic analysis possible. Combined with the growth in computing power, these advances are revolutionising deep learning, since learning algorithms can now be applied to very large data sets. Patrick Paroubek explains, "We have crossed many boundaries in terms of semantics. While the machine does not yet pass the Turing test, it now achieves performance comparable to that of a human being on certain comprehension tasks." As a result, artificial intelligence can generate sophisticated exercises and questions that require students to gather information scattered across several documents and compare it. For example, it is possible to obtain a complete answer to the question "What caused the Battle of Marignano?".
The scope of AVE-TAL
Initially, AVE-TAL mainly targets history, geography and civic education at middle and high school level. To guarantee the reliability of the information, the data comes from teachers' courses and from copyright-free French textbooks validated by teachers. Using a dedicated tool, annotators (teachers and students recruited specifically for the task) write different types of questions about course pages. They select the passage where the answer can be found and formulate the question. "Our goal is to create 10,000 questions. We will then validate the method on this set, refine it, extend it to larger corpora, and transfer it to other disciplines and school levels," comments Anne Vilnat. These could include, for example, economics, finance, law or mathematics. The annotation interface developed for the project is also being made available to researchers who want to run their own experiments.
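The annotation workflow described above (select the passage containing the answer, then write a question about it) can be pictured as a simple record. The Python sketch below is a hypothetical format, loosely modelled on extractive question-answering datasets; the field names, the question categories and the sample course page are illustrative assumptions, not the project's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionAnnotation:
    """One hypothetical annotation: a question written about a course page,
    plus the character span of the passage that contains its answer."""
    page_id: str
    page_text: str
    question: str
    question_type: str  # hypothetical categories, e.g. "factual", "causal"
    answer_start: int
    answer_end: int  # exclusive

    def __post_init__(self) -> None:
        # Reject empty spans and spans that fall outside the page.
        if not 0 <= self.answer_start < self.answer_end <= len(self.page_text):
            raise ValueError(f"invalid answer span for page {self.page_id}")
        if not self.question.strip():
            raise ValueError("question must not be empty")

    @property
    def answer_passage(self) -> str:
        """The passage the annotator selected as containing the answer."""
        return self.page_text[self.answer_start:self.answer_end]


def count_by_type(annotations: list[QuestionAnnotation]) -> Counter:
    """Tally annotations per question type, e.g. to track a corpus target."""
    return Counter(a.question_type for a in annotations)


# Illustrative course page and annotation (hypothetical identifiers).
page = ("In 1515, Francis I crossed the Alps to reconquer the Duchy of Milan. "
        "The French army then defeated the Swiss at Marignano.")
passage = "to reconquer the Duchy of Milan"
start = page.index(passage)
ann = QuestionAnnotation(
    page_id="history-page-12",
    page_text=page,
    question="Why did Francis I cross the Alps in 1515?",
    question_type="causal",
    answer_start=start,
    answer_end=start + len(passage),
)
print(ann.answer_passage)  # to reconquer the Duchy of Milan
```

Storing character offsets rather than a copy of the passage keeps each question tied to its exact location on the course page, which also makes it easy to check annotations automatically before they enter the corpus.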
When this first stage ends in June 2023, the start-up's patent on PDF document processing could be extended to natural language processing functionalities. The start-up also intends to work with the LISN team again on new research projects, such as a dialogue system that lets students explore topics of their choice in greater depth, or a conversational agent that answers in the form of learning videos. "As research and business do not operate on the same timescales, we also provide ProfessorBob.ai with consulting services to frame the different phases of the projects and assess the resources and corpora needed to carry them out," concludes Anne Vilnat.
*according to François-Xavier Hussher, founder of ProfessorBob.ai