Here is a list of (recent or ongoing) projects involving the team:

Current projects

ORTOLANG (2011-ongoing)

Our role in the ORTOLANG project is to process large French corpora with syntactic and semantic tools. These corpora will be maintained in a stable state and distributed with guaranteed availability for at least 10 years.

Contact: Claire GARDENT

METAL (2016-2020)

The METAL project aims to design, develop and evaluate a set of monitoring tools for students and teachers (Learning Analytics), as well as technology for personalized learning of written language (French grammar) and spoken language (pronunciation of modern languages). It contributes to improving the quality of learning and to developing students' language proficiency.

Contact: Claire GARDENT

ModelWriter (2014-2017)

ModelWriter is an ITEA3-funded project which aims to develop an integrated authoring environment combining Semantic Word Processing (the "Writer" part) with Knowledge Capture Tool solutions (the "Model" part). In this project, we work on semantic analysis (from text to model) and natural language generation (from model to text).

Contact: Claire GARDENT

WebNLG (2014-2017)

Funded by the French ANR, the WebNLG project aims to develop novel technologies for generating text from knowledge bases and linked data. The WebNLG partners are Synalp, LORIA/CNRS, Nancy (France); SRI, Stanford (USA); and KRDB, Bolzano (Italy).

Contact: Claire GARDENT

Past projects

Allegro (2010-2014)

The SYNALP (ex-TALARIS) project-team concentrated on the integration of natural language processing techniques (TAL in French) and virtual worlds. The objective was to design immersive software prototypes, 3D games "that work like treasure hunts". For example, the user can play the role of a restaurant customer who must place an order using a given set of words. The system detects the user's errors and helps correct them before the user can continue to the next step.

Contact: Claire GARDENT

ContNomina (2013-2016)

Our role in the ContNomina project is to improve the performance of named entity recognition through the use of unsupervised structural learning.

Contact: Christophe CERISARA

Emospeech (2010-2014)

Dialogue systems are complex, distributed and asynchronous architectures that bring together specialized components. Broadly, these components handle the tasks of modal-based recognition and synthesis, understanding, dialogue management, generation, fission and fusion, and they can be either symbolic or stochastic. The lack of domain-specific and linguistic resources is the major difficulty when incorporating dialogue into different domains and languages.

Within the Emospeech project, we developed the Emospeech Dialogue Toolkit to support human-machine dialogues and data collection. To support data collection, a human (the Wizard of Oz) can be plugged into or out of the dialogue architecture. The Emospeech Dialogue Toolkit is a multi-agent architecture for developing human-machine dialogue systems in the context of a video game. It includes the following agents:

  • MIDIKI Dialogue Manager: We extended and improved the open source MIDIKI (MITRE Dialogue Toolkit) software to support the multi-agent architecture and configuration from a relational database.
  • Wizard of Oz: Two Wizard of Oz interfaces were built which allow a human to interact with other agents in the dialogue architecture. The free wizard acts as a dialogue manager and permits a chat between two humans, the player and the Wizard, while simultaneously storing all interactions in a database. In contrast, the semi-automatic wizard connects the Wizard with MIDIKI, whereby the Wizard interprets and adjusts MIDIKI's generation.
  • Interpretation: We trained an SVM classifier and a Logistic Regression classifier that assign a user move to a player sentence.
  • Question Answer: We trained a Conditional Random Fields classifier and a Logistic Regression classifier that choose the most plausible response to a player sentence.
  • Generation: We implemented a generation-by-selection strategy. Given the dialog move output by the dialog manager, the generator selects any utterance in the corpus that is labeled with this dialog move for the current subdialog.
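
The generation-by-selection strategy described in the last item can be illustrated with a minimal sketch. This is not the toolkit's actual code: the corpus entries, the subdialog and dialog move names, and the functions `build_index` and `generate` are all hypothetical, and picking a candidate at random is only one assumed way of selecting "any" matching utterance.

```python
import random
from collections import defaultdict

# Hypothetical annotated corpus: (subdialog, dialog move, utterance).
# These entries are invented for illustration only.
CORPUS = [
    ("greeting", "greet", "Hello, welcome to the factory."),
    ("greeting", "greet", "Hi there!"),
    ("greeting", "ask_goal", "What are you looking for?"),
    ("workshop", "inform_job", "I operate the moulding machine."),
]


def build_index(corpus):
    """Group utterances by (subdialog, dialog move)."""
    index = defaultdict(list)
    for subdialog, move, utterance in corpus:
        index[(subdialog, move)].append(utterance)
    return index


def generate(index, subdialog, move, rng=random):
    """Return one utterance labeled with `move` in `subdialog`, or None."""
    candidates = index.get((subdialog, move))
    if not candidates:
        return None  # no utterance carries this label in this subdialog
    return rng.choice(candidates)  # assumption: any candidate is acceptable


if __name__ == "__main__":
    index = build_index(CORPUS)
    print(generate(index, "greeting", "greet"))
```

Indexing the corpus once by (subdialog, dialog move) keeps each generation step a simple lookup followed by a choice among the matching utterances.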

Additional Tools and Linguistic Resources:

  • Dialogue Configuration: A web tool for configuring the different dialogs in a game: the speakers (players and non-player characters), the game goals and, for each dialog, its speakers and context goals.
  • Annotation Tools: A web tool for annotating both player utterances with dialogue moves and system propositional questions with the related context goals (i.e. the goals to be discussed in the sub-dialog).
  • The Emospeech Corpus: A case study for the Serious Game Mission Plastechnology. The Emospeech Corpus comprises 1249 dialogs, 10454 utterances and 168509 words. It contains 3609 player utterances consisting of 31613 word tokens and 2969 word types, with approximately 100 conversations for each sub-dialog in the game. Dialog length varies from 78 to 142 utterances, with an average length of 106 utterances per dialog.

Contact: Claire GARDENT

EMPATHIC (2012-2015)

Empathic Products is an ITEA2 project that aims to develop applications that adapt to the intentional and emotional state of the user. Our role is to provide sentiment analysis and emotion detection services.

Contact: Samuel CRUZ-LARA

ISTEX (2012-2015)

The ISTEX project is financially supported by the French Ministry for Higher Education and Research (MESR) within the "Investments for the Future" program. It draws on the largest body of plain-text scientific collections acquired so far in France and distributes it to French academic partners. We process it in particular for diachronic topic tracking.

Contact: Claire GARDENT

ORFEO (2013-2016)

We participated in the ORFEO project (funded by ANR) in collaboration with the Parole team to build and process large French textual and speech resources that are to be made freely available. Our role was to process these resources with different NLP components.

Contact: Christophe CERISARA