PhD position available

PhD position available at the LORIA laboratory in Nancy, France, in the Synalp team, on deep learning for NLP; starting as soon as possible.

Title: Weakly supervised deep learning for natural language processing

This thesis will study and propose novel weakly supervised deep learning models and training methods, and apply them to Natural Language Processing (NLP) tasks. The focus of the thesis will be on weak supervision, i.e., on designing novel training approaches that can capture generic and transferable information from raw data sources. This challenge is one of the main bottlenecks of current and future deep learning methods, as manually annotated datasets are always too scarce and quickly become outdated, ceasing to be representative of contemporary data after only a short time. Most successful deep learning models hence rely on one of the standard approaches to compensate for this lack of data: data augmentation and transfer learning in image processing, training generic representations based on embeddings in NLP, or more generally building generative models such as Variational Autoencoders and Generative Adversarial Networks, which give access to relatively generic models of data. Several other classical approaches complete this list, including unsupervised and semi-supervised training, multi-task learning, co-training, few-shot learning, dual learning, etc.

Amongst this variety of methods, the thesis will focus on designing models that extract generic information from sparsely annotated language data. A potentially promising direction of research will be to adapt a novel unsupervised approximation of the classifier risk to achieve few-shot training of deep neural networks. Another favored approach will be to design more complex embedding models that include very-long-term memory, in order to build deeper contextual reading models. These models will be applied and evaluated on standard NLP tasks as well as on anomaly detection in large unlabeled language data streams.

Contact: Christophe Cerisara,