Position available: Ph.D. in computer science
Position at: Université de Lorraine, LORIA laboratory, Nancy
City: Nancy, France
Subject: Weakly supervised deep learning for Natural Language Processing.
The focus of the thesis will be on weakly-supervised deep learning methods for Natural Language Processing.
Natural Language Processing is one of the next great challenges of Artificial Intelligence, with applications ranging from translation, summarization, question answering and conversational agents to, more generally, building up knowledge by harvesting the huge language streams constantly published on the Internet. This thesis will study and propose novel weakly-supervised deep learning models and training methods and apply them to Natural Language Processing tasks. The focus of the thesis will be on weak supervision, i.e., on designing novel training approaches that can capture generic and transferable information from raw data sources. This challenge constitutes one of the main bottlenecks of current and future deep learning methods, as manually annotated datasets are always too scarce, and even after a short time they become outdated and no longer representative of contemporary data. Most successful deep learning models hence rely on one of the standard approaches to compensate for this lack of data: data augmentation and transfer learning in image processing, training generic representations based on embeddings in NLP, or, more generally, building generative models such as Variational Autoencoders and Generative Adversarial Networks that provide relatively generic models of the data. Several other classical approaches complete this list, including unsupervised and semi-supervised training, multi-task learning, co-training, few-shot learning and dual learning.
Amongst this variety of methods, the thesis will focus on designing models that extract generic information from sparsely annotated language data. A potentially promising direction of research will be to adapt a novel unsupervised approximation of the classifier risk to achieve few-shot training of deep neural networks. Another preferred line of research will be to design more complex embedding models that include very-long-term memory to build up deeper contextual reading models. These models will be applied and evaluated on standard NLP tasks as well as on anomaly detection in large unlabeled language data streams.
- Ph.D. must start as soon as possible, before March 2019
- Ph.D. will be co-funded by the PAPUD project and Région Grand-Est
- Salary: about 1700€ net, up to 2000€ with teaching.
- Contact: firstname.lastname@example.org