Processing texts

In this class, we will learn how to enrich text with linguistic knowledge (POS tags, syntactic structure…) using NLTK (Natural Language Toolkit), spaCy and Stanford CoreNLP. We will also look at some standard pre-processing operations (lowercasing, punctuation removal) that are frequently used to normalize textual data.

  • Pre-processing
  • Tokenization and Sentence splitting
  • Part-of-speech (POS) tagging
  • Morphological analysis, Stemming and Lemmatization
  • Stop word recognition
  • Named Entity Recognition (NER)
  • Constituency/dependency parsing
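
As a first taste of the pre-processing step, here is a minimal sketch of lowercasing and punctuation removal using only the Python standard library. The function name `normalize` is our own choice for illustration, and it only strips ASCII punctuation (the characters in `string.punctuation`); real pipelines often need more careful handling, for example of apostrophes and hyphens.

```python
import string


def normalize(text: str) -> str:
    """Lowercase the text and remove ASCII punctuation (illustrative helper)."""
    lowered = text.lower()
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", string.punctuation)
    without_punct = lowered.translate(table)
    # Collapse runs of whitespace left behind by the removed characters.
    return " ".join(without_punct.split())


if __name__ == "__main__":
    print(normalize("Hello, World! It's NLP-time."))
    # prints: hello world its nlptime
```

Note that "It's" becomes "its" and "NLP-time" becomes "nlptime": blindly deleting punctuation can merge or alter words, which is one reason to think about tokenization before applying this kind of normalization.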