Natural Language Processing (NLP)

WORKING WITH TEXT

Training description

This custom training covers a broad range of topics related to Natural Language Processing (NLP). Because the field is complex, the exact scope is tailored to each client's needs. Topics range from an introduction to working with text in Python to applying state-of-the-art Deep Learning methods to various NLP tasks.

Duration: 3-7 days (depending on the exact scope)

Training agenda

Part one: Introduction

Loading text data (pandas, Python file API)
Basic string operations
Regular expressions
Crawling basics (Selenium)

Part two: Preprocessing

Data cleanup (BeautifulSoup)
Normalization
- Stemming
- Lemmatization
- Stop word removal
Segmentation
Tokenization
- Using basic string operations
- With NLTK & spaCy
- SentencePiece
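
As an illustration of the "basic string operations" end of this part, the following is a minimal sketch using only the Python standard library. The function names and the tiny stop word set are hypothetical and chosen for this example; the training itself may use different helpers or the NLTK/spaCy equivalents.

```python
import re

# Illustrative stop word set (an assumption for this sketch, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}


def tokenize_simple(text):
    """Lowercase and split on whitespace; punctuation stays attached to words."""
    return text.lower().split()


def tokenize_regex(text):
    """Lowercase and keep runs of letters/digits, allowing one apostrophe suffix."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop word set."""
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    sentence = "The cat's hat, in 2 parts!"
    print(tokenize_simple(sentence))
    print(tokenize_regex(sentence))
    print(remove_stop_words(tokenize_regex(sentence)))
```

The whitespace split leaves punctuation glued to tokens, which is one reason the agenda moves on to regular expressions and library tokenizers.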

Part three: Vectorization

Bag of words
- Simple custom implementation
- With scikit-learn
TF-IDF
- Custom implementation
- With scikit-learn
Dense word representations
- word2vec
- doc2vec
- fastText
Introduction to contextual word representations
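
To give a sense of what the custom implementations in this part might look like, here is a minimal sketch of bag of words and TF-IDF in plain Python. It assumes documents are already tokenized and uses the textbook weighting tf × log(N / df); the function names are hypothetical. scikit-learn's `TfidfVectorizer` applies idf smoothing and L2 normalization by default, so its numbers will differ from this sketch.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf = count / doc length
    and idf = log(N / df), with no smoothing or normalization."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df_j) for df_j in df]
    weighted = []
    for vec in counts:
        total = sum(vec) or 1  # guard against empty documents
        weighted.append([(c / total) * idf[j] for j, c in enumerate(vec)])
    return vocab, weighted


if __name__ == "__main__":
    corpus = [["cat", "sat"], ["cat", "ran"]]
    print(bag_of_words(corpus))
    print(tf_idf(corpus))
```

In this toy corpus "cat" occurs in every document, so its idf is log(1) = 0 and it contributes nothing to either vector, which is the behaviour TF-IDF is designed to produce for uninformative terms.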

Part four: Text-based models

Similarity-based models
- Anomaly detection via clustering
- Category/tag assignment via k-NN
Introduction to deep learning for classification tasks
- MLP + TF-IDF
- LSTM/GRU + vector sequences
- CNN + vector sequences
Sentiment analysis
Part-of-speech tagging
- Using out-of-the-box models
- Fine-tuning POS models
BERT
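
The similarity-based approach listed at the start of this part (tag assignment via k-NN) can be sketched as follows. This is a simplified illustration, assuming raw term counts as the document representation and cosine similarity as the distance measure; the function names and the toy data are hypothetical, and a real exercise would more likely use TF-IDF or dense vectors with scikit-learn.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_tag(query_tokens, labelled_docs, k=3):
    """Assign the majority tag among the k most similar labelled documents."""
    query = Counter(query_tokens)
    scored = sorted(
        ((cosine(query, Counter(tokens)), tag) for tokens, tag in labelled_docs),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(tag for _, tag in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy labelled corpus, invented for this example.
    labelled = [
        (["goal", "match", "team"], "sport"),
        (["team", "score", "match"], "sport"),
        (["stock", "market", "price"], "finance"),
        (["market", "bank", "price"], "finance"),
    ]
    print(knn_tag(["match", "score"], labelled, k=3))
```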

Technologies used in the training:

Primary: Python, NLTK/spaCy, PyTorch/Keras + TensorFlow
Secondary: Selenium, BeautifulSoup
Optional: Gensim, Flair, BERT, Polyglot, fastText