WORKING WITH TEXT
Natural Language Processing (NLP)
Training description
This custom training covers a broad range of topics in Natural Language Processing (NLP). Because the subject is complex, the exact scope is tailored to each client's needs. Topics range from an introduction to working with text in Python to applying state-of-the-art deep learning methods to various NLP tasks.
Duration: 3-7 days (depending on the exact scope)
Training agenda
Part one: Introduction
- Loading text data (pandas, Python file API)
- Basic string operations
- Regular expressions
- Crawling basics (Selenium)
Part two: Preprocessing
- Data cleanup (BeautifulSoup)
- Normalization
- Stemming
- Lemmatization
- Stop word removal
- Segmentation
- Tokenization
  - Using basic string operations
  - With NLTK & spaCy
  - SentencePiece
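To give a flavour of the "basic string operations" approach to tokenization, here is a minimal sketch using only the Python standard library. The function names `tokenize_simple` and `tokenize_regex` and the sample sentence are illustrative assumptions, not the actual course material; library tokenizers such as those in NLTK or spaCy handle many more edge cases.

```python
import re

# Hypothetical helper names, chosen for illustration only.

def tokenize_simple(text):
    """Lowercase, split on whitespace, and strip punctuation from token edges."""
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(".,!?;:\"'()[]")
        if token:
            tokens.append(token)
    return tokens


def tokenize_regex(text):
    """Lowercase and keep runs of word characters (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


sample = "Hello, world! NLP isn't hard."
simple_tokens = tokenize_simple(sample)
regex_tokens = tokenize_regex(sample)
print(simple_tokens)
print(regex_tokens)
```

The two functions disagree on "isn't": the whitespace version keeps it as one token, while the regular expression splits it at the apostrophe. Differences like this are one reason to compare hand-written tokenizers against library ones.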
Part three: Vectorization
- Bag of words
  - Simple custom implementation
  - With scikit-learn
- TF-IDF
  - Custom implementation
  - With scikit-learn
- Dense word representations
  - word2vec
  - doc2vec
  - fastText
- Introduction to contextual word representations
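As an illustration of what a "custom implementation" of bag of words and TF-IDF can look like, the sketch below uses only the standard library. The function names, the toy documents, and the particular weighting (term frequency divided by document length, multiplied by the unsmoothed log inverse document frequency) are assumptions made for this example; scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2 normalization by default, so its numbers will differ.

```python
import math
from collections import Counter

# Hypothetical helper names; documents are pre-tokenized lists of strings.

def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tf_idf(vectors):
    """Weight counts by relative term frequency times log(N / document frequency)."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: in how many documents each term occurs at least once.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    weighted = []
    for vec in vectors:
        total = sum(vec)
        weighted.append(
            [(c / total) * idf[j] if total else 0.0 for j, c in enumerate(vec)]
        )
    return weighted


docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
print(vocab)
print(counts)
print(weights)
```

Terms that occur in every document ("the" and "sat" here) receive a weight of zero under this unsmoothed variant, which shows why library implementations often add smoothing.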
Part four: Text-based models
- Similarity-based models
  - Anomaly detection via clustering
  - Category/tag assignment via k-NN
- Introduction to deep learning on classification tasks
  - MLP + TF-IDF
  - LSTM/GRU + vector sequences
  - CNN + vector sequences
- Sentiment analysis
- Part-of-speech tagging
  - Using out-of-the-box models
  - Fine-tuning POS models
- BERT
Technologies used in the training:
- Primary: Python, NLTK/spaCy, PyTorch/Keras + TensorFlow
- Secondary: Selenium, BeautifulSoup
- Optional: Gensim, Flair, BERT, Polyglot, fastText
Contact us about closed training