DTSA 5747: Fundamentals of Natural Language Processing
- Specialization: Natural Language Processing: Deep Learning Meets Linguistics
- Instructor: Jim Martin
- Prior knowledge needed: Learners should be proficient in Python programming, including the use of packages such as numpy, scikit-learn, and pandas. They should be comfortable with data structures and basic topics in algorithm design, such as sorting and searching, dynamic programming, and algorithm analysis. They should also have basic familiarity with introductory concepts from calculus, discrete probability, and linear algebra.
Learning Outcomes
- Analyze corpora for the purpose of developing effective lexicons.
- Develop language models that can assign probabilities to texts.
- Design, implement, and evaluate the effectiveness of text classifiers.
- Design, implement, and evaluate effective sequence labeling systems.
- Design and interpret vector-based systems for capturing word meanings.
Course Content
Duration: 4h
This first module introduces the core concepts of natural language processing (NLP), focusing on how computers process and analyze human language. You will explore key linguistic structures, including words and morphology, and learn essential techniques for text normalization and tokenization.
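As a taste of what tokenization and normalization involve, here is a minimal sketch of a regex-based tokenizer with case folding; the pattern and steps are illustrative assumptions, not the course's exact recipe.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text, then split it into word and punctuation tokens."""
    text = text.lower()  # case folding as a simple normalization step
    # \w+ grabs runs of word characters; [^\w\s] keeps each
    # punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Mr. O'Neill didn't say."))
# ['mr', '.', 'o', "'", 'neill', 'didn', "'", 't', 'say', '.']
```

Even this tiny example hints at why tokenization is nontrivial: clitics like "didn't" and abbreviations like "Mr." need more careful handling than a single regex provides.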
Duration: 6h
This module explores foundational language modeling techniques, focusing on n-gram models and their role in statistical NLP. You will learn how n-gram language models are constructed, smoothed, and evaluated for effectiveness.
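To make the construct-smooth-evaluate pipeline concrete, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing, evaluated by perplexity. The toy corpus and function names are illustrative assumptions, not course code.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for toks in sentences:
        toks = ["<s>"] + toks + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def laplace_prob(w_prev, w, unigrams, bigrams):
    """Add-one smoothed P(w | w_prev): never zero, even for unseen bigrams."""
    V = len(unigrams)  # vocabulary size, including the padding symbols
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

def perplexity(sentence, unigrams, bigrams):
    """Exponentiated average negative log probability per bigram event."""
    toks = ["<s>"] + sentence + ["</s>"]
    log_prob = sum(
        math.log(laplace_prob(p, w, unigrams, bigrams))
        for p, w in zip(toks, toks[1:])
    )
    return math.exp(-log_prob / (len(toks) - 1))

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(corpus)
print(perplexity(["the", "cat", "sat"], uni, bi))  # low: seen in training
print(perplexity(["sat", "the", "cat"], uni, bi))  # higher: unseen bigrams
```

Lower perplexity on held-out text means the model assigns it higher probability, which is the standard intrinsic evaluation for n-gram models.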
Duration: 7h
This module introduces text classification and explores logistic regression as a powerful classification technique. You will learn how logistic regression models work, including key mathematical concepts such as the logit function, gradients, and stochastic gradient descent. The module also covers evaluation metrics for assessing classifier performance.
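The sketch below wires those moving parts together: the sigmoid (inverse logit), the per-example cross-entropy gradient, and a stochastic gradient descent loop, applied to a toy binary classifier over word-count features. All names and data are illustrative assumptions, not course code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=100, seed=0):
    """Binary logistic regression trained with stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            # Gradient of the cross-entropy loss for one example is
            # (sigmoid(w.x + b) - y) * x, so the update is a small step
            # against that gradient.
            err = sigmoid(X[i] @ w + b) - y[i]
            w -= lr * err * X[i]
            b -= lr * err
    return w, b

# Toy "document" features: [count of 'good', count of 'bad']
X = np.array([[2, 0], [0, 2], [3, 1], [1, 3]], dtype=float)
y = np.array([1, 0, 1, 0])  # 1 = positive review, 0 = negative
w, b = sgd_logistic(X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(int)
print(preds)  # [1 0 1 0]
```

With predictions in hand, the evaluation metrics the module covers (accuracy, precision, recall) are just counts of agreements and disagreements between preds and y.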
Duration: 7h
This final module explores how words can be represented as vectors in a high-dimensional space, allowing computational models to capture semantic relationships between words. You will learn about both sparse and dense vector representations, including TF-IDF, Pointwise Mutual Information (PMI), Latent Semantic Analysis (LSA), and Word2Vec. The module also covers techniques for evaluating and applying word embeddings.
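As one concrete example of a sparse representation, the sketch below builds positive PMI (PPMI) vectors from a toy word-by-context co-occurrence matrix and compares words with cosine similarity. The counts and word list are made up for illustration and are not from the course materials.

```python
import numpy as np

def ppmi(counts):
    """Positive PMI from a word-by-context co-occurrence count matrix."""
    total = counts.sum()
    p_wc = counts / total                      # joint P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)      # marginal P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)      # marginal P(c)
    with np.errstate(divide="ignore"):
        pmi = np.log2(p_wc / (p_w * p_c))
    # Clip negative values (and -inf from zero counts) to 0.
    return np.maximum(pmi, 0)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy co-occurrence counts: rows = words, columns = context words
words = ["cat", "dog", "car"]
counts = np.array([[8.0, 1.0, 0.0],
                   [7.0, 2.0, 1.0],
                   [0.0, 1.0, 9.0]])
M = ppmi(counts)
print(cosine(M[0], M[1]))  # cat vs. dog: high (shared contexts)
print(cosine(M[0], M[2]))  # cat vs. car: low (disjoint contexts)
```

Dense methods such as LSA and Word2Vec pursue the same goal with low-dimensional vectors: words that occur in similar contexts end up close together under cosine similarity.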
Duration: 1h
TBD
Note: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.