Published: April 30, 2021

Matthew Peters, Senior Research Scientist, Allen Institute for Artificial Intelligence

A guided tour of contextual word representations for language understanding

The last 3-4 years have seen a tremendous increase in the abilities of natural language understanding systems to perform tasks such as text generation, question answering, and information extraction.  These gains have largely been driven by improvements in methods for transfer learning, where a large neural network is pretrained on a huge unannotated text corpus and then further optimized for a target end task.  The objective function used during pretraining encourages the network to encode complex characteristics of word usage and meaning into its word representations.  These representations provide a feature vector capturing the meaning of each word in its context, in a way that is usable by many end applications.
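As a concrete illustration of what "the meaning of each word in its context" looks like in practice, here is a minimal sketch that pulls contextual vectors from a pretrained BERT checkpoint.  The choice of the Hugging Face transformers library and the bert-base-uncased model is mine, for illustration only; the talk is not tied to any particular toolkit.

```python
# Minimal sketch: contextual word vectors from a pretrained model.
# Library and checkpoint are illustrative choices, not the talk's own code.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    """Return the contextual vector of the first occurrence of `word`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # [seq_len, 768]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

# The same surface word gets different vectors in different contexts.
v_river = word_vector("We sat on the bank of the river.", "bank")
v_money = word_vector("She deposited cash at the bank.", "bank")
print(torch.cosine_similarity(v_river, v_money, dim=0).item())  # < 1.0
```

The two "bank" vectors differ because each one reflects its surrounding sentence; this context sensitivity is precisely what makes these features useful across so many end applications.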

In this talk, I'll provide a guided tour of these methods.  I'll start with the key ideas behind the Sesame Street models: ELMo (Peters et al 2018), BERT (Devlin et al 2019), and others.  Then we'll dive into the inner workings of these models by probing and analyzing their internal states, showing that they learn a surprising amount of linguistic and world knowledge (Peters et al 2018, Liu et al 2019).  I'll also describe an approach that allows sparse access to human-curated knowledge (Peters et al 2019), as well as algorithmic improvements that scale these models to long text documents (Beltagy et al 2020).  Finally, I'll conclude with a framework and benchmark dataset for moving beyond current supervised learning approaches, allowing these models to generalize to unseen end tasks without any labeled data (Weller et al 2020).
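To make the probing idea concrete: a diagnostic probe is typically a small classifier trained on frozen representations; if the probe can predict a linguistic property (say, part-of-speech tags), the representations must already encode it.  Below is a minimal sketch of that setup, with random placeholder data standing in for real frozen vectors and gold labels; it is my simplified rendering of the general technique, not code from the papers above.

```python
# Sketch of a diagnostic (probing) classifier: a linear model trained on
# frozen contextual vectors to test what information they encode.
import torch
import torch.nn as nn

hidden_size, num_tags, num_words = 768, 17, 1000
# Placeholder data: in a real probe these would be frozen contextual
# vectors from the pretrained model, paired with gold linguistic labels.
vectors = torch.randn(num_words, hidden_size)
tags = torch.randint(num_tags, (num_words,))

probe = nn.Linear(hidden_size, num_tags)  # the probe is the only trained part
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):
    loss = loss_fn(probe(vectors), tags)   # [num_words, num_tags] vs labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# High probe accuracy on held-out data suggests the frozen representations
# already encode the property being tested.
```

Keeping the probe deliberately small is the key design choice: if a mere linear layer can read a property off the frozen vectors, the credit belongs to the representations rather than to the classifier.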

BIO: Matthew Peters is a Research Scientist at the Allen Institute for AI (AI2) exploring applications of deep neural networks to fundamental questions in natural language processing.  His work has led to significant advances in a system’s ability to perform language understanding tasks such as question answering or natural language inference.  Prior to joining AI2, he was the Director of Data Science at a Seattle startup, a quantitative analyst in the finance industry, and a post-doc investigating cloud-climate feedback. He has a PhD in Applied Math from the University of Washington.