Published: March 19, 2021

Colin Raffel, Assistant Professor of Computer Science, University of North Carolina at Chapel Hill, and Staff Research Scientist, Google Brain

T5 and large language models: The good, the bad, and the ugly

T5 and other large pre-trained language models have proven to be a crucial component of the modern natural language processing pipeline. In this talk, I will discuss the good and bad characteristics of these models through the lens of five recent papers. In the first, we empirically survey the field of transfer learning for NLP and scale up our findings to attain state-of-the-art results on many popular benchmarks. Then, I will show how we can straightforwardly extend our model to process text in over 100 languages. The strong performance of these models gives rise to a natural question: What kind of knowledge and skills do they pick up during pre-training? I will provide some answers by first showing that they are surprisingly good at answering trivia questions that test basic "world knowledge", but also by demonstrating that they memorize non-trivial amounts of (possibly private) pre-training data, even when no overfitting is evident. Finally, I will wrap up with a sober take on recent progress in improving the architectures of language models.

Bio: Colin Raffel is an Assistant Professor in the Department of Computer Science at the University of North Carolina at Chapel Hill. He also spends one day a week as a Staff Research Scientist at Google Brain. He obtained his PhD in Electrical Engineering from Columbia University in 2016 under the supervision of Daniel P. W. Ellis.