Published: Dec. 9, 2019

On December 2nd, Taylor Lab Research Affiliate Ethan Linck taught a tutorial on analyzing population genomic sequence data in Python and R at a meeting of the CU Boulder BioFrontiers Institute's Quantitative Exploration and Discussion (QED) Supergroup. The QED Supergroup series, which was organized around a theme of genomics and bioinformatics for fall 2019, aims to increase quantitative skills across departments at CU through workshops and data talks. In his introduction to the meeting, Ethan discussed the benefits of using a programming language to analyze sequence data, gave an overview of the ecosystem of tools available, and compared their relative strengths and weaknesses. Participants then learned the basics of the Python library scikit-allel, including its fundamental data structures, how to load, explore, and filter a multisample .vcf file, and how to calculate a set of typical summary statistics. They also learned how to plot PCA results using the R package ggplot2. All the materials from the tutorial are available here.