Large Data Processing in R Workshop (University-Wide)
R is a statistical programming language commonly used in many academic disciplines, including the hard and social sciences. The open-source community has developed numerous packages for the language, enabling users to easily implement statistical methods that would require significant development effort in other languages. However, R has some performance limitations, especially when working with data that barely fits in memory. In this workshop, we will explore techniques (such as streaming and sharding data) and tools (the data.table package) for working with data that approaches or exceeds a computer's memory limits.
- Audience: University-wide
- Pre-Requisites: Basic familiarity with R; familiarity with the tidyverse packages is preferred. Participants should be comfortable working on the command line.
- Cost: None
- Late Drop/Cancel Fee: None
Registration and details are available on the Harvard Training Portal.