Large Data Processing in R Workshop (University-Wide)

R is a statistical programming language used across many academic disciplines, including the hard and social sciences. The open-source community has developed numerous packages for the language, letting users easily apply statistical methods that would require significant development effort in other languages. However, R has some performance limitations, especially when working with data that barely fits in memory. In this workshop, we will explore techniques (such as streaming and sharding data) and tools (the data.table package) for working with data that approaches or exceeds a computer's memory limits.

  • Audience: University-wide
  • Pre-Requisites: Basic familiarity with R; familiarity with the tidyverse packages is preferred.
    Participants should be comfortable working on the command line.
  • Cost: None
  • Late Drop/Cancel Fee: None
  • Additional Information: We may use the FAS Research Computing cluster for optimization benchmarking. However, you should be able to follow the course materials in class using a laptop or workstation. If you’d like to get a head start on the setup for class, we recommend ensuring that you have the “data.table” and “chunked” R packages installed.
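
For the head start on setup, a minimal sketch of installing the two recommended packages from within an R session. It assumes both are fetched from CRAN and that your machine can reach a CRAN mirror; the load check at the end is an optional addition, not a workshop requirement.

```r
# Assumption: run inside an R session that can reach a CRAN mirror.
install.packages(c("data.table", "chunked"))

# Optional check: both packages should load without errors.
library(data.table)
library(chunked)
```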

Registration and details on the Harvard Training Portal: REGISTER

Apr 14 2021, 2:00 pm - 4:00 pm