High Performance R
Are you using R but wondering if your R code makes the best use of the computing resources available? Would you like to learn to speed up R analyses by parallel computing, identify bottlenecks in your R scripts, or get tips on handling large datasets in R? Join our course that focuses on using R efficiently and making the most of R in a high performance computing environment.
Topics of the course include:
- Making use of the properties of R as a programming language to write efficient R code
- Exploring performance issues of R code by benchmarking and profiling processes and memory usage
- Parallel and distributed computing with R on both local and supercomputing resources
The topics will be covered using short lectures and/or demonstrations followed by hands-on exercises using RStudio and batch jobs on the supercomputer Puhti.
Advanced participants are welcome to bring their own R code (a short section of a script that you are already familiar with) and a small accompanying data set (maximum 5 GB) to use in some of the exercises. We expect that you are able to read the data into R and run the script independently.
Target audience:
This course is meant for anyone who is familiar with the basics of R and wants to learn how to make their analyses in R more efficient and how to use R in a high performance computing environment. For example:
- Current users of RStudio in CSC’s Puhti web interface: move beyond RStudio and make the most of the computing resources of the supercomputer
- R users running R on their own computer so far: use your computer’s resources efficiently and learn to use R in a high performance computing environment
- Experienced users of another programming language and/or high performance computing: get familiar with the functional nature of the R language and its resource management
Where & when:
This is a two-day course, running from 9:00 to 16:00 on both days. The course will be offered on-site at the CSC Training Facilities (Keilaranta 14, Espoo, Finland) and online. For the best experience, and if you anticipate needing a lot of support with the course exercises (see the pre-requisites below), we recommend on-site participation. For participants joining the course on site in Espoo, lunch and a snack are included in the price.
Learning outcomes:
After attending this course, participants will be able to:
- Explore potential R code performance issues with benchmarking and profiling
- Understand the key properties of the R language and how they relate to the computer’s resource management
- Run R scripts with the batch job system on the supercomputer Puhti
- Get started with parallel and distributed computing with R
Pre-requisites:
Required:
- Basics of R
‘Basics of R’ can mean many things, but if the following things are familiar to you, you should be good to go: basic R syntax, running and writing R code, data structures and types, using packages, using variables, functions and pipes, and basic operations for data wrangling. If you know how to write a for loop or a function in R, you will definitely be fine!
If you are a complete beginner with R and programming in general, we recommend the course Data Analysis with R instead.
Useful to make the most of the course content but not required:
- Basics of Linux, for example:
- Some experience in using a supercomputer, for example:
- using RStudio in the Puhti web interface
- the course CSC Computing Environment, Part 1: Basics, or the corresponding self-learning materials
Lecturers:
Heli Juottonen and Maciej Janicki (CSC)
Registration deadline: 9.11.2025