Running a climate digital twin workflow

A climate digital twin workflow involves automating the pre- and post-processing of data, running the simulations, and orchestrating additional jobs in the workflow. Several services provided by CSC can be combined to run such a workflow.

Background

  • Climate simulations take a very long time to complete, making automation essential.
  • The researcher wanted data processing to run automatically so that they could evaluate the simulation output while the simulation was still running, without manually executing processing scripts. This saves time: there is no need to wait until the full simulation is done to analyse the results.
  • The researcher also wanted to make the results available online, as this makes evaluating and analyzing the simulations easier and allows the results to be shared with others.

Solution

LUMI and cPouta provide the extensive computing and storage resources required for long climate simulations.

  • Data can be shared with and analyzed by other researchers using LUMI object storage (LUMI-O).
  • Very large jobs can be run on LUMI, making it possible to run long climate simulations that produce a lot of data.
  • A cPouta virtual machine (VM) orchestrates the automatic running of the jobs, so the researcher can spend time analyzing the results instead of running jobs manually.

Requirements

  • Required skills: experience with Bash, job schedulers (e.g. Slurm), a programming language for data analysis (e.g. Python, R, …), and compiling large Fortran/C/C++ codes on HPC machines.
  • Orchestrating a workflow on an HPC system requires:
    • A workflow for automatic job handling: each job in the workflow handles one task (e.g. running the simulation, producing plots, uploading plots)
    • A connection between the VM where the workflow manager runs and the HPC machine where the actual jobs run
    • Moving the required input data to the HPC machine
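The job handling described above can be sketched as a simple polling loop on the VM. Everything below is illustrative: the `lumi` SSH alias, the scratch path, and the batch script names are assumptions, not CSC-provided names.

```shell
# Sketch of an orchestration loop run on the cPouta VM.
# Hypothetical names throughout: the "lumi" SSH alias, the
# scratch path, and the batch script names are assumptions.

# Extract the numeric job ID from sbatch output such as
# "Submitted batch job 123456".
parse_job_id() {
  awk '{print $NF}' <<< "$1"
}

# Submit a batch script on LUMI over SSH and print its job ID.
submit_job() {
  parse_job_id "$(ssh lumi "cd /scratch/project_XXX/run && sbatch $1")"
}

# Poll Slurm until the job has left the queue.
wait_for_job() {
  while ssh lumi "squeue -h -j $1" | grep -q .; do
    sleep 300   # check every 5 minutes
  done
}

# One workflow cycle (not executed here):
# job_id=$(submit_job simulation.sh)
# wait_for_job "$job_id"
# wait_for_job "$(submit_job postprocess.sh)"
```

A real workflow manager adds error handling and restarts on top of this pattern, but the structure — submit, wait, post-process — stays the same.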
1. Prerequisites: Create a CSC user account, get access to a LUMI project, and apply for LUMI and cPouta resources

Start by creating a CSC user account in MyCSC, unless you already have one. Once you have a MyCSC account and a LUMI project, apply for cPouta resources. Note that getting access to LUMI follows a different path for non-Finnish allocations.

  • How to create a new CSC user account
  • How to add services for a project
  • Get started with LUMI
  • Creating a Finnish LUMI project and applying for resources
  • More info about LUMI resources and access modes
2. Set up a virtual machine (VM)

Pouta is CSC’s virtual computing cloud, where you can set up cloud VMs. In the Pouta web UI, remember to select the correct project (the same as your LUMI project). You can then set up Pouta VMs that use resources under the LUMI project.

  • Pouta service web user interface
  • How to set up and configure a VM
3. Set up the software and data required on LUMI

Deploy your software, for instance your model code, on LUMI, and move the input data for your simulations there. If you get stuck, you can contact the LUMI user support team.

  • Moving data to/from LUMI
  • Deploy your software on LUMI
  • LUMI user support team contact form
4. Set up your workflow on the virtual machine

At this point, you can install all the software required to run your workflow on your cPouta virtual machine, for instance the scripts that automatically submit jobs on LUMI. Then establish a connection from the VM to LUMI (you need to set up SSH keys on the VM).

  • Setting up the SSH keys
  • CSC can support you in setting up your software
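A minimal sketch of the key setup on the VM. The key location and the `~/.ssh/config` entry are illustrative, and on LUMI the public key is typically registered through your user profile rather than copied directly to the host:

```shell
# Generate a dedicated, passphrase-less key pair for the
# VM-to-LUMI connection. A real setup would use ~/.ssh; a
# temporary directory is used here so the sketch is self-contained.
KEYDIR=$(mktemp -d)
ssh-keygen -t ed25519 -f "$KEYDIR/lumi_key" -N "" -q

# A matching ~/.ssh/config entry (user name is a placeholder):
# Host lumi
#   HostName lumi.csc.fi
#   User <your-username>
#   IdentityFile ~/.ssh/lumi_key
```

With the config entry in place, the workflow scripts can simply use `ssh lumi` without repeating the host name and key path.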
5. Test and run your workflow, including the simulations and automatic data processing

To run your workflow on LUMI, you need to create a batch script. Configure the batch script to request an appropriate amount of compute resources. Before launching the full workflow, run a small test experiment to verify that your software works.

  • Running jobs on LUMI
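A minimal LUMI batch script might look like the sketch below. The account, partition, resource numbers, and executable name are all placeholders; check the LUMI documentation for the correct partitions and billing settings of your project.

```shell
#!/bin/bash
#SBATCH --account=project_XXX    # your LUMI project (placeholder)
#SBATCH --partition=standard     # partition name: verify on LUMI
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --time=24:00:00

# Launch the model (hypothetical executable and config file)
srun ./climate_model --config experiment.nml
```

For the initial small test, shrink the node count and time limit and use a reduced experiment, then scale the same script up for production runs.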
6. Create a LUMI-O bucket to store data and move produced data there

LUMI object storage (LUMI-O) is an object storage service separate from the rest of LUMI, with a fast connection for data transfer. All LUMI projects have LUMI-O available by default. Instead of directories and files, data in object storage is organized in a flat structure of “buckets” that contain “objects”. Get access to LUMI-O, create a bucket to store your data, and move your data there. This step can also be included in the workflow to automatically push data to LUMI-O.

  • About LUMI object storage
  • Accessing LUMI-O
  • Managing data on LUMI-O
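The upload can be scripted, for example with rclone. The remote name `lumi-o` and the bucket name below are assumptions (on LUMI, rclone remotes for LUMI-O are typically created with a configuration tool); adapt them to your own setup:

```shell
# Push a directory of results to a LUMI-O bucket with rclone.
# The remote name "lumi-o" and the bucket name are assumptions;
# use the remote that your own LUMI-O configuration defines.
BUCKET="climate-results"   # hypothetical bucket name

push_to_lumio() {
  # $1: local directory to upload
  rclone copy "$1" "lumi-o:${BUCKET}/$(basename "$1")"
}

# As the last job of the automated workflow (not executed here):
# push_to_lumio ./plots
```

Running this as the final step of each workflow cycle means new plots appear in the bucket as soon as post-processing finishes.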
7. Access and share the post-processed data via LUMI-O

LUMI-O is accessible through a web interface without connecting to LUMI itself, which makes it easy to share your data.

  • Sharing data using LUMI-O