Use Case | 27.1.2025

Data lifecycle at CSC – from collection to preservation

This example research project collects sensor data into a data storage service. The researchers will collect about 100 TiB of data, which will be processed in batches to create a 10 TiB summary dataset. The summary dataset is to be published online as dynamic data following the FAIR principles. After publishing the summary dataset, the research group wants to preserve it.

This use case can be accomplished by using several CSC services together:

Create a CSC user account and project

A CSC user account and project are required for using the following CSC services. The project manager invites other members of the research group to the project after they have created their own personal CSC accounts.

How to create a new CSC user account
How to create a CSC project
How to add members to project
Link to My CSC portal

cPouta: Opening and sharing data in the cloud

A virtual machine (VM) running in the cPouta cloud is used as the platform to which the sensors send their data. Another VM can be set up to give external users access to the collected data and to some lightweight analysis tools.

Allas: Collecting sensor data for temporary storage

The data sent to cPouta is pre-processed and then stored in Allas. Allas is recommended for data storage during the active phase of a research project. Allas has a high storage capacity and is accessible from anywhere on the internet. The data can have different levels of access control.
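Because Allas is an object storage service, it helps to decide on a consistent object naming scheme before the sensors start producing data. The sketch below shows one possible way to build object names in Python. The `raw/` prefix, the sensor ID format and the function name `object_key` are illustrative assumptions, not a convention required by Allas.

```python
from datetime import datetime, timezone


def object_key(sensor_id: str, timestamp: datetime, suffix: str = "csv") -> str:
    """Build a hypothetical object name for one pre-processed sensor file.

    The layout (prefix/sensor/year/month/day/time) is an assumption made
    for this example; any scheme works as long as it is used consistently.
    """
    # Convert to UTC so that names sort the same way regardless of where
    # the file was produced. Pass a timezone-aware datetime; a naive one
    # would be interpreted as local time.
    ts = timestamp.astimezone(timezone.utc)
    return f"raw/{sensor_id}/{ts:%Y/%m/%d/%H%M%S}.{suffix}"


example = object_key(
    "sensor-07", datetime(2025, 1, 27, 12, 0, 0, tzinfo=timezone.utc)
)
print(example)
```

Grouping names by sensor and date in this way makes it straightforward to select the objects that belong to one processing batch later on.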

Puhti: Data processing in CSC’s supercomputing environment

The data is copied from Allas to the Puhti supercomputer for processing that requires considerable computing power (e.g. high memory, parallel CPU resources, GPUs). The computations are run as batch jobs, and afterwards the results are uploaded back to Allas.
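Since the full dataset (about 100 TiB) is processed in batches, it can be useful to plan in advance which objects go into which batch. The following is a minimal sketch of such planning in Python, assuming you already have a list of object names and sizes. The function name `plan_batches` and the size limit are hypothetical; choose the limit according to the disk space actually available to your project.

```python
from typing import Iterable, List, Tuple

TIB = 1024 ** 4  # bytes in one tebibyte


def plan_batches(
    objects: Iterable[Tuple[str, int]], max_batch_bytes: int
) -> List[List[str]]:
    """Group (name, size_in_bytes) pairs into consecutive batches.

    Each batch stays at or below max_batch_bytes, so it can be copied
    to the working directory, processed, and removed before the next one.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in objects:
        if size > max_batch_bytes:
            raise ValueError(f"{name} alone exceeds the batch size limit")
        # Start a new batch when adding this object would exceed the limit.
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Small illustrative input; the sizes are in bytes and the limit is 1000 bytes.
demo = [("a", 400), ("b", 400), ("c", 300), ("d", 700)]
print(plan_batches(demo, max_batch_bytes=1000))
```

As a rough estimate, with an assumed limit of 0.5 TiB per batch, 100 TiB of input would need at least 200 batches. Each batch then corresponds to one cycle of copying from Allas, running a batch job, and uploading the results.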

Read more about Puhti
Getting started with supercomputing at CSC
CSC supercomputing user guide

Fairdata services: Publication and preservation

The processed dataset can be transferred to Fairdata IDA directly from Puhti. Metadata and a persistent identifier (DOI) are created for the dataset using Qvain. Etsin provides a landing page and download links for the data. Data published in Fairdata services can later be transferred to the digital preservation service for research data (requires a contract).

Read more about Fairdata services
How to apply for IDA storage space