Data Storage Services

CSC offers several options for storing data, both while you work on it and at later stages of the data lifecycle.

The storage solutions and services listed on this page are mostly offered free of charge for academic research, education and training purposes in Finnish higher education institutions and in state research institutes (subsidized by the Ministry of Education and Culture, Finland).

Common options: Allas and Fairdata IDA

Allas and Fairdata IDA are both suitable for storing even terabytes of data, when required. They are both used with CSC accounts and the storage space is shared by a project group, but they also have differences. Allas is suitable especially for storing data that needs to be available for analysis, and IDA meets your needs when you wish to publish your stored data for others to see and use, and get a persistent identifier for the published dataset. You may want to use both during the project's data lifecycle.

In both services, data cannot be modified in place. To change it, download the data to another server for processing and then replace the previous version with the new one.

Allas

Allas is CSC's general purpose research data storage. It has a fast connection to CSC servers and can also be used from anywhere on the Internet. Allas can be used both for static research data that needs to be available for analysis and for collecting and hosting accumulating or changing data. In Allas, imported data is stored as static objects (object storage). Allas can be used with S3 or Swift API compatible tools (e.g. WWW, command line, programmatic and graphical interfaces), and the uploaded data can be made accessible from the Internet. Allas is based on Ceph, which makes it technically reliable. At this stage Allas provides only basic object storage functionality; more advanced tools, such as automatic data backups, are not included in the service.
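Because stored objects cannot be edited in place, changing one means downloading it, modifying a local copy, and uploading the result under the same name. The following is a minimal sketch of that cycle, not an official CSC tool. The function name `replace_object` and its `transform` parameter are hypothetical, and the `client` argument is assumed to be any object exposing boto3-style `download_file` and `upload_file` methods. The bucket name, object key and endpoint in the usage comment are placeholders.

```python
import os
import tempfile


def replace_object(client, bucket, key, transform):
    """Download an object, transform a local copy, and upload it under the same key.

    `client` is assumed to behave like a boto3 S3 client
    (download_file(bucket, key, filename) / upload_file(filename, bucket, key)).
    `transform` is a hypothetical callable mapping bytes to bytes.
    """
    with tempfile.TemporaryDirectory() as workdir:
        local_path = os.path.join(workdir, os.path.basename(key) or "object")
        # Objects cannot be edited in place, so fetch a local working copy first.
        client.download_file(bucket, key, local_path)
        with open(local_path, "rb") as handle:
            original = handle.read()
        with open(local_path, "wb") as handle:
            handle.write(transform(original))
        # Uploading under the same key replaces the previous version.
        client.upload_file(local_path, bucket, key)


# Hypothetical usage (requires boto3 and valid credentials; all names are placeholders):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="https://<your-object-storage-endpoint>")
#   replace_object(s3, "my-bucket", "results/notes.txt", lambda data: data.upper())
```

The sketch reads the whole object into memory, which is only reasonable for small files. For large objects, the processing step would more likely operate on the downloaded file directly.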

Read more about Allas in CSC Docs »

Fairdata IDA

Fairdata IDA (ida.fairdata.fi) enables saving, organizing and sharing data within the project group and storing the data in an immutable state. IDA can be used with a user-friendly browser user interface or with command line tools. IDA is meant for storing stable research data, which can be constructed and described as research datasets. The published research dataset gets a persistent identifier (DOI or URN) and a landing page in Etsin. This makes the dataset findable for others, and enables re-use of the data and creating a scientific reference. The actual data (files) in the published dataset can be set as openly downloadable by others, but it's also possible to publish only the metadata of the dataset.

Read more about Fairdata IDA »

Working storage for active data

There are multiple options for storing your data while you work on it.

  • The storage environment of CSC supercomputers (Puhti and Mahti) allows researchers to do high performance computing for large datasets. The storage areas of Puhti and Mahti are intended only for data that is in active use. Files that have not been modified for longer than 90 days will be automatically removed.
  • Storage in cloud environments. The cloud environments of CSC (cPouta and ePouta) provide storage space that can be linked to virtual machines running in these cloud environments. This allows CSC customers to build their own data servers and analysis environments, which can also be made accessible from the Internet.
  • Databases for research are available to the computing projects of CSC users. The databases are accessed through the CSC computing environment. Databases are also provisioned separately; contact Service Desk for more information.
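Since files on the supercomputer storage areas are removed once they have gone unmodified for more than 90 days, it can be useful to list candidates before that happens. The sketch below is one possible way to do this with the Python standard library. The function name `stale_files` and its parameters are illustrative, not part of any CSC tool, and the sketch assumes that the file system's modification time (`st_mtime`) is the relevant criterion, as the description above suggests.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60


def stale_files(root, max_age_days=90, now=None):
    """Return files under `root` whose modification time is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for path in Path(root).rglob("*"):
        # Only regular files are considered; directories are skipped.
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)


# Hypothetical usage; replace the placeholder with your own project directory:
#   for path in stale_files("/path/to/your/project/directory"):
#       print(path)
```

Files reported this way could then be copied to a service meant for longer-term storage, such as Allas.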

Other services for storing stable data

  • EUDAT B2SHARE is a storage and sharing service for openly licensed research data in European collaboration. Read more about services for sharing data.
  • Fairdata PAS is a digital preservation service for research data. Digital preservation refers to the reliable preservation of digital information for several decades or even centuries. The Digital Preservation Service for Research Data is meant for the digital preservation of research datasets (data, publications, code, learning materials etc.).

Which storage solution is right for you?

Researchers get access to multiple storage options through CSC; see below for a comparison of available services. We also provision storage capacity (Ceph/NFS) on request. Services suitable for sensitive data are being developed. We recommend creating a data management plan when considering data storage options. Our Service Desk provides personal guidance and expert support in choosing the right storage solution for your data: servicedesk@csc.fi

| Service | Intended purpose | Currently available quotas* | Interfaces | Single user or project based access | Additional features | Service offered by |
|---|---|---|---|---|---|---|
| Project directories in Puhti supercomputer | Disk areas for processing data | 50 GB, 1 TB short term (more on request) | File system | project group | | CSC |
| Allas Object Storage | Platform independent data storage and sharing | 10 TB (more on request) | S3 and Swift clients. OpenStack Horizon web interface. | project group | Enables sharing data from the service | CSC |
| Storage in CSC cloud environments | Temporary or persistent storage resources via virtual machines | 1 TB (more on request) | Block storage via virtual machine, big data frameworks (Hadoop, Spark) | project group | | CSC |
| Fairdata IDA Storage service | Storing and sharing stable data | Granted based on application by organization (from 1 GB to 100 TB) | Browser, CLI | project group | Enables sharing data from the service, persistent identifiers | MINEDU (service produced by CSC) |
| EUDAT B2SHARE | Storing and publishing small-scale data | Limited space by file (10 GB) and record (20 GB) | Browser, REST API | project group | Enables metadata and sharing data from the service | EUDAT |
| CSC databases for research | Data for applications utilizing relational databases | Up to tens of GBs | MariaDB database | project group | | CSC |

* For more detailed information, see the table about default quotas.
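As a small worked example of reading the comparison, the EUDAT B2SHARE limits of 10 GB per file and 20 GB per record can be checked before choosing that service. This is only an illustrative sketch: the function name `fits_b2share` is hypothetical, and it assumes decimal gigabytes (10^9 bytes), which the comparison does not specify.

```python
GB = 10**9  # assumption: decimal gigabytes; the comparison does not say which unit is meant

B2SHARE_MAX_FILE_BYTES = 10 * GB    # per-file limit from the comparison
B2SHARE_MAX_RECORD_BYTES = 20 * GB  # per-record limit from the comparison


def fits_b2share(file_sizes):
    """Check a list of file sizes (in bytes) against the per-file and per-record limits."""
    too_large = [size for size in file_sizes if size > B2SHARE_MAX_FILE_BYTES]
    total = sum(file_sizes)
    return not too_large and total <= B2SHARE_MAX_RECORD_BYTES


# Hypothetical usage with sizes in bytes:
#   fits_b2share([5 * GB, 5 * GB])   # two 5 GB files, 10 GB in total
```

A dataset that fails this check would point towards one of the larger-capacity services in the comparison instead.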