Storage in the CSC computing environment

The computing environment of CSC enables users to manage and process large data sets. The main storage components of the CSC computing environment are:

  • File system of the CSC computing servers
  • HPC archive storage system
  • IDA storage service
  • Pouta Object Storage system
  • Relational database service (kaivos.csc.fi)

The computing environment of CSC is mainly intended for analyzing and processing scientific data. However, the CSC servers are compatible with the Pouta Object Storage and IDA storage services, which can be used for data sharing and preservation.

File System

The file system of the CSC computing servers enables CSC customers to actively work with large data sets. The default user-specific directories and their sizes are listed in the table below. For more information, please see the CSC Computing environment user guide.

The standard user directories and storage areas at CSC:

Directory or storage area | Intended use | Default quota/user | Storage time | Backup
$HOME | Initialization scripts, source codes, small data files. Not for running programs or research data. | 50 GB | Permanent | Yes
$USERAPPL | Users' own application software. | 50 GB | Permanent | Yes
$WRKDIR | Temporary data storage. | 5 TB | 90 days | No
$TMPDIR | Temporary users' files. | | 2 days | No
project | Common storage for project members. A project can consist of one or more user accounts. | On request | Permanent | No
HPC Archive | Long term storage. | 2.5 TB | Permanent | Two copies maintained
Pouta Object Storage | Data storage and sharing data between services. | 1 TB | Permanent | No
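
Because $WRKDIR is not backed up and has a 90 day storage time, it can help to list the files that are approaching that limit before they are removed. The following is a minimal sketch, not a CSC tool. It assumes that file age is measured from the last modification time, which the table does not specify, and the function name files_older_than and the 80 day threshold are illustrative choices.

```python
import os
import time
from pathlib import Path


def files_older_than(root, max_age_days, now=None):
    """Return the files under root last modified more than max_age_days ago.

    Assumption: age is counted from the modification time (st_mtime).
    The actual cleaning policy may use a different timestamp.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds in a day
    old_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime < cutoff:
                    old_files.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(old_files)


if __name__ == "__main__":
    wrkdir = os.environ.get("WRKDIR")
    if wrkdir:
        # 80 days leaves a margin before the 90 day storage time is reached.
        for path in files_older_than(wrkdir, max_age_days=80):
            print(path)
```

Files reported this way are candidates for moving to a permanent storage area such as HPC archive.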

HPC archive

HPC archive is an iRODS-based service for long-term storage and backup of data that is used in the CSC computing environment. It is accessible only from the CSC computing environment, so it is not suited to sharing data or to storing data that is produced outside CSC.

IDA Storage service

The IDA storage service is intended for stable research data, both raw data and processed datasets. IDA is visible to the internet. Note that, unlike HPC archive, the IDA storage service is not automatically part of the CSC user account. IDA storage space should be applied for through your local university.

Pouta Object Storage

The Pouta Object Storage service provides a cross-platform service for storing and sharing data. Data objects can be uploaded to this service from the CSC environment as well as from environments outside CSC, using several different protocols and interfaces.

Relational Database Service

The MySQL database server kaivos.csc.fi is intended for tasks and applications that use a relational database. The server can be accessed only from within the CSC computing environment, so it is mainly used to support data analysis done on the CSC servers.

File Transfer

The disk environment of CSC allows users to work with very large datasets. However, transferring large datasets to CSC can take a long time. Within an organization network connected to Funet, transferring a 1 GB file to or from the CSC computing environment should usually take no more than a minute or two.
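
To get a feeling for what "a minute or two per gigabyte" means for larger datasets, the figure can be scaled up. The sketch below assumes that transfer time grows linearly with size and that 1 GB is 1000 MB. Both are simplifications, and the function names are illustrative. Under these assumptions the quoted figure corresponds to roughly 8 to 17 MB/s, and a 1 TB dataset would take roughly 17 to 33 hours.

```python
def estimate_transfer_minutes(size_gb, minutes_per_gb):
    """Scale a per-gigabyte transfer time linearly to size_gb gigabytes."""
    if size_gb < 0 or minutes_per_gb <= 0:
        raise ValueError("size_gb must be >= 0 and minutes_per_gb must be > 0")
    return size_gb * minutes_per_gb


def implied_rate_mb_per_s(minutes_per_gb):
    """Throughput in MB/s implied by moving 1 GB (taken as 1000 MB) in the given time."""
    if minutes_per_gb <= 0:
        raise ValueError("minutes_per_gb must be > 0")
    return 1000 / (minutes_per_gb * 60)


if __name__ == "__main__":
    # 1 and 2 minutes per GB are the bounds quoted in the text above.
    for size_gb in (1, 100, 1000):
        fast_hours = estimate_transfer_minutes(size_gb, 1) / 60
        slow_hours = estimate_transfer_minutes(size_gb, 2) / 60
        print(f"{size_gb} GB: {fast_hours:.1f} to {slow_hours:.1f} hours")
```

Real transfer rates depend on the network, the transfer tool, and the number of files, so these numbers are only an order-of-magnitude guide.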

There are several ways to transfer files to and from the CSC environment. For example:

These and other methods (such as rsync, wget and disk mounts) are described in more detail in the CSC Computing Environment User Guide, Moving data between CSC and local environment.

When transferring or storing many small files, it is usually much easier to first pack the file collection into a single compressed archive file (using commands or programs such as tar, zip or 7z), and then decompress and extract the files from the archive at the destination.
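
The packing and unpacking steps can also be scripted. The following is a minimal sketch using the tarfile module from the Python standard library, as an alternative to running tar on the command line. The function names and the example paths are illustrative, and the archive file should be written outside the directory being packed.

```python
import tarfile
from pathlib import Path


def pack_directory(source_dir, archive_path):
    """Write source_dir and everything under it into a gzip-compressed tar archive."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores paths relative to the directory name, not the full path.
        archive.add(source, arcname=source.name)
    return Path(archive_path)


def unpack_archive(archive_path, destination_dir):
    """Extract every member of the archive under destination_dir.

    Only unpack archives from a trusted source, such as ones you created yourself.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(destination_dir)


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call order.
    source = Path("results")
    if source.is_dir():
        pack_directory(source, "results.tar.gz")
        unpack_archive("results.tar.gz", "restored")
```

Only the single archive file then needs to be transferred, and it is unpacked once it has arrived at the destination.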