Storage in CSC computing environment
The computing environment of CSC enables users to manage and process large data sets. The main storage components of the CSC computing environment are:
- File system of the CSC computing servers
- HPC archive storage system
- IDA storage service
- Pouta Object Storage system
- Relational database service (kaivos.csc.fi)
The CSC computing environment is mainly intended for analyzing and processing scientific data. However, the CSC servers are compatible with the Pouta Object Storage and IDA storage services, which can be used for data sharing and preservation.
The file system of the CSC computing servers enables CSC customers to work actively with large data sets. For more information, see the CSC Computing Environment User Guide.
The table below lists the standard user directories and storage areas at CSC, with their default quotas, storage times and backup policies:
|Directory or storage area|Intended use|Default quota/user|Storage time|Backup|
|---|---|---|---|---|
|$HOME|Initialization scripts, source code, small data files. Not for running programs or research data.|50 GB|Permanent|Yes|
|$USERAPPL|Users' own application software.|50 GB|Permanent|Yes|
|$WRKDIR|Temporary data storage.|5 TB|90 days|No|
|$TMPDIR|Temporary user files.| |2 days|No|
|project|Common storage for project members. A project can consist of one or more user accounts.|On request|Permanent|No|
|HPC Archive|Long-term storage.|2.5 TB|Permanent|Two copies maintained|
|Pouta Object Storage|Data storage and sharing data between services.|1 TB|Permanent|No|
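Because $WRKDIR combines a large quota with a limited storage time and no backup, it can help to check how much data a directory holds and which files have not been touched recently. The sketch below is a hypothetical helper, not a CSC tool. It assumes that "5 TB" means decimal terabytes and that the 90-day storage time is counted from a file's last modification time; the actual cleanup policy may use a different rule.

```python
import os
import stat
import time
from pathlib import Path

# Values from the table above. Treating "5 TB" as decimal terabytes and
# measuring the 90-day storage time from the last modification are
# assumptions, not documented CSC policy.
WRKDIR_QUOTA_BYTES = 5 * 10**12
STORAGE_DAYS = 90


def scan(directory, max_age_days=STORAGE_DAYS, now=None):
    """Return (total_bytes, old_files) for regular files under directory.

    Symbolic links are not followed or counted. old_files lists the paths
    whose modification time is more than max_age_days before `now`
    (defaults to the current time), sorted by path.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 3600
    total = 0
    old_files = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = Path(root) / name
            try:
                st = path.lstat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device files, etc.
            total += st.st_size
            if st.st_mtime < cutoff:
                old_files.append(path)
    return total, sorted(old_files)


def quota_share(total_bytes, quota_bytes=WRKDIR_QUOTA_BYTES):
    """Percentage of the assumed quota that total_bytes represents."""
    return 100 * total_bytes / quota_bytes
```

For example, calling `scan(os.environ["WRKDIR"])` would report the total size of the work directory and, under the assumptions above, the files that may be close to their storage time limit.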
HPC Archive
The HPC archive is an iRODS-based storage service for long-term storage and backup copies of data used in the CSC computing environment. The HPC archive is accessible only from the CSC computing environment, so it is not suitable for sharing data or for storing data produced outside CSC.
IDA Storage service
The IDA storage service is intended for stable research data, both raw data and processed datasets. IDA is accessible from the internet. Note that, unlike the HPC archive, the IDA storage service is not automatically included in a CSC user account: IDA storage space should be requested through the local university.
Pouta Object Storage
The Pouta Object Storage service provides cross-platform storage and sharing of data. Data objects can be uploaded to the service both from the CSC environment and from environments outside CSC, using several different protocols and interfaces.
Relational Database Service
The MySQL database server kaivos.csc.fi is intended for tasks and applications that use relational databases. The server can be accessed only from the CSC computing environment, so it is mainly used to support data analysis done on CSC servers.
The CSC disk environment allows users to work with very large datasets. However, transferring large datasets to CSC can take a long time. From an organization network connected to Funet, transferring a 1 GB file to or from the CSC computing environment usually takes no more than a minute or two.
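The "1 GB in a minute or two" figure implies an effective throughput of roughly 8 to 17 MB/s. The hypothetical helper below scales that range up to larger datasets. It assumes decimal units and a constant rate; real transfers also depend on the disks at both ends, network load and the number of files, so the result is only an order-of-magnitude guide.

```python
GB = 10**9  # decimal gigabyte, assumed for this estimate

# Throughput range implied by "1 GB in a minute or two", in bytes per second.
FAST_RATE = GB / 60    # about 16.7 MB/s
SLOW_RATE = GB / 120   # about 8.3 MB/s


def estimate_minutes(size_bytes, rate=FAST_RATE):
    """Rough transfer time in minutes for size_bytes at a constant rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate / 60


def estimate_range(size_bytes):
    """Return (best_case, worst_case) minutes using the implied rates."""
    return (estimate_minutes(size_bytes, FAST_RATE),
            estimate_minutes(size_bytes, SLOW_RATE))
```

Under these assumptions, a 100 GB dataset would take roughly 100 to 200 minutes.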
There are several ways to transfer files to and from the CSC environment, for example:
- Transferring data with the Scientist's User Interface
- Copying files with the scp command from Linux and Mac OS X, or with WinSCP from Windows
- Using Funet FileSender to share and transfer files
- Using Pouta Object Storage to share and transfer files
These and other methods (such as rsync, wget and disk mounts) are described in more detail in the CSC Computing Environment User Guide (Moving data between CSC and local environment).
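As an illustration of the rsync option, the sketch below builds and optionally runs an rsync command that copies a local file or directory to a remote host. The user name, host name and paths are hypothetical placeholders; check the CSC user guide for the actual login hosts and target directories. The flags shown are standard rsync options, not CSC-specific settings.

```python
import subprocess


def rsync_command(src, user, host, dest, compress=True):
    """Build an argv list for copying src to user@host:dest with rsync.

    -a recurses into directories and preserves permissions and timestamps,
    -v lists transferred files, -z compresses data in transit, and
    --partial keeps partially transferred files so that rerunning the same
    command after an interruption can reuse them.
    """
    flags = "-avz" if compress else "-av"
    return ["rsync", flags, "--partial", str(src), f"{user}@{host}:{dest}"]


def run_rsync(src, user, host, dest, compress=True):
    """Run rsync; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(rsync_command(src, user, host, dest, compress), check=True)
```

Building the argument list separately makes it easy to print or log the exact command before running it. Compression mainly helps with uncompressed text data; for already compressed archives `compress=False` avoids wasted CPU time.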
When transferring or storing many (small) files, it is usually much easier to first pack and compress the file collection into a single archive file (using commands or programs such as tar, zip or 7z), and then decompress and extract the files from the archive at the destination.
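The pack-then-extract workflow can be sketched with Python's standard `tarfile` module. The function names `pack` and `unpack` are illustrative, not part of any CSC tool. On the command line, `tar -czf results.tar.gz results` and `tar -xzf results.tar.gz` do the same job.

```python
import tarfile
from pathlib import Path


def pack(src_dir, archive_path):
    """Bundle src_dir and everything under it into a gzip-compressed tar."""
    src = Path(src_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname stores paths relative to src's parent, so the archive
        # unpacks into a single top-level directory named like src.
        tar.add(str(src), arcname=src.name)


def unpack(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive into dest_dir."""
    with tarfile.open(str(archive_path), "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: the "data" filter
            # guards against members that would be written outside dest_dir.
            tar.extractall(str(dest_dir), filter="data")
        else:
            tar.extractall(str(dest_dir))
```

Transferring one archive instead of thousands of small files avoids per-file overhead during the transfer, and the single compressed file is also easier to store.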