3.2. Archiving data to the HPC archive and IDA storage services

CSC supports two parallel archiving systems for long-term data storage:

  1. HPC archive is intended for storing datasets that are used in the CSC computing environment.

  2. IDA storage service is a general storage service for scientific data.

The main difference between these two services is in their user policy and accessibility. The HPC archive is directly bound to CSC user accounts: all customers of the CSC computing environment automatically have an account with a 2 TB quota in the HPC archive.

The IDA service is not directly linked to the CSC computing environment. Even though CSC hosts the IDA service and users need to register with CSC, storage space is applied for through the universities or the Academy of Finland. IDA users can use the storage space both from their own computers and from the servers of CSC. Thus IDA can be used for transporting data between CSC and a local environment. IDA can also be used to publish or share datasets. More information about applying for IDA storage space can be found on the IDA home page.

The usage of HPC archive and IDA is based on client programs. IDA uses an in-house developed client tool (ida), while the HPC archive service uses an iRODS (Integrated Rule-Oriented Data System) client.

Files in these storage systems can be managed through the client interfaces, but the content of archived files can't be examined or modified in place. Instead, a stored file must first be retrieved to the CSC servers or to some other computer before the dataset can be analysed or modified.

3.2.1 Using HPC archive

The HPC archive service is based on iRODS technology. In the Taito cluster the iRODS commands are automatically available. In Sisu you need to run the following set-up command to be able to execute iRODS commands:

module load irods

(Note that if you are using HPC archive in Taito, you should not run the module load irods command, as it loads an iRODS version that is not compatible with HPC archive.) The two basic iRODS commands are:

  • iput that copies a file to the iRODS server
  • iget that retrieves a file from the iRODS server

In addition, there are several other iRODS commands that can be used to manage the data on the archive server. Many of these i-commands, listed in Table 3.1, are analogous to the corresponding Linux commands. For example, the command irm removes a file from the iRODS server and imkdir creates a new directory on the iRODS server.

We recommend that you don't store all your data in the main folder of the server. Instead, create a hierarchical directory structure that helps you locate your files later on. Further, if possible, files should be merged into larger compressed archiving units with programs like tar or zip before the data is moved to the HPC archive or IDA.
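
The bundling step recommended above can be sketched as a small shell function. This is a minimal sketch, not part of the CSC tools: the function name bundle_dir is hypothetical, and it assumes a tar that supports the z and -C options (GNU tar and BSD tar both do).

```shell
#!/bin/sh
# Sketch: pack one project directory into a single compressed archive
# before uploading it. bundle_dir is an illustrative name, not a CSC tool.

bundle_dir() {
    # $1 = directory to pack, for example proj27_data_1
    src=${1%/}
    if [ ! -d "$src" ]; then
        echo "bundle_dir: '$src' is not a directory" >&2
        return 1
    fi
    parent=$(dirname "$src")
    name=$(basename "$src")
    archive="$name.tgz"
    # c = create, z = gzip compression, f = archive file name.
    # -C makes the paths stored in the archive relative to the parent directory.
    tar czf "$archive" -C "$parent" "$name" || return 1
    # Print the archive name so that the caller can pass it on, e.g. to iput.
    echo "$archive"
}
```

For example, bundle_dir proj27_data_1 would create proj27_data_1.tgz in the current directory and print its name, which could then be given to iput.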

Table 3.1 Most commonly used iRODS commands.

Command       Function
icd           Change the current working directory (collection).
ichksum       Calculate checksum for one or more data-objects or collections.
ichmod        Change access permissions to collections or data-objects.
icp           Copy a data-object (file) or collection (directory) to another.
iexit         Exit an iRODS session (un-iinit).
iget          Get a file from iRODS.
igetwild.sh   Get one or more files from iRODS using wildcard characters.
ihelp         Display a synopsis list of the i-commands.
iinit         Initialize a session, so you don't need to retype your password.
ilocate       Search for data-object(s) or collections (via a script).
ils           List collections (directories) and data-objects (files).
imkdir        Make an iRODS directory (collection).
imv           Move/rename an iRODS data-object (file) or collection (directory).
ipasswd       Change your iRODS password.
iput          Put (store) a file into iRODS.
ipwd          Print the current working directory (collection) name.
iquota        Show information on iRODS quotas (if any).
irm           Remove one or more data-objects or collections.

Synchronize collections between a local/irods or irods/irods (at the moment this command is not working properly; we recommend not using it).


Example 1. Storing data from within Sisu to the HPC archive server

In this example, user kkayttaj copies a set of files from his $WRKDIR directory in Sisu to his HPC archive directory.
After logging into Sisu, the user sets up the iRODS commands and moves to the work directory of Sisu:

[kkayttaj@c305 ~]$ module load irods
[kkayttaj@c305 ~]$ cd $WRKDIR

When operating from within Taito, skip the module command (see above).

Then the user checks the content of the directory with the ls command and creates a new directory called proj27_data_1:

[kkayttaj@c305 kkayttaj]$ ls
images27_a.jpg images27_b.jpg  images27_c.jpg  input27.dat  result27_a.out
result27_b.out  result27_c.out
[kkayttaj@c305 kkayttaj]$  mkdir proj27_data_1
Then the user copies the files he wants to preserve to the new directory:
[kkayttaj@c305 kkayttaj]$  cp input27.dat proj27_data_1
[kkayttaj@c305 kkayttaj]$  cp result27*.out proj27_data_1
[kkayttaj@c305 kkayttaj]$  cp images27*.jpg  proj27_data_1
After that the user checks that the new directory contains all the files that he wishes to store in the archive:
[kkayttaj@c305 kkayttaj]$ ls proj27_data_1
images27_a.jpg images27_b.jpg  images27_c.jpg  input27.dat  result27_a.out
result27_b.out  result27_c.out
Next, the data to be stored is collected into a compressed tar archive file called proj27_data_1.tgz:
[kkayttaj@c305 kkayttaj]$ tar zcvf proj27_data_1.tgz proj27_data_1
The resulting compressed file proj27_data_1.tgz can now be copied to the HPC archive. Before copying the data, the user first creates a new sub-folder called proj27 on the HPC archive server:
[kkayttaj@c305 kkayttaj]$ imkdir proj27
Next the user checks that the directory was created on the HPC archive server and changes the current HPC archive directory to the new proj27 directory:
[kkayttaj@c305 kkayttaj]$ ils
/hpc_archive/home/kkayttaj:
  C- /hpc_archive/home/kkayttaj/proj27
[kkayttaj@c305 kkayttaj]$ icd proj27
After this the user is ready to execute the iput command, which copies the file to the new directory on the HPC archive server:
[kkayttaj@c305 kkayttaj]$ iput proj27_data_1.tgz
Once the data copying process is finished, the user checks that the file has been successfully copied to the archive:
[kkayttaj@c305 kkayttaj]$ ils -l
  kkayttaj            0 disk-1.4               1344214352 2013-03-25.13:15 & proj27_data_1.tgz

If you want to be certain that the transfer has been completely successful, you can run the checksum commands for both the local copy (md5sum) and the iRODS copy (ichksum) and verify that the checksums match:

[kkayttaj@c305 kkayttaj]$ ichksum proj27_data_1.tgz
    proj27_data_1.tgz                     24eeb2845cbfda238b78fa165c21607d
Total checksum performed = 1, Failed checksum = 0

[kkayttaj@c305 kkayttaj]$ md5sum proj27_data_1.tgz
24eeb2845cbfda238b78fa165c21607d proj27_data_1.tgz
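
The comparison above can also be scripted. The following is a minimal sketch that assumes ichksum prints the file name and an MD5 checksum in the two-column layout shown in this example; other iRODS versions may print checksums in a different format. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the checksum reported by ichksum with a local md5sum.
# Assumes the "name checksum" output layout shown in this guide.

# Read ichksum output on stdin and print the checksum of the named file.
checksum_from_ichksum() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Succeed only if both checksums are non-empty and identical.
checksums_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Possible use on a server where the iRODS commands are available:
#   remote=$(ichksum proj27_data_1.tgz | checksum_from_ichksum proj27_data_1.tgz)
#   local_sum=$(md5sum proj27_data_1.tgz | awk '{ print $1 }')
#   checksums_match "$local_sum" "$remote" && echo "checksums match"
```

Treating an empty checksum as a mismatch keeps a failed or unparsed ichksum call from being read as a successful verification.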
Once the files are successfully archived, the files in directory proj27_data_1 and the file proj27_data_1.tgz can be removed from the local $WRKDIR:
[kkayttaj@c305 kkayttaj]$ rm proj27_data_1.tgz
[kkayttaj@c305 kkayttaj]$ rm -r proj27_data_1
[kkayttaj@c305 kkayttaj]$ rm input27.dat
[kkayttaj@c305 kkayttaj]$ rm result27*.out
[kkayttaj@c305 kkayttaj]$ rm images27*.jpg


Example 2. Retrieving data from the archive server on Sisu

To retrieve the data stored in the HPC archive in the previous example, the user kkayttaj should do the following steps. First the compressed file is copied from the HPC archive to the $WRKDIR directory:
kkayttaj@sisu-login5:/wrk/kkayttaj> module load irods
kkayttaj@sisu-login5:> cd $WRKDIR
kkayttaj@sisu-login5:/wrk/kkayttaj> ils
/hpc_archive/home/kkayttaj:
  C- /hpc_archive/home/kkayttaj/proj27
kkayttaj@sisu-login5:/wrk/kkayttaj> icd proj27
kkayttaj@sisu-login5:/wrk/kkayttaj> ils
/hpc_archive/home/kkayttaj/proj27:
  proj27_data_1.tgz
kkayttaj@sisu-login5:/wrk/kkayttaj> iget proj27_data_1.tgz
Then decompress and unpack the data:
kkayttaj@sisu-login5:/wrk/kkayttaj> tar zxvf proj27_data_1.tgz

After these commands the $WRKDIR directory will include the directory proj27_data_1, which contains the files that were stored in the HPC archive service.
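
Before unpacking a retrieved archive, it can be useful to check which top-level directories it would create, so that existing files in $WRKDIR are not overwritten by accident. The following is a minimal sketch; the function name archive_top_level is hypothetical, and it assumes the archive stores relative paths without a leading ./ prefix, as the tar command in Example 1 does.

```shell
#!/bin/sh
# Sketch: list the top-level entries of a .tgz archive without extracting it.
# archive_top_level is an illustrative name, not a CSC or iRODS tool.

archive_top_level() {
    # t = list contents, z = gzip, f = archive file name.
    # Keep only the first path component of each entry and drop duplicates.
    tar tzf "$1" | cut -d/ -f1 | sort -u
}
```

For the archive created in Example 1, archive_top_level proj27_data_1.tgz would print the single directory name proj27_data_1.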

Example 3. Retrieving data from the old archive directory

The HPC archive system was taken into use in 2013. Users who used the archive service of CSC before the current HPC archive system can access the data stored in the older system through an HPC archive path:


For example, user kkayttaj could retrieve files stored in the old system with the commands:

[kkayttaj@taito-login3 ~]$ ils /hpc_archive/old_archive/kkayttaj
[kkayttaj@taito-login3 ~]$ iget /hpc_archive/old_archive/kkayttaj/old_project.tgz


3.2.2 Configuring the connection to IDA

The IDA service was completely reconstructed in 2018, and the previous, iRODS-based IDA is no longer in use. The first version of the new command line interface of IDA, ida, is available on Taito. At the moment the IDA client enables data upload and download between Taito and IDA. You can also delete and move files inside IDA. File listing is not yet available. You should also note that the client operates only in the staging area of IDA and not with the frozen data.

Before you start using IDA client in Taito you must set up your IDA connection by running command:

The configuration process asks for your CSC project number and application password. This information can be obtained from the personal information page of the IDA WWW-interface.

Once you have configured the connection, you can upload and download files between Taito and IDA with the commands:

ida upload target_in_ida local_file
ida download target_in_ida local_file 

More information about using and configuring the IDA client can be found at https://github.com/CSCfi/ida2-command-line-tools



3.2.3 Using HPC Archive with Scientist's User Interface

CSC's archiving system, HPC archive, can be accessed via the Scientist's User Interface web portal by using the portal's My Files tool (https://sui.csc.fi/group/sui/my-files). For instructions on data management with My Files, please see chapter 5.1.

