3.2. Archiving data to the HPC archive and IDA storage services
CSC supports two parallel archiving systems for long-term data storage:
The HPC archive is intended for storing datasets that are used in the CSC computing environment.
The IDA storage service is a general storage service for scientific data.
The main difference between these two services lies in their usage policies and accessibility. The HPC archive is directly bound to CSC user accounts: all customers of the CSC computing environment automatically get an account with a 2 TB quota in the HPC archive.
The IDA service is not directly linked to the CSC computing environment. Even though CSC hosts the IDA service and users need to register with CSC, the storage space is applied for from the universities or from the Academy of Finland. IDA users can use the storage space both from their own computers and from CSC servers. Thus IDA can be used for transferring data between CSC and a local environment, and also for publishing or sharing datasets. More information about applying for storage space in IDA can be found on the IDA home page:
The HPC archive and IDA are used through client programs. IDA uses an in-house developed client tool (ida), while the HPC archive service uses the iRODS (Integrated Rule-Oriented Data System) client.
Files in these storage systems can be managed through the client interfaces, but the content of the archived files cannot be viewed or modified in place. Instead, a stored file must first be retrieved back to the CSC servers or to another computer before the dataset can be analysed or modified.
3.2.1 Using HPC archive
The HPC archive service is based on iRODS technology. In the Taito cluster, the iRODS commands are available automatically. In Sisu, you need to run the following set-up command before you can execute iRODS commands:
module load irods
(Note that if you are using the HPC archive in Taito, you should not run the module load irods command, as it loads an iRODS version that is not compatible with the HPC archive.) The two basic iRODS commands are:
In addition, there are several other iRODS commands that can be used to manage the data on the archive server. Many of these i-commands, listed in Table 3.1, are analogous to the corresponding Linux commands. For example, irm removes a file from the iRODS server and imkdir creates a new directory on the iRODS server.
We recommend that you do not store all your data in the main folder of the server; instead, create a hierarchical directory structure that helps you locate your files later on. Further, if possible, merge the files into larger compressed archiving units with programs like tar or zip before moving the data to the HPC archive or IDA.
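The packing step recommended above can be wrapped in a small bash function. This is a minimal sketch, not a CSC tool: the name pack_dir is hypothetical, and it relies only on the standard tar options z (gzip), c (create) and f (archive file name).

```shell
# Hypothetical helper (not a CSC tool): pack one directory into <dir>.tgz so
# that it can be archived as one large file instead of many small ones.
pack_dir() {
    local dir=${1%/}                      # drop a trailing slash: proj27/ -> proj27
    if [ ! -d "$dir" ]; then
        echo "pack_dir: '$dir' is not a directory" >&2
        return 1
    fi
    # z = gzip compression, c = create a new archive, f = archive file name
    tar zcf "$dir.tgz" "$dir" || return 1
    echo "$dir.tgz"                       # print the name of the new archive
}
```

For example, pack_dir proj27_data_1 creates proj27_data_1.tgz in the current directory and prints its name. Using tar with gzip keeps the whole directory tree in a single compressed file, which is the kind of large archiving unit recommended above.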
Table 3.1 Most commonly used iRODS commands.
| Command | Description |
|---|---|
| icd | Change the current working directory (collection). |
| ichksum | Calculate checksums for one or more data objects or collections. |
| ichmod | Change access permissions to collections or data objects. |
| icp | Copy a data object (file) or collection (directory) to another. |
| iexit | Exit an iRODS session (un-iinit). |
| iget | Get a file from iRODS. |
| igetwild.sh | Get one or more files from iRODS using wildcard characters. |
| ihelp | Display a synopsis list of the i-commands. |
| iinit | Initialize a session, so you don't need to retype your password. |
| ilocate | Search for data objects or collections (via a script). |
| ils | List collections (directories) and data objects (files). |
| imkdir | Make an iRODS directory (collection). |
| imv | Move or rename an iRODS data object (file) or collection (directory). |
| ipasswd | Change your iRODS password. |
| iput | Put (store) a file into iRODS. |
| ipwd | Print the name of the current working directory (collection). |
| iquota | Show information on iRODS quotas (if any). |
| irm | Remove one or more data objects or collections. |
| | Synchronize collections between local/iRODS or iRODS/iRODS (at the moment this command does not work properly; we recommend not using it). |
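Several commands from Table 3.1 are typically used together. The sketch below, a hypothetical bash function archive_put, creates the target collection with imkdir -p (which, like mkdir -p, also creates missing parent collections) and then stores the file there with iput. The function name and the example collection path proj27/2013 are illustrative assumptions, not CSC conventions.

```shell
# Hypothetical helper: store a local file into an HPC archive collection,
# creating the collection (and any missing parent collections) first.
# Usage: archive_put <local_file> <collection>
archive_put() {
    local file=$1 coll=$2
    if [ ! -f "$file" ]; then
        echo "archive_put: '$file' not found" >&2
        return 1
    fi
    imkdir -p "$coll" || return 1    # like `mkdir -p`, but on the iRODS server
    iput "$file" "$coll"             # store the file inside the collection
}
```

For example, archive_put proj27_data_1.tgz proj27/2013 would place the file in a two-level collection hierarchy under your HPC archive home directory.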
In this example, user kkayttaj copies a set of files from his $WRKDIR directory in Sisu to his HPC archive directory.
After logging into Sisu, the user sets up the iRODS commands and moves to the Sisu work directory:
[kkayttaj@c305 ~]$ module load irods
[kkayttaj@c305 ~]$ cd $WRKDIR
When working in Taito, skip the module command (see above).
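Scripts that are used on both clusters have to take care of this difference. The following bash function is a hypothetical sketch: ensure_icommands is not a CSC tool, and it assumes that if the ils command is already found (as in Taito), the i-commands are set up and module load irods must not be run.

```shell
# Hypothetical helper (not a CSC tool): make the i-commands available.
# Assumption: if `ils` is already found (as in Taito), the i-commands are set
# up and `module load irods` must NOT be run, because on Taito it would load
# an iRODS version that is not compatible with the HPC archive.
ensure_icommands() {
    if command -v ils >/dev/null 2>&1; then
        return 0            # i-commands already available: do nothing
    fi
    module load irods       # e.g. Sisu: set up the iRODS commands
}
```

Calling ensure_icommands at the start of an archiving script then loads the module only where the i-commands are not yet available.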
Then the user checks the content of the directory with the ls command and creates a new directory called proj27_data_1:
[kkayttaj@c305 kkayttaj]$ ls
images27_a.jpg images27_b.jpg images27_c.jpg input27.dat result27_a.out result27_b.out result27_c.out
[kkayttaj@c305 kkayttaj]$ mkdir proj27_data_1
Then the user copies the files he wants to preserve to the new directory:
[kkayttaj@c305 kkayttaj]$ cp input27.dat proj27_data_1
[kkayttaj@c305 kkayttaj]$ cp result27*.out proj27_data_1
[kkayttaj@c305 kkayttaj]$ cp images27*.jpg proj27_data_1
After that, the user checks that the new directory contains all the files he wishes to store in the archive:
[kkayttaj@c305 kkayttaj]$ ls proj27_data_1
images27_a.jpg images27_b.jpg images27_c.jpg input27.dat result27_a.out result27_b.out result27_c.out
Next, the data to be stored is collected into a compressed tar archive file called proj27_data_1.tgz:
[kkayttaj@c305 kkayttaj]$ tar zcvf proj27_data_1.tgz proj27_data_1
The resulting compressed file proj27_data_1.tgz can now be copied to the HPC archive. Before copying the data, the user first creates a new sub-folder called proj27 on the HPC archive server:
[kkayttaj@c305 kkayttaj]$ imkdir proj27
Next, the user checks that the directory was created on the HPC archive server and changes the current HPC archive directory to the new proj27 directory:
[kkayttaj@c305 kkayttaj]$ ils
/hpc_archive/home/kkayttaj:
  C- /hpc_archive/home/kkayttaj/proj27
[kkayttaj@c305 kkayttaj]$ icd proj27
After this, the user is ready to run the iput command, which copies the file to the new directory on the HPC archive server:
[kkayttaj@c305 kkayttaj]$ iput proj27_data_1.tgz
Once the copying is finished, the user checks that the file has been successfully copied to the archive:
[kkayttaj@c305 kkayttaj]$ ils -l
/hpc_archive/home/kkayttaj/proj27:
  kkayttaj 0 disk-1.4 1344214352 2013-03-25.13:15 & proj27_data_1.tgz
If you want to be certain that the transfer was completely successful, you can run the checksum commands for both the local copy (md5sum) and the iRODS copy (ichksum) and verify that the checksums match:
[kkayttaj@c305 kkayttaj]$ ichksum proj27_data_1.tgz
proj27_data_1.tgz 24eeb2845cbfda238b78fa165c21607d
Total checksum performed = 1, Failed checksum = 0
[kkayttaj@c305 kkayttaj]$ md5sum proj27_data_1.tgz
24eeb2845cbfda238b78fa165c21607d proj27_data_1.tgz
Once the files are successfully archived, the directory proj27_data_1, the file proj27_data_1.tgz and the original files can be removed from the local $WRKDIR:
[kkayttaj@c305 kkayttaj]$ rm proj27_data_1.tgz
[kkayttaj@c305 kkayttaj]$ rm -r proj27_data_1
[kkayttaj@c305 kkayttaj]$ rm input27.dat
[kkayttaj@c305 kkayttaj]$ rm result27*.out
[kkayttaj@c305 kkayttaj]$ rm images27*.jpg
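The checksum comparison and the clean-up can be combined so that local copies are deleted only after the checksums match. A minimal bash sketch follows; the function name verify_archived is hypothetical, and it assumes that the archived file is in the current iRODS collection (after icd) and that ichksum prints the file name followed by an MD5 checksum, as in the example output of this section. If ichksum reports another checksum type, the comparison fails instead of giving a false match.

```shell
# Hypothetical helper: compare the local MD5 checksum of a file with the
# checksum that ichksum reports for the archived copy.
# Assumptions: the archived copy is in the current iRODS collection, and
# ichksum prints a line "<file name> <md5 hex>" for it.
verify_archived() {
    local file=$1 name local_sum remote_sum
    name=$(basename "$file")
    local_sum=$(md5sum "$file" | awk '{print $1}')
    # pick the checksum from the line whose first field is the file name
    remote_sum=$(ichksum "$name" | awk -v n="$name" '$1 == n {print $2}')
    if [ -n "$remote_sum" ] && [ "$local_sum" = "$remote_sum" ]; then
        echo "OK: $name ($local_sum)"
        return 0
    fi
    echo "MISMATCH: $name local=$local_sum archive=${remote_sum:-none}" >&2
    return 1
}
```

Usage: verify_archived proj27_data_1.tgz && rm -r proj27_data_1.tgz proj27_data_1 removes the local copies only when the verification succeeds.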
To retrieve the archived data later, the user sets up the iRODS commands in Sisu, moves to $WRKDIR, locates the archived file and downloads it with iget:
kkayttaj@sisu-login5:/wrk/kkayttaj> module load irods
kkayttaj@sisu-login5:> cd $WRKDIR
kkayttaj@sisu-login5:/wrk/kkayttaj> ils
/hpc_archive/home/kkayttaj:
  C- /hpc_archive/home/kkayttaj/proj27
kkayttaj@sisu-login5:/wrk/kkayttaj> icd proj27
kkayttaj@sisu-login5:/wrk/kkayttaj> ils
/hpc_archive/home/kkayttaj/proj27:
  proj27_data_1.tgz
kkayttaj@sisu-login5:/wrk/kkayttaj> iget proj27_data_1.tgz
Then the user decompresses and unpacks the data:
kkayttaj@sisu-login5:/wrk/kkayttaj> tar zxvf proj27_data_1.tgz
After these commands, the $WRKDIR directory will include the directory proj27_data_1, which contains the files stored in the HPC archive.
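The download and unpacking steps can be wrapped into one function. This is a hypothetical sketch (retrieve_and_unpack is not a CSC tool): it takes the iRODS path of a .tgz file, refuses to run if a local file with the same name already exists, downloads the file with iget and unpacks it with tar in the current directory.

```shell
# Hypothetical helper: fetch a .tgz file from the HPC archive and unpack it
# into the current directory.
# Usage: retrieve_and_unpack <irods_path_to_tgz>
retrieve_and_unpack() {
    local src=$1 name
    name=$(basename "$src")
    if [ -e "$name" ]; then
        # do not overwrite an existing local copy
        echo "retrieve_and_unpack: '$name' already exists here" >&2
        return 1
    fi
    iget "$src" || return 1          # download the archive file
    tar zxf "$name"                  # decompress and unpack it
}
```

For example, running retrieve_and_unpack /hpc_archive/home/kkayttaj/proj27/proj27_data_1.tgz in $WRKDIR would leave both proj27_data_1.tgz and the unpacked proj27_data_1 directory there.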
Example 3. Retrieving data from the old archive directory
The HPC archive system was taken into use in 2013. Users who used CSC's archive service before the current HPC archive system can access the data stored in the older system under the old_archive directory of the HPC archive.
For example, user kkayttaj could retrieve files stored in the old system with the commands:
[kkayttaj@taito-login3 ~]$ ils /hpc_archive/old_archive/kkayttaj
/hpc_archive/old_archive/kkayttaj:
[kkayttaj@taito-login3 ~]$ iget /hpc_archive/old_archive/kkayttaj/old_project.tgz
The IDA service was completely rebuilt in 2018, and the previous iRODS-based IDA is no longer in use. The first version of the new IDA command line interface, ida, is available on Taito. At the moment, the IDA client enables uploading and downloading data between Taito and IDA. You can also delete and move files inside IDA, but file listing is not yet available. Note also that the client operates only in the staging area of IDA, not on frozen data. Before you start using the IDA client in Taito, you must set up your IDA connection by running the command:
The configuration process asks for your CSC project number and application password. This information can be obtained from the personal information page of the IDA WWW-interface.
Once you have configured the connection, you can upload and download files between Taito and IDA with the commands:
ida upload target_in_ida local_file
ida download target_in_ida local_file
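The upload command can be used in a loop to transfer several files. In this hypothetical sketch, ida_upload_all uploads every .tgz file in the current directory to one folder in the IDA staging area, using only the ida upload syntax given above. The function name is illustrative, and the folder path convention (for example, whether a leading slash is required) is an assumption; check the IDA client documentation before relying on it.

```shell
# Hypothetical helper: upload every .tgz file in the current directory to a
# folder in the IDA staging area with `ida upload target_in_ida local_file`.
# Assumption: the IDA connection has already been configured.
ida_upload_all() {
    local folder=${1%/} f n=0
    for f in *.tgz; do
        [ -e "$f" ] || continue              # no .tgz files: pattern unmatched
        ida upload "$folder/$f" "$f" || return 1
        n=$((n + 1))
    done
    echo "uploaded $n file(s) to $folder"
}
```

For example, ida_upload_all proj27 would run ida upload proj27/proj27_data_1.tgz proj27_data_1.tgz for each archive file and stop at the first failed upload.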
More information about using and configuring the IDA client can be found at https://github.com/CSCfi/ida2-command-line-tools
CSC's archiving system, the HPC archive, can also be accessed via the Scientist's User Interface web portal using the portal's My Files tool (https://sui.csc.fi/group/sui/my-files). For instructions on data management with My Files, see chapter 5.1.