3.7 Using DataWarp burst-I/O buffering

There is 36 TB of fast flash-based temporary storage, called DataWarp, directly attached to the compute nodes via the Aries interconnect. The storage is accessible from all compute nodes. DataWarp is intended to improve the performance of applications whose I/O pattern is suboptimal for the Lustre parallel file system, that is, a large number of small but frequent reads or writes. It absorbs the I/O into the DataWarp buffers first and stages the files to the Lustre file system after the job has completed. DataWarp technology will continue to evolve towards a true disk cache, but already in its current form it has been seen to improve application I/O performance significantly.

The simplest use case is to direct all scratch I/O of the application to the DataWarp nodes. Assume the application is configured to write its temporary files to "$SCRATCH" (change the variable in the example below to match your setup). DataWarp is enabled by adding the following lines to your batch job script, after the Slurm parameters and before the application invocation:

#DW jobdw type=scratch access_mode=striped capacity=2TiB
export SCRATCH=$DW_JOB_STRIPED

Then execute your application as usual.
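
Put together, a job script for this use case might look like the following minimal sketch. The job name, node count and time limit are placeholder values, and the fallback to the existing "$SCRATCH" (or /tmp) when no DataWarp allocation is visible is an added assumption, not part of the instructions above. The application launch is left as a comment because the launcher and binary depend on your setup.

```shell
#!/bin/bash
#SBATCH --job-name=dw_scratch   # placeholder value
#SBATCH --nodes=1               # placeholder value
#SBATCH --time=00:30:00         # placeholder value
#DW jobdw type=scratch access_mode=striped capacity=2TiB

# Use the DataWarp allocation when it is present. Otherwise keep the
# existing $SCRATCH, or /tmp as a last resort (assumption for this sketch).
export SCRATCH="${DW_JOB_STRIPED:-${SCRATCH:-/tmp}}"
echo "Scratch I/O directed to: $SCRATCH"

# Launch your application here as usual; it now writes its temporary
# files to the location stored in $SCRATCH.
```

With the fallback in place, the same script also runs unchanged when the "#DW" line is removed, which can help when comparing run times with and without DataWarp.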

The second use case is to stage permanent files from Lustre to the DataWarp storage at the beginning of the run, and from DataWarp back to Lustre after the job has completed. This is achieved by adding the following lines to the job script and changing the source and destination parameters to point to the directories containing the files needed or produced by the job.

#DW jobdw type=scratch access_mode=striped capacity=2TiB
#DW stage_in type=directory source=path destination=$DW_JOB_STRIPED/path
#DW stage_out type=directory destination=path source=$DW_JOB_STRIPED/path

Increase the "capacity" parameter if more space is needed. Here, "path" must be the full path to your data, e.g. /wrk/(username)/dir, written out without environment variables (such as $WRKDIR).
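
As a sketch of how the staging directives fit into a whole job script, the example below stages one directory in and another out. The paths /wrk/username/input and /wrk/username/output, the Slurm values and the variable names DW_ROOT, INPUT_DIR and OUTPUT_DIR are hypothetical. The temporary-directory fallback is only there so that the sketch can be tried outside a DataWarp job.

```shell
#!/bin/bash
#SBATCH --job-name=dw_stage     # placeholder value
#SBATCH --nodes=1               # placeholder value
#SBATCH --time=00:30:00         # placeholder value
#DW jobdw type=scratch access_mode=striped capacity=2TiB
#DW stage_in type=directory source=/wrk/username/input destination=$DW_JOB_STRIPED/input
#DW stage_out type=directory destination=/wrk/username/output source=$DW_JOB_STRIPED/output

# Inside the job, the staged data is found under $DW_JOB_STRIPED.
# Outside a DataWarp job the variable is unset, so a temporary
# directory is used instead (assumption for this sketch).
DW_ROOT="${DW_JOB_STRIPED:-$(mktemp -d)}"
INPUT_DIR="$DW_ROOT/input"
OUTPUT_DIR="$DW_ROOT/output"

# Make sure both directories exist before the application starts.
mkdir -p "$INPUT_DIR" "$OUTPUT_DIR"

echo "Reading input from: $INPUT_DIR"
echo "Writing results to: $OUTPUT_DIR"

# Launch your application here, pointing it at $INPUT_DIR and $OUTPUT_DIR.
```

The application should read and write only under "$DW_JOB_STRIPED" during the run, since the copy back to Lustre takes place after the job has completed.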

You can check the status of the nodes (e.g. available space) with

scontrol show burst

For further information, see
