4.5 Debugging Parallel Applications

4.5.1 Abnormal Termination Processing (ATP)

Cray offers a mechanism for generating a stack trace and core files from parallel jobs that hit a system trap, such as a segmentation fault. See

man atp

for details. To enable the feature, set

export ATP_ENABLED=1

in the batch job script before the aprun command. Caveats:

  • The program must terminate abnormally, for example by hitting a system trap or calling MPI_Abort().

  • ATP only works for regular batch jobs. It does not work for interactive parallel jobs, i.e. jobs that have been started directly from the login node command line with aprun.

If these conditions are met, enabling ATP is the first thing to try in debugging.
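
As a minimal sketch of the setup described above, the following lines write a small batch job script and check that it parses. The file name atp_job.sh, the job name, the node count and the run time are illustrative assumptions; only the ATP_ENABLED setting, the test partition and the aprun launch come from this guide.

```shell
# Write a small batch job script (hypothetical name: atp_job.sh).
cat > atp_job.sh <<'EOF'
#!/bin/bash
#SBATCH -J atp_test
#SBATCH -p test
#SBATCH -N 2
#SBATCH -t 00:10:00

# Enable ATP before the application is launched
export ATP_ENABLED=1
aprun -n 32 ./mpi_prog
EOF

# Parse the script without running it, to catch typos before submission.
bash -n atp_job.sh && echo "atp_job.sh parses cleanly"
```

The script would then be submitted as a regular batch job, for example with sbatch atp_job.sh.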

4.5.2 TotalView debugger

TotalView is a debugger with a graphical user interface (GUI) for debugging parallel applications. With TotalView you can:

  • run an application under TotalView control
  • attach to a running application
  • examine a core file

Compile the application to be debugged, for example a Fortran, C or C++ program. The compiler option -g generates the debug information.

ftn -g -o mpi_prog mpi_prog.f95
cc -openmp -g -o hybrid_intel_compiled hybrid_mpi_openmp.c
CC -g -o myprog mycode.C


Sisu's compute nodes are either included in the SLURM partitions or set aside as interactive nodes, which are not part of any SLURM partition. Currently there are eight interactive nodes, and jobs can be launched on them without submitting a batch job. A debugging session can therefore be launched either on the interactive nodes or by submitting a batch job.

Debugging on interactive nodes

The first example launches a debugging session for a basic statically linked MPI code. The second launches a debugging session for an Intel-compiled, statically linked hybrid MPI/OpenMP code. If a code needs runtime environment variables, define them with the aprun option -e, which sets an environment variable on the interactive compute nodes (see the second example). Environment variables defined on the login nodes are not visible to interactive jobs. In practice this means that you should not run or debug dynamically linked codes on interactive nodes unless the code has been compiled using the default module environment.

totalview aprun -a -n 16 ./mpi_prog
totalview aprun -a -e OMP_NUM_THREADS=8 -e KMP_AFFINITY="compact,1" -n 4 -d 8 -S 1 -ss -N 2 -cc numa_node ./hybrid_intel_compiled


Debugging on compute nodes (SLURM partitions)

This method needs some special setup, but it also works for dynamically linked codes. First, write a setup file called debug_environment.sh. It must be in the directory from which TotalView is launched. Here is an example script that changes the programming environment and sets the number of threads for an OpenMP program. Please note: on Cray systems TotalView does not support debugging OpenMP sections of code.

# debug_environment.sh example file
# Switch to Intel environment
module switch PrgEnv-cray PrgEnv-intel
# Execute the program with eight threads per task
export OMP_NUM_THREADS=8
export KMP_AFFINITY="compact,1"
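
Because the setup file has to sit in the directory from which TotalView is launched, a small pre-flight check can catch a missing or broken file before an allocation is requested. The helper below is only a sketch: check_debug_env is a hypothetical name, not a CSC or Cray tool, and it only verifies that the file exists and parses.

```shell
# check_debug_env [DIR]
# Returns 0 if DIR (default: current directory) contains a debug_environment.sh
# that the shell can parse; prints a message and returns 1 otherwise.
check_debug_env() {
    setup="${1:-.}/debug_environment.sh"
    if [ ! -f "$setup" ]; then
        echo "missing $setup" >&2
        return 1
    fi
    # bash -n parses the file without executing it
    if ! bash -n "$setup"; then
        echo "syntax error in $setup" >&2
        return 1
    fi
    echo "$setup looks usable"
}
```

It could be run in the launch directory just before starting the debugging session, so that the session is started only when the check succeeds.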


The examples below launch TotalView sessions through SLURM: the first runs an MPI program in the test partition, and the second runs a dynamically linked Intel environment hybrid program in the small partition. The SLURM salloc command cannot be used directly; CSC provides a wrapper command, tvsalloc, for launching a TotalView debugging session. After tvsalloc, give the needed SLURM options (in the second example: run time of the job allocation (-t 02:00:00), number of nodes (-N 4), SLURM partition (-p small)). After aprun -a, give the aprun options that your job needs (in the second example: total number of processes (-n 8), number of threads per process (-d 8), number of processes per NUMA node (-S 1), allocate memory only from the local NUMA node (-ss), number of processes per compute node (-N 2), processes and OpenMP threads constrained to the local NUMA node (-cc numa_node)).

tvsalloc -t 00:25:00 -p test -N 2 totalview aprun -a -n 32 ./mpi_prog
tvsalloc -t 02:00:00 -N 4 -p small totalview aprun -a -n 8 -d 8 -S 1 -ss -N 2 -cc numa_node ./hybrid_intel_compiled
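
The node count given to tvsalloc should be large enough for the aprun process layout. As a minimal sketch, the hypothetical helper nodes_needed (not a CSC or Cray tool) computes it as the total number of processes (-n) divided by the number of processes per node (-N), rounded up. The value 16 in the second call is an assumption inferred from the first example, where 32 processes run on 2 nodes.

```shell
# nodes_needed TOTAL_PROCS PROCS_PER_NODE
# Prints the number of compute nodes needed (ceiling division).
nodes_needed() {
    total=$1
    per_node=$2
    echo $(( (total + per_node - 1) / per_node ))
}

nodes_needed 8 2     # hybrid example: aprun -n 8 -N 2, prints 4
nodes_needed 32 16   # MPI example, assuming 16 processes per node, prints 2
```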


REMARK: Statically linked codes do not need the setup file, debug_environment.sh, unless your job uses runtime environment variables. However, tvsalloc is required in the SLURM partitions (test, small, large). If your debugging session will last less than half an hour, it is a good idea to submit the job to the SLURM test partition.

Debugging session

When a debugging session starts, the TotalView Startup Parameters window may appear. In the basic case, just click the OK button. The TotalView Root and Process windows then appear. Click the Go button in the TotalView Process window. A pop-up window appears, asking whether you want to stop the job:

Process srun is a parallel job.
Do you want to stop the job now


Select Yes in this pop-up window.

Basic features of TotalView

The Process Window contains the code for the process or thread that you're debugging. This window is divided into panes of information. The Stack Trace Pane shows the call stack of routines. The Stack Frame Pane displays all of a routine's parameters, its local variables, and the registers for the selected stack frame.

The left margin of the Source Pane displays line numbers. An arrow over the line number shows the current location of the program counter (PC) in the selected stack frame. You can place a breakpoint (left mouse click) at any line whose line number is enclosed in a box. After you set one or more breakpoints, the Go button runs your code to the next breakpoint. When you place a breakpoint on a line, TotalView places an icon over the line number. To remove a breakpoint, click the breakpoint icon again.

To examine or change the value of a variable, right-click the variable and select Dive from the pop-up menu. To see the values of that variable on all processes, select Across Processes from the pop-up menu. A new window shows the values and other information about the variable. In that window you can edit the variable values.

The stepping commands Go, Next, Step, Out and Run To (at the top of the Process window) control how the code is executed. Go runs to the next breakpoint; if a breakpoint is located inside a loop, execution stops at that same breakpoint on each iteration until the loop ends. Next executes the current line and moves the program counter (arrow) to the next line. Step executes one line of your program and, if the source line or instruction contains a subroutine or function call, steps into it. Out executes the remaining statements of the current subroutine or function and returns to the caller. Run To executes all lines until the program counter reaches the selected line. Select a line by clicking the code line (not the line number); the background of that line turns grey.

Attach to a running application

Unfortunately, this is not properly supported in the sisu.csc.fi environment.

Debugging a core file

Launch TotalView. Choose the option for opening a core file, and in the Core File Session window enter the core file name and the program name. By default, core files are not generated. To enable core files, add

ulimit -c unlimited


to your batch job script.
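
As a rough illustration of where the limit is raised, the fragment below could sit near the top of a batch job script, before the aprun line. It tries to lift the soft core-file limit and reports the result instead of aborting if the hard limit forbids it. The messages and the commented launch line are illustrative assumptions, not part of the Sisu setup.

```shell
# Raise the soft core-file size limit for this shell and its children.
if ulimit -c unlimited 2>/dev/null; then
    echo "core file size limit: $(ulimit -c)"
else
    # The hard limit may be lower than 'unlimited' on some systems.
    echo "could not raise core file limit (hard limit: $(ulimit -H -c))" >&2
fi

# The application launch would follow here, for example:
# aprun -n 32 ./mpi_prog
```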

4.5.3 LGDB debugger

lgdb is a GDB-based parallel debugger used to debug applications compiled with Cray Compiler Environment, GNU, and Intel Fortran, C and C++ compilers. It allows programmers to either launch an application or attach to an already running application that was launched with aprun. Additionally, it provides comparative debugging technology that enables programmers to compare data structures between two executing applications. Comparative debugging should be used in conjunction with the CCDB GUI tool accessed by loading the cray-ccdb module.
