Parallel Computing at CSC
As a rough simplification, modern supercomputers consist of tightly interconnected PCs (computing nodes). To benefit from a supercomputer and surpass the performance of a regular PC, programs need to use the resources (CPU, memory, cache memory, and I/O) of multiple nodes in parallel. In fact, even to get the most out of a single modern multi-core processor, a program needs to run as multiple processes or threads in parallel.
CSC's experts can help users choose the best parallelization method. If you have any questions, please contact CSC's Service Desk at email@example.com.
MPI and OpenMP
Parallel programming paradigms can be divided into distributed memory approaches and shared memory approaches. The most widely used implementations of these two paradigms are Message Passing Interface (MPI) libraries and OpenMP compiler directives, respectively.
MPI is the most widely used communication library on massively parallel supercomputers, where a program runs on multiple computing nodes. In addition to the library of communication routines itself, MPI implementations provide the system tools needed to compile and run MPI programs.
In MPI programming, the tasks (separate processes) communicate by explicitly exchanging messages, which requires calls to MPI routines. In principle, the program makes no distinction between tasks running within a single node and tasks running on different nodes.
MPI is standardized and therefore portable. MPI programming requires some effort because of the explicit communication model and the relatively complex call syntax of its routines, but well-written MPI code typically performs well on most architectures.
OpenMP is most commonly used to exploit the multiple cores of a single processor, or multiple processors within a single computing node (or a PC, laptop, etc.). OpenMP is implemented through compiler directives (pragmas) and threads, so once the program has been compiled, no additional tools are needed on the system to run it.
In OpenMP programming, all threads are assumed to "see" the same memory, so no explicit message passing between threads is needed.
OpenMP is also standardized and portable. Adding directives to serial code is easy, and the code can be parallelized step by step. However, when multiple threads access the same shared memory location, synchronization must be handled explicitly to avoid race conditions. Performance is limited by the number of available threads (typically the cores of a single node) and by any remaining serial sections of the code.
A hybrid MPI/OpenMP model is also possible, in which, for example, each MPI task runs multiple threads. A typical setup uses MPI between nodes and OpenMP within each node. In some cases this improves performance by reducing congestion in the communication resources.