MPI (Message Passing Interface) & OpenMPI
https://docs.rc.fas.harvard.edu/kb/mpi-message-passing-interface/

Introduction

The Message Passing Interface (MPI) library allows the processes of your parallel application to communicate with one another by sending and receiving messages. There is no default MPI library in your environment when you log in to the cluster; you need to choose the desired MPI implementation for your applications by loading an appropriate MPI module. Currently, the available MPI implementations on our cluster are OpenMPI and MPICH. For both implementations, the MPI libraries are compiled and built with either the Intel compiler suite or the GNU compiler suite, and they are organized in software modules.

Installation

MPI comes in several implementations; we list a few here and also point to User_Codes/Parallel_Computing/MPI.

mpi4py with Python

To use mpi4py, you need to load an appropriate Python software module. We provide the Anaconda Python distribution from Continuum Analytics. In addition to mpi4py, it includes hundreds of the most popular packages for large-scale data processing and scientific computing.

You can load Python in your user environment by running the following in your terminal:

module load python/2.7.14-fasrc01

For example code, see Parallel_Computing/Python/mpi4py
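As a quick, hedged illustration of what an mpi4py program looks like (this is not the repository example; the file name hello_mpi.py and the message contents are made up for this sketch), the following sends a message from rank 0 to rank 1 and prints what each rank did:

# hello_mpi.py -- minimal mpi4py sketch: rank 0 sends a message to rank 1
from mpi4py import MPI

comm = MPI.COMM_WORLD            # communicator spanning all MPI processes
rank = comm.Get_rank()           # this process's rank (0, 1, ...)
size = comm.Get_size()           # total number of MPI processes

if rank == 0:
    comm.send("hello from rank 0", dest=1, tag=0)
    print(f"Rank 0 of {size}: message sent")
elif rank == 1:
    msg = comm.recv(source=0, tag=0)
    print(f"Rank 1 of {size}: received '{msg}'")

With the Python and MPI modules loaded, such a script would typically be launched with at least two processes, e.g., mpirun -np 2 python hello_mpi.py, or with srun inside a batch job.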

OpenMPI with GNU Compiler

If you want to use OpenMPI compiled with the GNU compiler, you need to load the appropriate compiler and MPI modules. Below are some possible combinations; run module spider MODULENAME to get a full listing of possibilities.

# GCC + OpenMPI, e.g.,
module load gcc/13.2.0-fasrc01 openmpi/5.0.2-fasrc01

# GCC + Mpich, e.g.,
module load gcc/13.2.0-fasrc01 mpich/4.2.0-fasrc01

# Intel + OpenMPI, e.g.,
module load intel/24.0.1-fasrc01 openmpi/5.0.2-fasrc01

# Intel + Mpich, e.g.,
module load intel/24.0.1-fasrc01 mpich/4.2.0-fasrc01

# Intel + IntelMPI (IntelMPI is based on MPICH), e.g.,
module load intel/24.0.1-fasrc01 intelmpi/2021.11-fasrc01

For reproducibility and consistency, it is recommended to use the complete module name with the module load command, as illustrated above. Modules on the cluster are updated often, so check whether more recent versions are available. The modules are set up so that you can have only one MPI module loaded at a time; if you try loading a second one, it will automatically unload the first. This is done to avoid dependency conflicts.

There are four ways you can set up your MPI on the cluster:

  • Put the module load command in your startup files.
    Most users will find this option most convenient. You will likely only want to use a single version of MPI for all your work. This method also works with all MPI modules currently available on the cluster.

  • Load the module in your current shell.
For the current MPI versions, you do not need to have the module load command in your startup files. If you submit a job, the remote processes will inherit the submission shell environment and use the proper MPI library. Note that this method does not work with older versions of MPI.

  • Load the module in your job script.
    If you will be using different versions of MPI for different jobs, then you can put the module load command in your script. You need to ensure your script can execute the module load command properly.

  • Do not use modules and set environment variables yourself.
You do not need to use modules at all; you can hard-code paths instead. However, these locations may change without warning, so set them in one place only rather than scattering them throughout your scripts. This option can be useful if you have a customized local build of MPI you would like to use with your applications.

Examples

For associated MPI examples, head over to User_Codes/Parallel_Computing/MPI.

  • Example 1: Monte-Carlo calculation of π (a minimal sketch of this approach appears after this list)
  • Example 2: Integration of x² over the interval [0, 4] with 80 integration points and the trapezoidal rule
  • Example 3: Parallel Lanczos diagonalization with reorthogonalization and MPI I/O
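To give a flavor of Example 1, here is a minimal, illustrative sketch of a Monte-Carlo estimate of π written with mpi4py; the actual repository example may be in a different language and structured differently, and the sample count below is arbitrary. Each rank draws its own random points and a reduction combines the hit counts on rank 0:

# monte_carlo_pi.py -- illustrative mpi4py sketch (not the repository code)
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n_per_rank = 1_000_000                      # samples drawn by each rank (arbitrary)
random.seed(rank)                           # give each rank its own random stream

# count samples that fall inside the unit quarter circle
hits = sum(1 for _ in range(n_per_rank)
           if random.random()**2 + random.random()**2 <= 1.0)

# combine the per-rank counts on rank 0
total_hits = comm.reduce(hits, op=MPI.SUM, root=0)

if rank == 0:
    pi_estimate = 4.0 * total_hits / (n_per_rank * size)
    print(f"Estimated pi = {pi_estimate:.6f} from {n_per_rank * size} samples")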

Hybrid (MPI+OpenMP) Codes on the FASRC cluster
https://docs.rc.fas.harvard.edu/kb/hybrid-mpiopenmp-codes-on-odyssey/

Introduction

This page will help you compile and run hybrid (MPI+OpenMP) applications on the cluster. Currently, both OpenMPI and MPICH libraries are available, compiled with both the Intel and GNU compiler suites.

Example Code

Below are simple hybrid example codes in Fortran 90 and C++.
Fortran 90:


!=====================================================
! Program: hybrid_test.f90 (MPI + OpenMP)
!          FORTRAN 90 example - program prints out
!          rank of each MPI process and OMP thread ID
!=====================================================
program hybrid_test
  implicit none
  include "mpif.h"
  integer(4) :: ierr
  integer(4) :: iproc
  integer(4) :: nproc
  integer(4) :: icomm
  integer(4) :: i
  integer(4) :: j
  integer(4) :: nthreads
  integer(4) :: tid
  integer(4) :: omp_get_num_threads
  integer(4) :: omp_get_thread_num
  integer(4) :: iprovided
  ! request support for MPI calls from multiple OpenMP threads,
  ! matching the C++ example below
  call MPI_INIT_THREAD(MPI_THREAD_MULTIPLE, iprovided, ierr)
  icomm = MPI_COMM_WORLD
  call MPI_COMM_SIZE(icomm,nproc,ierr)
  call MPI_COMM_RANK(icomm,iproc,ierr)
!$omp parallel private( tid, nthreads )
  tid = omp_get_thread_num()
  nthreads = omp_get_num_threads()
  do i = 0, nproc-1
     call MPI_BARRIER(icomm,ierr)
     do j = 0, nthreads-1
        !$omp barrier
        if ( iproc == i .and. tid == j ) then
           write (6,*) "MPI rank:", iproc, " with thread ID:", tid
        end if
     end do
  end do
!$omp end parallel
  call MPI_FINALIZE(ierr)
  stop
end program hybrid_test

C++:


//==========================================================
//  Program: hybrid_test.cpp (MPI + OpenMP)
//           C++ example - program prints out
//           rank of each MPI process and OMP thread ID
//==========================================================
#include <iostream>
#include <mpi.h>
#include <omp.h>
using namespace std;
int main(int argc, char** argv){
  int iproc;
  int nproc;
  int i;
  int j;
  int nthreads;
  int tid;
  int provided;
  MPI_Init_thread(&argc,&argv, MPI_THREAD_MULTIPLE, &provided);
  MPI_Comm_rank(MPI_COMM_WORLD,&iproc);
  MPI_Comm_size(MPI_COMM_WORLD,&nproc);
#pragma omp parallel private( tid, nthreads, i, j )
  {
    tid = omp_get_thread_num();
    nthreads = omp_get_num_threads();
    for ( i = 0; i <= nproc - 1; i++ ){
      MPI_Barrier(MPI_COMM_WORLD);
      for ( j = 0; j <= nthreads - 1; j++ ){
        if ( (i == iproc) && (j == tid) ){
          cout << "MPI rank: " << iproc << " with thread ID: " << tid << endl;
        }
      }
    }
  }
  MPI_Finalize();
  return 0;
}

Compiling the program

MPI Intel, Fortran 90: [username@rclogin02 ~]$ mpif90 -o hybrid_test.x hybrid_test.f90 -qopenmp
MPI Intel, C++:        [username@rclogin02 ~]$ mpicxx -o hybrid_test.x hybrid_test.cpp -qopenmp
MPI GNU, Fortran 90:   [username@rclogin02 ~]$ mpif90 -o hybrid_test.x hybrid_test.f90 -fopenmp
MPI GNU, C++:          [username@rclogin02 ~]$ mpicxx -o hybrid_test.x hybrid_test.cpp -fopenmp

Running the program

You could use the following SLURM batch-job submission script to submit the job to the queue:

#!/bin/bash
#SBATCH -J hybrid_test
#SBATCH -o hybrid_test.out
#SBATCH -e hybrid_test.err
#SBATCH -p shared
#SBATCH -n 2
#SBATCH -c 4
#SBATCH -t 180
#SBATCH --mem-per-cpu=4G
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun -n $SLURM_NTASKS -c $SLURM_CPUS_PER_TASK --mpi=pmix ./hybrid_test.x

The OMP_NUM_THREADS environment variable sets the number of OpenMP threads per MPI process. Note that this job will use 2 MPI processes (set with the -n option) and 4 OpenMP threads per MPI process (set with the -c option), so overall the job will reserve and use 8 compute cores. If you name the above script omp_test.batch, for instance, the job is submitted to the queue with

sbatch omp_test.batch

Upon job completion, the job output will be located in the file hybrid_test.out with the following contents:

 MPI rank:           0  with thread ID:           0
 MPI rank:           0  with thread ID:           1
 MPI rank:           0  with thread ID:           2
 MPI rank:           0  with thread ID:           3
 MPI rank:           1  with thread ID:           0
 MPI rank:           1  with thread ID:           1
 MPI rank:           1  with thread ID:           2
 MPI rank:           1  with thread ID:           3