Weather Research & Forecasting Model (WRF)
https://docs.rc.fas.harvard.edu/kb/weather-research-forecasting-model-wrf/

Description

The Weather Research and Forecasting (WRF) model is a widely used, state-of-the-art atmospheric simulation system designed for both meteorological research and operational forecasting.

On the FASRC cluster, WRF is intended to be run in high performance computing (HPC) mode, leveraging multiple compute nodes and large-scale parallelism for efficient, large-domain simulations.

Usage

For usage information on WRF, see SPACK package manager: WRF

Examples

We provide examples for the WRF module in User Codes.

SPACK Package Manager
https://docs.rc.fas.harvard.edu/kb/spack-package-manager/

Introduction to Spack

Spack is a package management tool designed to support multiple versions and configurations of software on a wide variety of platforms and environments. It was designed for large supercomputer centers, where many users and application teams share common installations of software on clusters with exotic architectures, using non-standard libraries.

Spack is non-destructive: installing a new version does not break existing installations. In this way several configurations can coexist on the same system.

Most importantly, Spack is simple. It offers a simple spec syntax so that users can specify versions and configuration options concisely. Spack is also simple for package authors: package files are written in pure Python, and specs allow package authors to maintain a single file for many different builds of the same package.

Note: These instructions are intended to guide you on how to use Spack on the FAS RC Cannon cluster.

Installation and Setup

Install and Setup

Spack works out of the box. Simply clone Spack to get going. In this example, we will clone the latest version of Spack.

Note: Spack can be installed in your home or lab space. For best performance and efficiency, we recommend installing Spack in your lab directory, e.g., /n/holylabs/<PI_LAB>/Lab/software or other lab storage if holylabs is not available.

$ git clone -c feature.manyFiles=true https://github.com/spack/spack.git
Cloning into 'spack'...
remote: Enumerating objects: 686304, done.
remote: Counting objects: 100% (1134/1134), done.
remote: Compressing objects: 100% (560/560), done.
remote: Total 686304 (delta 913), reused 573 (delta 569), pack-reused 685170 (from 5)
Receiving objects: 100% (686304/686304), 231.28 MiB | 43.53 MiB/s, done.
Resolving deltas: 100% (325977/325977), done.
Updating files: 100% (1709/1709), done.

This will create the spack folder in the current directory. Next, go to this directory and add Spack to your path. Spack has some nice command-line integration tools, so instead of simply appending to your PATH variable, source the Spack setup script.

$ cd spack/
$ source share/spack/setup-env.sh
$ spack --version
1.0.0.dev0 (3b00a98cc8e8c1db33453d564f508928090be5a0)

Your version will likely differ because the Spack GitHub repository is updated frequently.

Group Permissions

By default, Spack will match your usual file permissions, which are typically set up without group write permission. For lab-wide installs of Spack, though, you will want to ensure that group write is enforced. You can set this by going to the etc/spack directory in your Spack installation and adding a file called packages.yaml (or editing the existing one) with the following contents. Example for the jharvard_lab (substitute jharvard_lab with your own lab):

packages:
  all:
    permissions:
      write: group
      group: jharvard_lab

Default Architecture

By default, Spack will autodetect the architecture of your underlying hardware and build software to match it. However, in cases where you are running on heterogeneous hardware, it is best to use a more generic target. You can set this by editing the file etc/spack/packages.yaml located inside the spack folder (if you don’t have the file etc/spack/packages.yaml, you can create it). Add the following contents:

packages:
  all:
    target: [x86_64]

Relocating Spack

Once your Spack installation has been set up, it cannot be easily moved. Some Spack packages hardcode absolute paths into themselves and thus cannot be changed without rebuilding them. As such, simply copying the Spack directory will not actually relocate the installation.

The easiest way to move a Spack install while keeping the exact same software stack is to first create a Spack environment with all the software you need. Once you have that, you can export the environment, similar to how you would for a conda environment, and then use that exported environment file to rebuild in the new location.
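A minimal sketch of that workflow, assuming an environment named myenv and a new Spack clone under your lab share (names and paths below are placeholders):

# In the existing Spack installation: create an environment and capture its manifest
$ spack env create myenv
$ spack env activate myenv
$ spack add bzip2@1.0.8        # add whatever packages you need
$ spack install
$ cp $SPACK_ROOT/var/spack/environments/myenv/spack.yaml /tmp/myenv.yaml   # default manifest location for managed environments

# In a fresh Spack clone at the new location: recreate the environment from the manifest
$ source /n/holylabs/<PI_LAB>/Lab/software/spack/share/spack/setup-env.sh
$ spack env create myenv /tmp/myenv.yaml
$ spack env activate myenv
$ spack install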

Available Spack Packages

A complete list of all available Spack packages can also be found here. The spack list command displays the available packages, e.g.,

$ spack list
==> 6752 packages
<omitted output>

NOTE: You can also look for available spack packages at https://packages.spack.io

The spack list command can also take a query string. Spack automatically adds wildcards to both ends of the string, or you can add your own wildcards. For example, we can view all available Python packages.

# with wildcard at both ends of the strings
$ spack list py
==> 1979 packages
<omitted output>

# add your own wildcard: here, list packages that start with py
$ spack list 'py-*'
==> 1960 packages.
<omitted output>

You can also look for specific packages, e.g.,

$ spack list lammps
==> 1 packages.
lammps

You can display available software versions, e.g.,

$ spack versions lammps
==> Safe versions (already checksummed):
  master    20211214  20210929.2  20210929  20210831  20210728  20210514  20210310  20200721  20200505  20200227  20200204  20200109  20191030  20190807  20181212  20181127  20181109  20181010  20180905  20180822  20180316  20170922
  20220107  20211027  20210929.1  20210920  20210730  20210702  20210408  20201029  20200630  20200303  20200218  20200124  20191120  20190919  20190605  20181207  20181115  20181024  20180918  20180831  20180629  20180222  20170901
==> Remote versions (not yet checksummed):
  1Sep2017

Note: for the spack versions command, the package name needs to match exactly. For example, spack versions lamm will not be found:

$ spack versions lamm
==> Error: Package 'lamm' not found.
You may need to run 'spack clean -m'.

Installing Packages

Installing packages with Spack is very straightforward. To install a package simply type spack install PACKAGE_NAME. Large packages with multiple dependencies can take significant time to install, so we recommend doing this in a screen/tmux session or an Open OnDemand Remote Desktop session.

To install the latest version of a package, type:

$ spack install bzip2

To install a specific version (1.0.8) of bzip2, add @ and the version number you need:

$ spack install bzip2@1.0.8

Here we installed a specific version (1.0.8) of bzip2. The installed packages can be displayed by the command spack find:

$ spack find
-- linux-rocky8-icelake / gcc@8.5.0 -----------------------------
bzip2@1.0.8  diffutils@3.8  libiconv@1.16
==> 3 installed packages

One can also request that Spack use a specific compiler flavor/version to install packages, e.g.,

$ spack install zlib@1.2.13%gcc@8.5.0

To specify the desired compiler, one uses the % sigil.

The @ sigil is used to specify versions, both of packages and of compilers, e.g.,

$ spack install zlib@1.2.8
$ spack install zlib@1.2.8%gcc@8.5.0

Finding External Packages

Spack will normally build its own package stack, even if there are libraries available as part of the operating system. If you want Spack to build against system libraries instead of building its own, you will need to have it discover what libraries are available natively on the system. You can do this using the spack external find command.

$ spack external find
==> The following specs have been detected on this system and added to /n/home/jharvard/.spack/packages.yaml
autoconf@2.69    binutils@2.30.117  curl@7.61.1    findutils@4.6.0  git@2.31.1   groff@1.22.3   m4@1.4.18      openssl@1.1.1k  tar@1.30
automake@1.16.1  coreutils@8.30     diffutils@3.6  gawk@4.2.1       gmake@4.2.1  libtool@2.4.6  openssh@8.0p1  pkgconf@1.4.2   texinfo@6.5

This even works with modules loaded from other package managers. You simply have to have those loaded prior to running the find command. After these have been added to Spack, Spack will try to use them in future builds where it can, rather than installing its own versions.
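For example, to register a module-provided CMake as an external package (the module name below is illustrative; substitute whichever module your build needs):

# load the module first so its binaries are on the PATH, then restrict the search to that package
$ module load cmake/3.27.5-fasrc01
$ spack external find cmake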

Using an Lmod module in Spack

Use your favorite text editor, e.g., Vim, Emacs, VSCode, etc., to edit the package configuration YAML file ~/.spack/packages.yaml, e.g.,

vi ~/.spack/packages.yaml

Each package section in this file is similar to the below:

packages:
  package1:
    # settings for package1
  package2:
    # settings for package2
  fftw:
    externals:
    - spec: fftw@3.3.10
      prefix: /n/sw/helmod-rocky8/apps/MPI/gcc/14.2.0-fasrc01/openmpi/5.0.5-fasrc01/fftw/3.3.10-fasrc01
    buildable: false

To obtain the prefix of a module that will be used in Spack, find the module’s <MODULENAME>_HOME.

Let’s say you would like to use the fftw/3.3.10-fasrc01 module instead of building it with Spack. You can find its <MODULENAME>_HOME with:

$ echo $FFTW_HOME
/n/sw/helmod-rocky8/apps/MPI/gcc/14.2.0-fasrc01/openmpi/5.0.5-fasrc01/fftw/3.3.10-fasrc01

Alternatively, you can find <MODULENAME>_HOME with

$ module display fftw/3.3.10-fasrc01

Uninstalling Packages

Spack provides an easy way to uninstall packages with the spack uninstall PACKAGE_NAME command, e.g.,

$ spack uninstall zlib@1.2.13%gcc@8.5.0
==> The following packages will be uninstalled:

    -- linux-rocky8-icelake / gcc@8.5.0 -----------------------------
    xlt7jpk zlib@1.2.13

==> Do you want to proceed? [y/N] y
==> Successfully uninstalled zlib@1.2.13%gcc@8.5.0+optimize+pic+shared build_system=makefile arch=linux-rocky8-icelake/xlt7jpk

Note: The recommended way of uninstalling packages is by specifying the full package name, including the package version and the compiler flavor and version used to install the package in the first place.

Using Installed Packages

There are several different ways to use Spack packages once you have installed them. The easiest way is to use spack load PACKAGE_NAME to load and spack unload PACKAGE_NAME to unload packages, e.g.,

$ spack load bzip2
$ which bzip2
/home/spack/opt/spack/linux-rocky8-icelake/gcc-8.5.0/bzip2-1.0.8-aohgpu7zn62kzpanpohuevbkufypbnff/bin/bzip2

The loaded packages can be listed with spack find --loaded, e.g.,

$ spack find --loaded
-- linux-rocky8-icelake / gcc@8.5.0 -----------------------------
bzip2@1.0.8  diffutils@3.8  libiconv@1.16
==> 3 loaded packages

If you no longer need the loaded packages, you can unload them with:

$ spack unload
$ spack find --loaded
==> 0 loaded packages
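If you only need a package's install prefix, for example to pass include or library paths to your own Makefile, spack location can print it without loading anything (the path below corresponds to the bzip2 install from the example above):

$ spack location -i bzip2
/home/spack/opt/spack/linux-rocky8-icelake/gcc-8.5.0/bzip2-1.0.8-aohgpu7zn62kzpanpohuevbkufypbnff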

Configuration

Compiler Configuration

On the cluster, we support a set of core compilers, such as the GNU (GCC) compiler suite, Intel, and PGI, provided through software modules.

Spack has the ability to build packages with multiple compilers and compiler versions. This can be particularly useful if a package needs to be built with specific compilers and compiler versions. You can display the available compilers with the spack compiler list command.

If you have never used Spack, you will likely have no compilers listed (see the Add GCC compiler section below for how to add compilers):

$ spack compiler list
==> No compilers available. Run `spack compiler find` to autodetect compilers

If you have used Spack before, you may see system-level compilers provided by the operating system (OS) itself:

$ spack compiler list
==> Available compilers
-- gcc rocky8-x86_64 --------------------------------------------
[e]  gcc@8.5.0

-- llvm rocky8-x86_64 -------------------------------------------
[e]  llvm@19.1.7

You can easily add additional compilers to Spack by loading the appropriate software modules, running the spack compiler find command, and editing the packages.yaml configuration file. For instance, if you need GCC version 14.2.0, do the following:

Load the required software module

$ module load gcc/14.2.0-fasrc01
$ which gcc
/n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01/bin/gcc

Add GCC compiler version to the spack compilers

$ spack compiler find
==> Added 1 new compiler to /n/home01/jharvard/.spack/packages.yaml
    gcc@14.2.0
==> Compilers are defined in the following files:
    /n/home01/jharvard/.spack/packages.yaml

If you run spack compiler list again, you will see that the new compiler has been added to the compiler list, e.g.,

$ spack compiler list
==> Available compilers
-- gcc rocky8-x86_64 --------------------------------------------
[e]  gcc@8.5.0  [e]  gcc@14.2.0

-- llvm rocky8-x86_64 -------------------------------------------
[e]  llvm@19.1.7

Note: By default, spack does not fill in the modules: field in the ~/.spack/packages.yaml file. If you are using a compiler from a module, then you should add this field manually.

Manually edit the compiler configuration file

Use your favorite text editor, e.g., Vim, Emacs, VSCode, etc., to edit the compiler configuration YAML file ~/.spack/packages.yaml, e.g.,

vi ~/.spack/packages.yaml

Each compiler is defined as a package in ~/.spack/packages.yaml. Below, you can see gcc 14.2.0 (from module) and gcc 8.5.0 (from OS) defined:

packages:
  gcc:
    externals:
    - spec: gcc@14.2.0 languages:='c,c++,fortran'
      prefix: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01
      extra_attributes:
        compilers:
          c: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01/bin/gcc
          cxx: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01/bin/g++
          fortran: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01/bin/gfortran
    - spec: gcc@8.5.0 languages:='c,c++,fortran'
      prefix: /usr
      extra_attributes:
        compilers:
          c: /usr/bin/gcc
          cxx: /usr/bin/g++
          fortran: /usr/bin/gfortran

We have to add the modules: definition for gcc 14.2.0:

packages:
  gcc:
    externals:
    - spec: gcc@14.2.0 languages:='c,c++,fortran'
      prefix: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01
      extra_attributes:
        compilers:
          c: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01/bin/gcc
          cxx: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01/bin/g++
          fortran: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01/bin/gfortran
      modules: [gcc/14.2.0-fasrc01]

and save the packages.yaml file. If more than one module is required by the compiler, these need to be separated by a semicolon (;).

We can display the configuration of a specific compiler with the spack compiler info command, e.g.,

$ spack compiler info gcc@14.2.0
gcc@=14.2.0 languages:='c,c++,fortran' arch=linux-rocky8-x86_64:
  prefix: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01
  compilers:
    c: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01/bin/gcc
    cxx: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01/bin/g++
    fortran: /n/sw/helmod-rocky8/apps/Core/gcc/14.2.0-fasrc01/bin/gfortran
  modules:
    gcc/14.2.0-fasrc01

Once the new compiler is configured, it can be used to build packages. The below example shows how to install the GNU Scientific Library (GSL) with gcc@14.2.0.

# Check available GSL versions
$ spack versions gsl
==> Safe versions (already checksummed):
  2.8  2.7.1  2.7  2.6  2.5  2.4  2.3  2.2.1  2.1  2.0  1.16
==> Remote versions (not yet checksummed):
  2.2  1.15  1.14  1.13  1.12  1.11  1.10  1.9  1.8  1.7  1.6  1.5  1.4  1.3  1.2  1.1.1  1.1  1.0

# Install GSL version 2.8 with GCC version 14.2.0
$ spack install gsl@2.8%gcc@14.2.0

# Load the installed package
$ spack load gsl@2.8%gcc@14.2.0

# List the loaded package
$ spack find --loaded
-- linux-rocky8-x86_64 / gcc@14.2.0 -----------------------------
gsl@2.8
==> 1 loaded package
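As a quick check that the Spack-built GSL is the one picked up at compile time, you can build a small program against it (the source file name my_gsl_test.c is just an example):

$ which gsl-config        # should point into your Spack installation tree
$ gcc my_gsl_test.c -o my_gsl_test $(gsl-config --cflags) $(gsl-config --libs)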

MPI Configuration

Many HPC software packages work in parallel using MPI. Although Spack has the ability to install MPI libraries from scratch, the recommended approach is to configure Spack to use the MPI libraries already available on the cluster as software modules rather than building its own.

MPI is configured through the packages.yaml file. For instance, if we need OpenMPI version 5.0.5 compiled with GCC version 14, we could follow the below steps to add this MPI configuration:

Determine the MPI location / prefix

$ module load gcc/14.2.0-fasrc01 openmpi/5.0.5-fasrc01
$ echo $MPI_HOME
/n/sw/helmod-rocky8/apps/Comp/gcc/14.2.0-fasrc01/openmpi/5.0.5-fasrc01

Manually edit the packages configuration file

Use your favorite text editor, e.g., Vim, Emacs, VSCode, etc., to edit the packages configuration YAML file ~/.spack/packages.yaml, e.g.,

$ vi ~/.spack/packages.yaml

Note: If the file ~/.spack/packages.yaml does not exist, you will need to create it.

Include the following contents:

packages:
  openmpi:
    externals:
    - spec: openmpi@5.0.5%gcc@14.2.0
      prefix: /n/sw/helmod-rocky8/apps/Comp/gcc/14.2.0-fasrc01/openmpi/5.0.5-fasrc01
    buildable: false

The option buildable: false ensures that MPI won’t be built from source. Instead, Spack will use the MPI provided as a software module in the corresponding prefix.

Once the MPI is configured, it can be used to build packages. The below example shows how to install HDF5 version 1.14.6 with openmpi@5.0.5 and gcc@14.2.0.

Note: Please note the module purge command below. It is required, as otherwise the build fails.

$ module purge
$ spack install hdf5@1.14.6 % gcc@14.2.0 ^ openmpi@5.0.5

Intel MPI Configuration

Here we provide instructions on how to set up Spack to build applications with Intel MPI on the FASRC Cannon cluster. The Intel MPI Library is now included in the Intel oneAPI HPC Toolkit.

Intel Compiler Configuration

The first step involves setting up Spack to use the Intel compiler, which is provided as a software module. This follows a procedure similar to that of adding the GCC compiler.

Load the required software module

$ module load intel/23.0.0-fasrc01
$ which icc
/n/sw/intel-oneapi-2023/compiler/2023.0.0/linux/bin/intel64/icc

Add this Intel compiler version to the spack compilers

$ spack compiler add

If you run the command spack compilers, you will see that the following 3 compilers have been added:

$ spack compilers
...
-- dpcpp rocky8-x86_64 ------------------------------------------
dpcpp@2023.0.0

-- intel rocky8-x86_64 ------------------------------------------
intel@2021.8.0

-- oneapi rocky8-x86_64 -----------------------------------------
oneapi@2023.0.0

Manually edit the compiler configuration file

Use your favorite text editor, e.g., Vim, Emacs, VSCode, etc., to edit the compiler configuration YAML file ~/.spack/linux/compilers.yaml, e.g.,

$ vi ~/.spack/linux/compilers.yaml

Each -compiler: section in this file is similar to the below:

- compiler:
    spec: intel@2021.8.0
    paths:
      cc: /n/sw/intel-oneapi-2023/compiler/2023.0.0/linux/bin/intel64/icc
      cxx: /n/sw/intel-oneapi-2023/compiler/2023.0.0/linux/bin/intel64/icpc
      f77: /n/sw/intel-oneapi-2023/compiler/2023.0.0/linux/bin/intel64/ifort
      fc: /n/sw/intel-oneapi-2023/compiler/2023.0.0/linux/bin/intel64/ifort
    flags: {}
    operating_system: rocky8
    target: x86_64
    modules: []
    environment: {}
    extra_rpaths: []

Note: Here we focus specifically on the intel@2021.8.0 compiler as it is required by the Intel MPI Library.

We have to edit the modules: [] line to read

    modules: [intel/23.0.0-fasrc01]

and save the compilers.yaml file.

We can display the configuration of a specific compiler with the spack compiler info command, e.g.,

$ spack compiler info intel@2021.8.0
intel@2021.8.0:
        paths:
                cc = /n/sw/intel-oneapi-2023/compiler/2023.0.0/linux/bin/intel64/icc
                cxx = /n/sw/intel-oneapi-2023/compiler/2023.0.0/linux/bin/intel64/icpc
                f77 = /n/sw/intel-oneapi-2023/compiler/2023.0.0/linux/bin/intel64/ifort
                fc = /n/sw/intel-oneapi-2023/compiler/2023.0.0/linux/bin/intel64/ifort
        modules  = ['intel/23.0.0-fasrc01']
        operating system  = rocky8

Setting up the Intel MPI Library

Use your favorite text editor, e.g., Vim, Emacs, VSCode, etc., to edit the packages configuration YAML file ~/.spack/packages.yaml, e.g.,

$ vi ~/.spack/packages.yaml

Note: If the file ~/.spack/packages.yaml does not exist, you will need to create it.

Include the following contents:

packages:
  intel-oneapi-mpi:
    externals:
    - spec: intel-oneapi-mpi@2021.8.0%intel@2021.8.0
      prefix: /n/sw/intel-oneapi-2023
    buildable: false

Example

Once spack is configured to use Intel MPI, it can be used to build packages with it. The below example shows how to install HDF5 version 1.13.2 with intel@2021.8.0 and intel-oneapi-mpi@2021.8.0.

You can first test this using the spack spec command to show how the spec is concretized:

$ spack spec hdf5@1.13.2%intel@2021.8.0+mpi+fortran+cxx+hl+threadsafe ^ intel-oneapi-mpi@2021.8.0%intel@2021.8.0

Next, you can build it:

$ spack install hdf5@1.13.2%intel@2021.8.0+mpi+fortran+cxx+hl+threadsafe ^ intel-oneapi-mpi@2021.8.0%intel@2021.8.0

Spack Environments

Spack environments are a powerful feature of the Spack package manager that enable users to create isolated and reproducible environments for their software projects. Each Spack environment contains a specific set of packages and dependencies, which are installed in a self-contained directory tree. This means that different projects can have different versions of the same package, without interfering with each other. Spack environments also allow users to share their software environments with others, making it easier to collaborate on scientific projects.

Creating and activating environments

To create a new Spack environment:

$ spack env create myenv
$ spack env activate -p myenv

To deactivate an environment:

$ spack env deactivate

To list available environments:

$ spack env list

To remove an environment:

$ spack env remove myenv

For more detailed information about Spack environments, please refer to the Environments Tutorial.
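A minimal end-to-end workflow inside an environment looks like the following (the environment name and package choice are illustrative):

$ spack env create myenv
$ spack env activate -p myenv
[myenv] $ spack add bzip2@1.0.8
[myenv] $ spack install          # concretizes and installs everything added to the environment
[myenv] $ spack find             # lists only the packages in the active environment
[myenv] $ spack env deactivate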

Application Recipes

This section provides step-by-step instructions for installing and configuring specific scientific applications using Spack.

GROMACS with MPI

GROMACS is a free and open-source software suite for high-performance molecular dynamics and output analysis.

The instructions below provide a Spack recipe for building an MPI-capable instance of GROMACS on the FASRC Cannon cluster.

Compiler and MPI Library Spack configuration

Here we will use the GNU/GCC compiler suite together with OpenMPI.

The instructions below assume that Spack is already configured to use the GCC compiler gcc@12.2.0 and the OpenMPI library openmpi@4.1.5, as explained here.

Create GROMACS spack environment and activate it

spack env create gromacs
spack env activate -p gromacs

Install the GROMACS environment

Add the required packages to the spack environment

spack add openmpi@4.1.5
spack add gromacs@2023.3 + mpi + openmp % gcc@12.2.0 ^ openmpi@4.1.5

Install the environment

Once all required packages are added to the environment, it can be installed with:

spack install

Use GROMACS

Once the environment is installed, all installed packages in the GROMACS environment are available on the PATH, e.g.:

[gromacs] [pkrastev@builds01 Spack]$ gmx_mpi -h
                    :-) GROMACS - gmx_mpi, 2023.3-spack (-:

Executable:   /builds/pkrastev/Spack/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/gromacs-2023.3-42ku4gzzitbmzoy4zq43o3ozwr5el3tx/bin/gmx_mpi
Data prefix:  /builds/pkrastev/Spack/spack/opt/spack/linux-rocky8-x86_64/gcc-12.2.0/gromacs-2023.3-42ku4gzzitbmzoy4zq43o3ozwr5el3tx
Working dir:  /builds/pkrastev/Spack
Command line:
  gmx_mpi -h

Interactive runs

You can run GROMACS interactively. This assumes you have requested an interactive session first, as explained here.

In order to set up your GROMACS environment, you need to run the commands:

### Replace <PATH TO> with the actual path to your spack installation
. <PATH TO>/spack/share/spack/setup-env.sh
spack env activate gromacs

Batch jobs

When submitting batch jobs, you will need to add the below lines to your submission script:

# --- Activate the GROMACS Spack environment, e.g., ---
### NOTE: Replace <PATH TO> with the actual path to your spack installation
. <PATH TO>/spack/share/spack/setup-env.sh
spack env activate gromacs
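Putting these lines together, below is a sketch of a full submission script run_gromacs.sh; the partition, resources, and the input file md.tpr are placeholders to adapt to your own run:

#!/bin/bash
#SBATCH -J gromacs_test       # job name
#SBATCH -o gromacs_test.out   # standard output file
#SBATCH -e gromacs_test.err   # standard error file
#SBATCH -p shared             # partition
#SBATCH -n 4                  # ntasks
#SBATCH -t 00:30:00           # time in HH:MM:SS
#SBATCH --mem-per-cpu=4G      # memory per CPU

# --- Activate the GROMACS Spack environment, e.g., ---
### NOTE: Replace <PATH TO> with the actual path to your spack installation
. <PATH TO>/spack/share/spack/setup-env.sh
spack env activate gromacs

# --- Run the executable (md.tpr is a placeholder input file) ---
srun -n $SLURM_NTASKS --mpi=pmix gmx_mpi mdrun -s md.tpr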

LAMMPS with MPI

LAMMPS is a classical molecular dynamics code with a focus on materials modeling. It’s an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.

The instructions below provide a Spack recipe for building an MPI-capable instance of LAMMPS on the FASRC Cannon cluster.

Pre-requisites: Compiler and MPI Library Spack configuration

Here we will use the GNU/GCC compiler suite together with OpenMPI. We will also use a module for FFTW.

The instructions below assume that Spack is already configured to use the GCC compiler gcc@14.2.0, the OpenMPI library openmpi@5.0.5, and FFTW via the module fftw/3.3.10-fasrc01. If you have not configured them yet, see:

  1. To add gcc compiler: Spack compiler configuration
  2. To add openmpi: Spack MPI Configuration
  3. To add fftw as an external package: Using an Lmod module in Spack

Create LAMMPS spack environment and activate it

First, request an interactive job

salloc --partition test --time 06:00:00 --mem-per-cpu 4G -c 8

Second, download and activate Spack. For performance, we recommend using a Lab share in Holyoke (i.e., path starts with holy) instead of using your home directory. Here, we show an example with /n/holylabs:

cd /n/holylabs/jharvard_lab/Lab/jharvard
git clone -c feature.manyFiles=true https://github.com/spack/spack.git spack_lammps
cd spack_lammps/
source share/spack/setup-env.sh

Finally, create a Spack environment and activate it

spack env create lammps
spack env activate -p lammps

Install the LAMMPS environment

Note on architecture

If you are planning to run LAMMPS on different partitions, we recommend setting Spack to a generic architecture. Otherwise, Spack will detect the architecture of the node on which you are building LAMMPS and optimize for that specific architecture, and the resulting binaries may not run on other hardware. For example, LAMMPS built on Sapphire Rapids may not run on Cascade Lake.
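To check which architecture Spack detects on the node you are currently on, run spack arch; the output will vary by node (the example below shows an Ice Lake node, matching earlier examples on this page). If you plan to run on other partitions, set the generic target as described in the Default Architecture section above.

$ spack arch
linux-rocky8-icelake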

Install libbsd

Note: In this recipe, we first install libbsd with the system version of the GCC compiler, gcc@8.5.0, as the installation fails if we try to add it directly to the environment and install it with gcc@14.2.0.

spack install --add libbsd@0.12.2 % gcc@8.5.0

Add the rest of the required packages to the spack environment

First, add Python ≤3.10, because newer versions of Python no longer include the distutils package, which causes the installation to fail.

spack add python@3.10

Second, add openmpi

spack add openmpi@5.0.5

Third, add FFTW

spack add fftw@3.3.10

Then, add LAMMPS required packages

spack add lammps +asphere +body +class2 +colloid +compress +coreshell +dipole +granular +kokkos +kspace +manybody +mc +misc +molecule +mpiio +openmp-package +peri +python +qeq +replica +rigid +shock +snap +spin +srd +user-reaxc +user-misc % gcc@14.2.0 ^ openmpi@5.0.5

Install the environment

Once all required packages are added to the environment, it can be installed with (note that the installation can take 1-2 hours):

spack install

Use LAMMPS

Once the environment is installed, all installed packages in the LAMMPS environment are available on the PATH, e.g.:

[lammps] [jharvard@holy8a24102 spack_lammps]$ lmp -h

Large-scale Atomic/Molecular Massively Parallel Simulator - 29 Oct 2020

Usage example: lmp -var t 300 -echo screen -in in.alloy

List of command line options supported by this LAMMPS executable:

Interactive runs

You can run LAMMPS interactively in both serial and parallel mode. This assumes you have requested an interactive session first, as explained here.

Prerequisite:

Source Spack and activate environment

### NOTE: Replace <PATH TO spack_lammps> with the actual path to your spack installation

[jharvard@holy8a24102 ~]$ cd <PATH TO spack_lammps>
[jharvard@holy8a24102 spack_lammps]$ source share/spack/setup-env.sh
[jharvard@holy8a24102 spack_lammps]$ spack env activate -p lammps

Serial

[lammps] [jharvard@holy8a24301 spack_lammps]$ lmp -in in.demo

Parallel (e.g., 4 MPI tasks)

[lammps] [jharvard@holy8a24301 spack_lammps]$ mpirun -np 4 lmp -in in.demo

Batch jobs

Example batch job submission script

Below is an example batch-job submission script run_lammps.sh using the LAMMPS spack environment.

#!/bin/bash
#SBATCH -J lammps_test        # job name
#SBATCH -o lammps_test.out    # standard output file
#SBATCH -e lammps_test.err    # standard error file
#SBATCH -p shared             # partition
#SBATCH -n 4                  # ntasks
#SBATCH -t 00:30:00           # time in HH:MM:SS
#SBATCH --mem-per-cpu=500     # memory in megabytes

# --- Activate the LAMMPS Spack environment, e.g., ---
### NOTE: Replace <PATH TO> with the actual path to your spack installation
. <PATH TO>/spack_lammps/share/spack/setup-env.sh
spack env activate lammps

# --- Run the executable ---
srun -n $SLURM_NTASKS --mpi=pmix lmp -in in.demo

Submit the job

sbatch run_lammps.sh

WRF (Weather Research and Forecasting)

The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. WRF features two dynamical cores, a data assimilation system, and a software architecture supporting parallel computation and system extensibility. The model serves a wide range of meteorological applications across scales from tens of meters to thousands of kilometers.

WRF official website: https://www.mmm.ucar.edu/weather-research-and-forecasting-model

Compiler and MPI Library Spack configuration

We use the Intel compiler suite together with the Intel MPI Library. The instructions below assume that Spack is already configured to use the Intel compiler intel@2021.8.0 and the Intel MPI Library intel-oneapi-mpi@2021.8.0.

Create WRF spack environment and activate it

spack env create wrf
spack env activate -p wrf

Add the required packages to the spack environment

In addition to WRF and WPS we also build ncview and ncl.

spack add intel-oneapi-mpi@2021.8.0
spack add hdf5@1.12%intel@2021.8.0 +cxx+fortran+hl+threadsafe
spack add libpng@1.6.37%intel@2021.8.0
spack add jasper@1.900.1%intel@2021.8.0
spack add netcdf-c@4.9.0%intel@2021.8.0
spack add netcdf-fortran@4.6.0%intel@2021.8.0
spack add xz@5.4.2%intel@2021.8.0
spack add wrf@4.4%intel@2021.8.0
spack add wps@4.3.1%intel@2021.8.0
spack add cairo@1.16.0%gcc@8.5.0
spack add ncview@2.1.8%intel@2021.8.0
spack add ncl@6.6.2%intel@2021.8.0

NOTE: Here we use the gcc@8.5.0 compiler to build cairo@1.16 as it fails to compile with the Intel compiler.

Install the WRF environment

Once all required packages are added to the environment, it can be installed with:

spack install

Use WRF/WPS

Once the environment is installed, WRF and WPS (and any other packages from the environment, such as ncview), are available on the PATH, e.g.:

[wrf] [pkrastev@builds01 spack]$ which wrf.exe
/builds/pkrastev/Spack/spack/var/spack/environments/wrf/.spack-env/view/main/wrf.exe
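As with the other recipes on this page, batch jobs need to source Spack and activate the environment before invoking the executables. A sketch of the relevant lines is below; paths are placeholders, and the run directory is assumed to already contain namelist.input and the required input data:

# --- Activate the WRF Spack environment, e.g., ---
### NOTE: Replace <PATH TO> with the actual path to your spack installation
. <PATH TO>/spack/share/spack/setup-env.sh
spack env activate wrf

# --- Run WRF ---
srun -n $SLURM_NTASKS --mpi=pmix wrf.exe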

Troubleshooting

When Spack builds, it uses a stage directory located in /tmp. Spack cleans up this space once it is done building, regardless of whether the build succeeds or fails. This can make troubleshooting failed builds difficult, as the logs from those builds are stored in the stage. To preserve these files for debugging, first set the $TMP environment variable to the location where you want the stage files written, then add the --keep-stage flag to spack (e.g., spack install --keep-stage <package>), which tells Spack to keep the staging files rather than remove them.
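For example (the stage directory below is just a placeholder for a location you can write to):

$ export TMP=/n/netscratch/jharvard_lab/Lab/jharvard/spack-stage
$ mkdir -p $TMP
$ spack install --keep-stage <package>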

Cannot open shared object file: No such file or directory

This error occurs when the compiler cannot find a library it is dependent on. For example:

/n/sw/helmod/apps/centos7/Core/gcc/10.2.0-fasrc01/bin/../libexec/gcc/x86_64-pc-linux-gnu/10.2.0/cc1: error while loading shared libraries: libmpfr.so.6: cannot open shared object file: No such file or directory

In this error, the compiler cannot find a library it depends on, mpfr. To fix this, we will need to add the relevant library paths to the compiler definition in ~/.spack/packages.yaml. In this case we are using gcc/10.2.0-fasrc01, which when loaded also loads:

[jharvard@holy7c22501 ~]# module list

Currently Loaded Modules:
  1) gmp/6.2.1-fasrc01   2) mpfr/4.1.0-fasrc01   3) mpc/1.2.1-fasrc01   4) gcc/10.2.0-fasrc01

So we will need to grab the location of these libraries to add them. To find that you can do:

[jharvard@holy7c22501 ~]# module display mpfr/4.1.0-fasrc01

And then pull out the LIBRARY_PATH. Once we have the paths for all three of these dependencies, we can add them to ~/.spack/packages.yaml as follows:

- compiler:
    spec: gcc@10.2.0
    paths:
      cc: /n/helmod/apps/centos7/Core/gcc/10.2.0-fasrc01/bin/gcc
      cxx: /n/helmod/apps/centos7/Core/gcc/10.2.0-fasrc01/bin/g++
      f77: /n/helmod/apps/centos7/Core/gcc/10.2.0-fasrc01/bin/gfortran
      fc: /n/helmod/apps/centos7/Core/gcc/10.2.0-fasrc01/bin/gfortran
    flags: {}
    operating_system: centos7
    target: x86_64
    modules: []
    environment:
      prepend_path:
        LIBRARY_PATH: /n/helmod/apps/centos7/Core/mpc/1.2.1-fasrc01/lib64:/n/helmod/apps/centos7/Core/mpfr/4.1.0-fasrc01/lib64:/n/helmod/apps/centos7/Core/gmp/6.2.1-fasrc01/lib64
        LD_LIBRARY_PATH: /n/helmod/apps/centos7/Core/mpc/1.2.1-fasrc01/lib64:/n/helmod/apps/centos7/Core/mpfr/4.1.0-fasrc01/lib64:/n/helmod/apps/centos7/Core/gmp/6.2.1-fasrc01/lib64
    extra_rpaths: []

Namely, we needed to add the prepend_path entries to the environment section. With those additional paths defined, the compiler will now work because it can find its dependencies.

C compiler cannot create executables

This is the same type of error as Cannot open shared object file: No such file or directory above. Namely, the compiler cannot find the libraries it depends on. See the troubleshooting section for the shared object error for how to resolve it.

Error: Only supported on macOS

If you are trying to install a package and get an error that it is only supported on macOS:

$ spack install r@3.4.2
==> Error: Only supported on macOS

You need to update your compilers. For example, here you can see only Ubuntu compilers are available, which do not work on Rocky 8:

$ spack compiler list
==> Available compilers
-- clang ubuntu18.04-x86_64 -------------------------------------
clang@7.0.0

-- gcc ubuntu18.04-x86_64 ---------------------------------------
gcc@7.5.0  gcc@6.5.0

Then, run spack compiler find to update the list of compilers:

$ spack compiler find
==> Added 1 new compiler to /n/home01/jharvard/.spack/packages.yaml
    gcc@8.5.0
==> Compilers are defined in the following files:
    /n/home01/jharvard/.spack/packages.yaml

Now, you can see a Rocky 8 compiler is also available:

$ spack compiler list
==> Available compilers
-- clang ubuntu18.04-x86_64 -------------------------------------
clang@7.0.0

-- gcc rocky8-x86_64 --------------------------------------------
gcc@8.5.0

-- gcc ubuntu18.04-x86_64 ---------------------------------------
gcc@7.5.0  gcc@6.5.0

And you can proceed with the spack package installs.

Assembly Error

If your package has gmake as a dependency, you may run into this error:

/tmp/ccRlxmkM.s:202: Error: no such instruction: `vmovw %ebp,%xmm3'

First, check whether your as (GNU assembler) version is ≤2.38 with:

$ as --version
GNU assembler version 2.30-123.el8

If that’s the case, then use a generic linux architecture as explained in Default Architecture.

Extracting compressed .zip or .7z archives with 7-Zip
https://docs.rc.fas.harvard.edu/kb/7zip/

p7zip/7-Zip is installed on the clusters, making it easy to create compressed/zipped archives or unzip/extract compressed archives.

Use 7-Zip from the command line. If you are using the Open OnDemand interface, you can start a Terminal Emulator window in the Remote Desktop application to extract the contents of a .zip or .7z archive file.

Some examples are included below:

Extracting archives

To list the contents of the file readme_docs.7z:
7z l readme_docs.7z

To extract an archive called readme_docs.7z to a new folder in your current directory called “extracted”, you would type
7z x readme_docs.7z -o./extracted

  • x is for “eXtract” (this command is useful if you are uncompressing a data source with its own directories – it preserves the directory structure. If you just want to extract everything in the archive without preserving directory structure within it, you can use the e command instead of x)
  • -o sets the Output directory
  • . means your current directory
  • note there is no space between the -o and the path for your extracted content.

To simply extract to the current folder:
7z x readme_docs.7z

Creating and adding to archives

To create an archive file, use the a command to Add files to an archive.
Specify the name of the archive, then the files that should be added to the archive.

This command adds the files doc1.txt and doc2.docx to an archive readme_docs.7z using the 7z compression format (default)
7z a readme_docs.7z doc1.txt doc2.docx

This command adds the contents of the directory “docs” to an archive readme_docs.7z using the 7z compression format (default)
7z a readme_docs.7z docs/

This command adds all files ending in “.txt” in the current directory to an archive readme_docs.7z using the 7z compression format (default)
7z a readme_docs.7z *.txt

This command adds the files doc1.txt and doc2.docx to an archive readme_docs.zip using the ZIP format.
Other options include -t7z (default), -tgzip, -tzip, -tbzip2, -tudf, -ttar.
7z a -tzip readme_docs.zip doc1.txt doc2.docx

While 7-Zip supports archiving using TAR, the tar command is also available. Please see our tips for using tar to archive data.

More information

In Terminal, you can type 7z for a list of commands and switches, or man 7z for detailed descriptions and examples.

Podman
https://docs.rc.fas.harvard.edu/kb/podman/

Introduction

Podman is an Open Containers Initiative (OCI) container toolchain developed by Red Hat. Unlike its popular OCI cousin Docker, it is daemonless, making it easier to use with resource schedulers like Slurm. Podman maintains a command line interface (CLI) that is very similar to Docker's. On the FASRC cluster, the docker command runs podman under the hood, and many docker commands just work with podman, though with some exceptions. Note that this document uses the term container to mean OCI container. Besides Podman containers, FASRC also supports Singularity.

Normally, podman requires privileged access. However, on the FASRC clusters we have enabled rootless podman, removing that requirement. We recommend reading our document on rootless containers before proceeding further so you understand how it works and its limitations.

Podman Documentation

The official Podman Documentation provides the latest information on how to use Podman. On this page we will merely highlight useful commands and features/quirks specific to the FASRC cluster. You can get command line help pages by running man podman or podman --help.

Working with Podman

To start working with podman, first get an interactive session either via salloc or via Open OnDemand. Once you have that session, you can start working with your container image. The basic commands we will cover here are:

  • pull: Download a container image from a container registry
  • images: List downloaded images
  • run: Run a command in a new container
  • build: Create a container image from a Dockerfile/Containerfile
  • push: Push a container image to a container registry

For these examples we will use the lolcow and ubuntu images from DockerHub.

WARNING: We do not recommend overriding the default configuration using $HOME/.config/containers/storage.conf. If you do so, it is at your own risk.

pull

podman pull fetches the specified container image and extracts it into node-local storage (/tmp/container-user-<uid> by default on the FASRC cluster). This step is optional, as podman will automatically download an image specified in a podman run, podman build, or podman shell command.

[jharvard@holy8a26601 ~]$ podman pull docker://godlovedc/lolcow
Trying to pull docker.io/godlovedc/lolcow:latest...
Getting image source signatures
Copying blob 8e860504ff1e done | 
Copying blob 9fb6c798fa41 done | 
Copying blob 3b61febd4aef done | 
Copying blob 9d99b9777eb0 done | 
Copying blob d010c8cf75d7 done | 
Copying blob 7fac07fb303e done | 
Copying config 577c1fe8e6 done | 
Writing manifest to image destination
577c1fe8e6d84360932b51767b65567550141af0801ff6d24ad10963e40472c5

images

podman images lists the images that are already available on the node (in /tmp/container-user-<uid>)

[jharvard@holy8a26601 ~]$ podman images
REPOSITORY                  TAG         IMAGE ID      CREATED      SIZE
docker.io/godlovedc/lolcow  latest      577c1fe8e6d8  7 years ago  248 MB

run

Podman containers may contain an entrypoint script that will execute when the container is run. To run the container:

[jharvard@holy8a26601 ~]$ podman run -it docker://godlovedc/lolcow
_______________________________________
/ Your society will be sought by people \
\ of taste and refinement. /
---------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||

To view the entrypoint script for a podman container:

[jharvard@holy8a26601 ~]$ podman inspect -f 'Entrypoint: {{.Config.Entrypoint}}\nCommand: {{.Config.Cmd}}' lolcow
Entrypoint: [/bin/sh -c fortune | cowsay | lolcat]
Command: []

shell

To start a shell inside a new container, specify the podman run -it --entrypoint bash options. -it effectively provides an interactive session, while --entrypoint bash invokes the bash shell (bash can be substituted with another shell program that exists in the container image).

[jharvard@holy8a26601 ~]$ podman run -it --entrypoint bash docker://godlovedc/lolcow
root@holy8a26601:/#

GPU Example

First, start an interactive job on a GPU partition. Then invoke podman run with the --device nvidia.com/gpu=all option:

[jharvard@holygpu7c26306 ~]$ podman run --rm --device nvidia.com/gpu=all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Wed Jan 22 15:41:58 2025 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:CA:00.0 Off | On |
| N/A 27C P0 66W / 400W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 2 0 0 | 37MiB / 19968MiB | 42 0 | 3 0 2 0 0 |
| | 0MiB / 32767MiB | | |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
WARN[0001] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus

Batch Jobs

Podman containers can also be executed as part of a normal batch job as you would any other command. Simply include the command as part of the sbatch script. As an example here is a sample podman.sbatch:

#!/bin/bash
#SBATCH -J podman_test
#SBATCH -o podman_test.out
#SBATCH -e podman_test.err
#SBATCH -p test
#SBATCH -t 0-00:10
#SBATCH -c 1
#SBATCH --mem=4G

# Podman command line options
podman run docker://godlovedc/lolcow

When submitted to the cluster as a batch job:

[jharvard@holylogin08 ~]$ sbatch podman.sbatch

Generates the podman_test.out which contains:

[jharvard@holylogin08 ~]$ cat podman_test.out
____________________________________
< Don't read everything you believe. >
------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||

Accessing Files

Each podman container operates within its own isolated filesystem tree in /tmp/container-user-<uid>/storage. However, host file systems can be explicitly shared with containers by using the --volume option when starting a container (unlike Singularity, which is set up to automatically bind several default filesystems). This option bind-mounts a directory or file from the host into the container, granting the container access to that path. For instance, to access netscratch from the container:

[jharvard@holy8a26602 ~]$ podman run -it --entrypoint bash --volume /n/netscratch:/n/netscratch docker://ubuntu
root@holy8a26602:/# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 397G 6.5G 391G 2% /
tmpfs 64M 0 64M 0% /dev
netscratch-ib01.rc.fas.harvard.edu:/netscratch/C 3.6P 1.8P 1.9P 49% /n/netscratch
/dev/mapper/vg_root-lv_scratch 397G 6.5G 391G 2% /run/secrets
shm 63M 0 63M 0% /dev/shm
devtmpfs 504G 0 504G 0% /dev/tty

Ownership (as seen from the host) of files created by a process in the container depends on the user ID (UID) of the creating process in the container, either:

  • The host (cluster) user, if the container user is:
    • root (UID 0) – this is often the default
    • podman run --userns=keep-id is specified, so the host user and primary group ID are used for the container user (similar to SingularityCE in the default native mode)
    • podman run --userns=keep-id:uid=<container-uid>,gid=<container-gid> is specified, using the specified UID/GID for the container user and mapping it to the host/cluster user’s UID/GID. E.g., in the following example, the “node” user in the container (UID=1000, GID=1000) creates a file that is (as seen from the host) owned by the host user:
      $ podman run -it --rm --user=node --entrypoint=id docker.io/library/node:22
      uid=1000(node) gid=1000(node) groups=1000(node)
      $ podman run -it --rm --volume /n/netscratch:/n/netscratch --userns=keep-id:uid=1000,gid=1000 --entrypoint=bash docker.io/library/node:22
      node@host:/$ touch /n/netscratch/jharvard_lab/Lab/jharvard/myfile
      node@host:/$ ls -l /n/netscratch/jharvard_lab/Lab/jharvard/myfile
      -rw-r--r--. 1 node node 0 Apr 7 16:05 /n/netscratch/jharvard_lab/Lab/jharvard/myfile
      node@host:/$ exit
      $ ls -ld /n/netscratch/jharvard_lab/Lab/jharvard/myfile
      -rw-r--r--. 1 jharvard jharvard_lab 0 Apr 7 12:05 /n/netscratch/jharvard_lab/Lab/jharvard/myfile
  • Otherwise, ownership maps to the subuid/subgid associated with the container UID/GID (see rootless containers). Only filesystems that can resolve your subuids can be written to from a podman container (e.g. NFS file systems like /n/netscratch and home directories, or node-local filesystems like /scratch or /tmp; but not Lustre filesystems like holylabs) and only locations with “other” read/write/execute permissions can be utilized (e.g. the Everyone directory).

Environment Variables

A Podman container does not inherit environment variables from the host environment. Any environment variables that are not defined by the container image must be explicitly set with the --env option:

[jharvard@holy8a26602 ~]$ podman run -it --rm --env MY_VAR=test python:3.13-alpine python3 -c 'import os; print(os.environ["MY_VAR"])'
test

Building Your Own Podman Container

You can build or import a Podman container in several different ways. Common methods include:

  1. Download an existing OCI container image located in Docker Hub or another OCI container registry (e.g., quay.io, NVIDIA NGC Catalog, GitHub Container Registry).
  2. Build a Podman image from a Containerfile/Dockerfile.

Images are stored by default at /tmp/containers-user-<uid>/storage. You can find out more about the specific paths by running the podman info command. Since the default path is in /tmp, container images will only exist for the duration of the job, after which the system will clean up the space.
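For instance, the image store location can be printed directly with a Go-template query against podman info (a minimal example; plain podman info shows the same information in full):

[jharvard@holy8a26602 ~]$ podman info --format '{{ .Store.GraphRoot }}'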

Downloading OCI Container Image From Registry

To download a OCI container image from a registry simply use the pull command:

[jharvard@holy8a26602 ~]$ podman pull docker://godlovedc/lolcow
Trying to pull docker.io/godlovedc/lolcow:latest...
Getting image source signatures
Copying blob 8e860504ff1e done | 
Copying blob 9fb6c798fa41 done | 
Copying blob 3b61febd4aef done | 
Copying blob 9d99b9777eb0 done | 
Copying blob d010c8cf75d7 done | 
Copying blob 7fac07fb303e done | 
Copying config 577c1fe8e6 done | 
Writing manifest to image destination
577c1fe8e6d84360932b51767b65567550141af0801ff6d24ad10963e40472c5
WARN[0006] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus 
[jharvard@holy8a26602 ~]$ podman image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/godlovedc/lolcow latest 577c1fe8e6d8 7 years ago 248 MB

Build Podman Image From Dockerfile/Containerfile

Podman can build container images from a Dockerfile/Containerfile (podman prefers the generic term Containerfile, and podman build without -f <file> will first check for the existence of Containerfile in the current working directory, falling back to Dockerfile if one doesn’t exist). To build, first write your Containerfile:

FROM ubuntu:22.04

RUN apt-get -y update \
  && apt-get -y install cowsay lolcat\
  && rm -rf /var/lib/apt/lists/*
ENV LC_ALL=C PATH=/usr/games:$PATH 

ENTRYPOINT ["/bin/sh", "-c", "date | cowsay | lolcat"]

Then run the build command (assuming Dockerfile or Containerfile in the current working directory):

[jharvard@holy8a26602 ~]$ podman build -t localhost/lolcow
STEP 1/4: FROM ubuntu:22.04
Resolved "ubuntu" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/ubuntu:22.04...
Getting image source signatures
Copying blob 6414378b6477 done | 
Copying config 97271d29cb done | 
Writing manifest to image destination
STEP 2/4: RUN apt-get -y update && apt-get -y install cowsay lolcat

... omitted output ...

Running hooks in /etc/ca-certificates/update.d...
done.
--> a41765f5337a
STEP 3/4: ENV LC_ALL=C PATH=/usr/games:$PATH
--> e9eead916e20
STEP 4/4: ENTRYPOINT ["/bin/sh", "-c", "date | cowsay | lolcat"]
COMMIT
--> 51e919dd571f
51e919dd571f1c8a760ef54c746dcb190659bdd353cbdaa1d261ba8d50694d24

Saving/loading a container image on another compute node

Podman container images are stored on a node-local filesystem (/tmp/container-user-<uid>). Any container images built / pulled on one node that are needed on another node must be saved to a data storage location that is accessible to all compute nodes in the FAS RC cluster. The podman save command can be used to accomplish this.

[jharvard@holy8a26602 ~]$ podman save --format oci-archive -o lolcow.tar localhost/lolcow
[jharvard@holy8a26602 ~]$ ls -lh lolcow.tar 
-rw-r--r--. 1 jharvard jharvard_lab 57M Jan 27 11:37 lolcow.tar

Note: omitting --format oci-archive saves the file in the docker-archive format, which is uncompressed, and thus faster to save/load though larger in size.

From another compute node, podman load extracts the docker- or oci-archive to the node-local /tmp/container-user-<uid>, where it can be used by podman:

[jharvard@holy8a26603 ~]$ podman images
REPOSITORY  TAG         IMAGE ID    CREATED     SIZE
[jharvard@holy8a26603 ~]$ podman load -i lolcow.tar
Getting image source signatures
Copying blob 163070f105c3 done   | 
Copying blob f88085971e43 done   | 
Copying config e9749e43bc done   | 
Writing manifest to image destination
Loaded image: localhost/lolcow:latest
[jharvard@holy8a26603 ~]$ podman images
REPOSITORY        TAG         IMAGE ID      CREATED        SIZE
localhost/lolcow  latest      e9749e43bc74  6 minutes ago  172 MB

Pushing a container image to a container registry

To make a container image built on the FASRC cluster available outside the FASRC cluster, the container image can be pushed to a container registry. Popular container registries with a free tier include Docker Hub and the GitHub Container Registry.

This example illustrates the use of the GitHub Container Registry, and assumes a GitHub account.

Note: The GitHub Container Registry is a part of the GitHub Packages ecosystem

  1. Create a Personal access token (classic) with write:packages scope (this implicitly adds read:packages for pulling private container images):
    https://github.com/settings/tokens/new?scopes=write:packages
  2. Authenticate to ghcr.io, using the authentication token generated in step 1 as the “password” (replace “<GITHUB_USERNAME>” with your GitHub username):
    [jharvard@holy8a26603 ~]$ podman login -u <GITHUB_USERNAME> ghcr.io
    Password: <paste authentication token>
    Login succeeded!
  3. Ensure the image has been named ghcr.io/<OWNER>/<image>:<tag> (where “<OWNER>” is either your GitHub username, or an organization that you are a member of and have permission to publish packages to),
    using the podman tag command to add a name to an existing local image if needed (Note: the GitHub owner must be all lower-case (e.g., jharvard instead of JHarvard)):

    [jharvard@holy8a26603 ~]$ podman tag localhost/lolcow:latest ghcr.io/<OWNER>/lolcow:latest
  4. Push the image to the container registry:
[jharvard@holy8a26603 ~]$ podman push ghcr.io/<OWNER>/lolcow:latest
Getting image source signatures
Copying blob 2573e0d81582 done   |
… 
Writing manifest to image destination

By default, the container image will be private. To change the visibility to “public”, access the package from the list at https://github.com/GITHUB_OWNER?tab=packages and configure the package settings (see Configuring a package’s access control and visibility).

Podman Compose: Run multi-container apps

podman compose can be used to run multi-container applications (e.g. a web service + database) on a single compute node. The containers to be managed and their relationships are defined in a Compose file (typically `compose.yaml` or `docker-compose.yaml`).

Using Docker Compose with Podman

podman compose is a wrapper that uses a compose provider. Two compose providers supported are:

  1. podman-compose – The podman-compose compose provider is still under active development, and lacks some functionality found in the original Docker Compose. podman-compose is preinstalled on the FASRC cluster (/usr/bin/podman-compose).
  2. docker-compose – For greater compatibility with some Compose files, podman compose can use Docker Compose as its compose provider. The setup involves installing the docker-compose CLI plugin in your home directory:

    mkdir -p ~/.docker/cli-plugins
    curl -o ~/.docker/cli-plugins/docker-compose -L https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64
    chmod +x ~/.docker/cli-plugins/docker-compose

    Before running podman compose with the docker-compose compose provider on a compute node, a podman system service must be started on that compute node to provide a Docker-daemon-compatible API for the docker-compose compose provider:

    [jharvard@holy8a24101 ~]$ podman system service -t 0 &

Example: HPC-hosted Private LLM Service

The following example deploys a private LLM service using Open WebUI front-end web interface connected to a llama.cpp HTTP server backend hosting the Meta Llama-3.2-1B-Instruct model. Only CPU resources are used for convenient deployment.

  1. Launch an Open OnDemand Remote Desktop with 4 CPUs and 8 GB memory.
    (Alternatively, SSH tunneling or VS Code with port forwarding can be used to access web applications running on FAS RC compute nodes)
  2. Save the following Compose configuration to a file called compose.yaml on the FASRC cluster:
    services:
      llama-server:
        image: ghcr.io/ggml-org/llama.cpp:server
        tty: true
        command: >
          --hf-repo QuantFactory/Llama-3.2-1B-Instruct-GGUF
          --hf-file Llama-3.2-1B-Instruct.Q8_0.gguf
          --threads ${SLURM_CPUS_PER_TASK:-1}
          --ctx-size 8192
        environment:
          LLAMA_CACHE: /cache
        volumes:
          - cache:/cache

      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        depends_on:
          - llama-server
        environment:
          OPENAI_API_BASE_URL: http://llama-server:8080/v1
          WEBUI_SECRET_KEY:
        volumes:
          - openwebui:/app/backend/data
        # set the OPEN_WEBUI_PORT environment variable to use a different port
        ports:
          - "127.0.0.1:${OPEN_WEBUI_PORT:-8080}:8080"

    volumes:
      cache: {}
      openwebui: {}
  3. Open a terminal in the Remote Desktop session, navigate to the directory containing the compose.yaml file, and issue the following command to start the containers:
    podman compose up
    Note: this works with the default podman-compose compose provider; see above for the additional setup needed to use docker-compose instead
  4. Once the logs indicate the open-webui service is ready (i.e., the text [open-webui] |  INFO:     Started server process [1] appears in the logs), open a web browser on the remote desktop (Applications > Internet > Firefox) and access http://localhost:8080/. Follow the prompt to create an admin account before accessing the Open WebUI chat interface to interact with the LLM. When you are done, see the teardown sketch after this list.
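
When you are finished, the stack can be shut down from the directory containing compose.yaml. A minimal sketch (the -v flag also removes the named volumes, i.e., the model cache and the Open WebUI data, so omit it if you want to keep them between sessions):

# stop and remove the containers defined in compose.yaml
podman compose down

# additionally remove the named volumes (cache and openwebui)
podman compose down -v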

Known Limitations

The following are known limitations of rootless Podman on the FASRC cluster.

--cpuset-cpus option is unsupported

Workarounds include:

  • The Slurm srun -c / --cpus-per-task option within a Slurm job to launch podman with CPU binding to a specified number of allocated CPUs; e.g.
    srun -c 2 podman run ...
  • The taskset command to set the CPU affinity:
    [user@holy8a24101 ~]$ taskset -cp $$
    pid 368917’s current affinity list: 102-105
    [user@holy8a24101 test]$ taskset -c 102-103 podman run …

podman rm <container_id> doesn’t work

Symptom: a container is stuck in the “stopping” state and cannot be removed.

$ podman ps
$ podman ps -a
CONTAINER ID ... STATUS ...
2eaa0ca21480 ... Stopping ...
$ podman rm 2eaa0ca21480
...
Error: cannot remove container 2eaa0ca21480... as it is stopping - running or paused containers cannot be removed without force: container state improper

Solution: Use podman kill <container-id> before podman rm <container-id>:

$ podman kill 2eaa0ca21480
2eaa0ca21480
$ podman rm 2eaa0ca21480
2eaa0ca21480
$ podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

Online Training Materials

References

]]>
28194
Containers https://docs.rc.fas.harvard.edu/kb/containers/ Fri, 10 Jan 2025 20:28:52 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=28111 Introduction

Containers have become the industry standard method for managing complex software environments, especially ones with bespoke configuration options. In brief, a container is a self-contained environment and software stack that runs on the host operating system (OS). They can allow users to use a variety of base operating systems (e.g., Ubuntu) and their software packages aside from the host OS the cluster nodes run (Rocky Linux). One can even impersonate root inside the container to allow for highly customized builds and a high level of control of the environment.

While containers allow for the management of sophisticated software stacks, they are not a panacea. As light as containers are, they still impose a performance penalty: the more layers you put between the code and the hardware, the more inefficiencies pile up. In addition, host filesystem access and various other permission issues can be tricky. Other incompatibilities can arise between the OS of the container and the OS of the compute node.

Still, with these provisos in mind, containers are an excellent tool for software management. Containers exist for many software packages, making software installation faster and more trouble-free. Containers also make it easy to record and share the exact stack of software required for a workflow, allowing other researchers to more easily reproduce research results in order to validate and extend them.

Types of Containers

There are two main types of containers. The first is the industry standard OCI (Open Container Initiative) container, popularized by Docker. Docker uses a client-server architecture, with one (usually) privileged background server process (or “daemon” process, called “dockerd”) per host. If run on a multi-tenant system (e.g., HPC cluster such as Cannon), this results in a security issue in that users who interact with the privileged daemon process could access files owned by other users. Additionally, on an HPC cluster, the docker daemon process does not integrate with Slurm resource allocation facilities.

Podman, a daemonless OCI container toolchain developed by RedHat to address these issues, is installed on the FASRC cluster. The Podman CLI (command-line interface) was designed to be largely compatible with the Docker CLI, and on the FASRC cluster, the docker command runs podman under the hood. Many docker commands will just work with podman, though there are some differences.

The second is Singularity. Singularity grew out of the need for additional security in shared user contexts (like you find on a cluster). Since Docker normally requires the user to run as root, Singularity was created to alleviate this requirement and bring the advantages of containerization to a broader context. There are a couple of implementations of Singularity, and on the cluster, we use SingularityCE (the other implementation is Apptainer). Singularity has the ability to convert OCI (docker) images into Singularity Image Format (SIF) files. Singularity images have the advantage of being distributable as a single read-only file; on an HPC cluster, this can be located on a shared filesystem, which can be easily launched by processes on different nodes. Additionally, Singularity containers can run as the user who launched them without elevated privileges.

Rootless

Normally, building a container requires root permissions, and in the case of Podman/Docker, the containers themselves would ordinarily be launched by the root user. While this may be fine in a cloud context, it is not in a shared resource context like a HPC cluster. Rootless is the solution to this problem.

Rootless essentially allows the user to spoof being root inside the container. It does this via a Linux feature called subuid (short for Subordinate User ID) and subgid (Subordinate Group ID). This feature allows a range of uid’s (a unique integer assigned to each user name used for permissions identification) and gid’s (unique integer for groups) to be subordinated to another uid. An example is illustrative. Let’s say you are userA with a uid of 20000. You are assigned the subuid range of 1020001-1021000. When you run your container, the following mapping happens:

In the Container [username(uid)]    Outside the Container [username(uid)]
root(0)                             userA(20000)
apache(48)                          1020048
ubuntu(1000)                        1021000

Thus, you can see that while you are inside the container, you pretend to be another user and have all the privileges of that user in the container. Outside the container, though, you are acting as your user, and the uid’s subordinated to your user.

A few notes are important here:

  1. The subuid/subgid range assigned to each user does not overlap the uid/gid or subuid/subgid range assigned to any other user or group.
  2. While you may be spoofing a specific user inside of the container, the process outside the container sees you as your normal uid or subuid. Thus, if you use normal Linux tools like top or ps outside the container, you will notice that the id’s that show up are your uid and subuid.
  3. Filesystems, since they are external, also see you as your normal uid/gid and subuid/subgid. So files created as root in the container will show up on the storage as owned by your uid/gid. Files created by other users in the container will show up as their mapped subuid/subgid.

Rootless is very powerful and allows you to both build containers on the cluster, as well as running Podman/Docker containers right out of the box. If you want to see what your subuid mapping is, you can find the mappings at /etc/subuid and /etc/subgid. You can find your uid by running the id command, which you can then use to look up your map (e.g., with the command: grep "^$(id -u):" /etc/subuid).
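
For example (the uid and subuid range shown here are illustrative; yours will differ):

[jharvard@holy8a26602 ~]$ id -u
20000
[jharvard@holy8a26602 ~]$ grep "^$(id -u):" /etc/subuid
20000:1020001:1000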

Rootless and Filesystems

Two more crucial notes about filesystems. The first is that, since subuids are not part of our normal authentication system, filesystems that cannot resolve subuids will not grant them access. In particular, Lustre (e.g., /n/holylabs) does not recognize subuids and, since it cannot resolve them, will deny them access. NFS filesystems (e.g., /n/netscratch) do not have this problem.

The second is that even if you can get into the filesystem, you may not be able to traverse into locations that do not have world access (o+rx) enabled. This is because the filesystem cannot resolve your user group or user name, does not see you as a valid member of the group, and thus will reject you. As such, it is imperative to test and validate filesystem access for filesystems you intend to map into the container and ensure that access is achievable. A simple way to ensure this is to utilize the Everyone directory which exists for most filesystems on the cluster. Note that your home directory is not world accessible for security reasons and thus cannot be used.
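
A quick sanity check is to bind-mount the target directory into a throwaway container and list it from inside. The sketch below uses a hypothetical jharvard_lab share under /n/netscratch; substitute your own Everyone directory:

# Singularity: bind the directory and list it from inside the container
[jharvard@holy8a26602 ~]$ singularity exec --bind /n/netscratch/jharvard_lab/Everyone:/data docker://ubuntu ls /data

# Podman: the equivalent test with a volume mount
[jharvard@holy8a26602 ~]$ podman run --rm -v /n/netscratch/jharvard_lab/Everyone:/data docker://ubuntu ls /data

If either command fails with a permission error, fix or change the directory you intend to map before relying on it in your workflow.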

Getting Started

The first step in utilizing a container on the cluster is to submit a job; login nodes are not appropriate places for development. If you are just beginning, the easiest method is either to get a command-line interactive session via salloc or to launch an OOD session.
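
For example, a minimal interactive salloc request (the partition, time, and resources here are illustrative; adjust them to your needs):

[jharvard@holylogin01 ~]$ salloc --partition test --time 02:00:00 --cpus-per-task 4 --mem 8G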

Once you have a session, you can then launch your container:

Singularity

[jharvard@holy8a26602 ~]$ singularity run docker://godlovedc/lolcow
INFO: Downloading library image to tmp cache: /scratch/sbuild-tmp-cache-701047440
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
INFO: Fetching OCI image...
45.3MiB / 45.3MiB [============================================================================================================================] 100 % 21.5 MiB/s 0s
53.7MiB / 53.7MiB [============================================================================================================================] 100 % 21.5 MiB/s 0s
INFO: Extracting OCI image...
2025/01/09 10:49:52 warn rootless{dev/agpgart} creating empty file in place of device 10:175
2025/01/09 10:49:52 warn rootless{dev/audio} creating empty file in place of device 14:4
2025/01/09 10:49:52 warn rootless{dev/audio1} creating empty file in place of device 14:20
INFO: Inserting Singularity configuration...
INFO: Creating SIF file...
_________________________________________
/ Q: What do you call a principal female \
| opera singer whose high C |
| |
| is lower than those of other principal |
\ female opera singers? A: A deep C diva. /
-----------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||

Podman

[jharvard@holy8a24601 ~]$ podman run docker://godlovedc/lolcow
Trying to pull docker.io/godlovedc/lolcow:latest...
Getting image source signatures
Copying blob 8e860504ff1e done |
Copying blob 9fb6c798fa41 done |
Copying blob 3b61febd4aef done |
Copying blob 9d99b9777eb0 done |
Copying blob d010c8cf75d7 done |
Copying blob 7fac07fb303e done |
Copying config 577c1fe8e6 done |
Writing manifest to image destination
_____________________________
< Give him an evasive answer. >
-----------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||

Shell

If you want to get a shell prompt in a container do:

Singularity

[jharvard@holy8a26602 ~]$ singularity shell docker://godlovedc/lolcow
Singularity>

Podman

[jharvard@holy8a26601 ~]$ podman run --rm -it --entrypoint bash docker://godlovedc/lolcow
root@holy8a26601:/#

GPU

If you want to use a GPU in a container first start a job reserving a GPU on a gpu node. Then do the following:

Singularity

You will want to add the --nv flag for singularity:

[jharvard@holygpu7c26306 ~]$ singularity exec --nv docker://godlovedc/lolcow /bin/bash
Singularity> nvidia-smi
Fri Jan 10 15:50:20 2025 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:4B:00.0 Off | On |
| N/A 24C P0 43W / 400W | 74MiB / 40960MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 1 0 0 | 37MiB / 19968MiB | 42 0 | 3 0 2 0 0 |
| | 0MiB / 32767MiB | | |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Singularity>

Podman

For podman you need to add --device nvidia.com/gpu=all:

[jharvard@holygpu7c26305 ~]$ podman run --rm --device nvidia.com/gpu=all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Fri Jan 10 20:26:57 2025 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:31:00.0 Off | On |
| N/A 25C P0 47W / 400W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 2 0 0 | 37MiB / 19968MiB | 42 0 | 3 0 2 0 0 |
| | 0MiB / 32767MiB | | |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
WARN[0005] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus

Docker Rate Limiting

Docker Hub limits the number of pulls anonymous accounts can make. If you hit either of these errors:

ERROR: toomanyrequests: Too Many Requests.

or

You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limits.

you will need to create a Docker account to increase your limit. See the Docker documentation for more details.

Once you have a Docker account, authenticate to Docker Hub with your Docker Hub credentials (not your FASRC account) and then run the Docker container.

Singularity

singularity remote login --username <dockerhub_username> docker://docker.io

Podman

podman login docker.io

Advanced Usage

For advanced usage tips such as how to build your own containers, see our specific container software pages:

]]>
28111
Python Package Installation https://docs.rc.fas.harvard.edu/kb/python-package-installation/ Wed, 04 Sep 2024 17:02:13 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=27591 Description

Python packages on the cluster are primarily managed with Mamba.  Direct use of pip, outside of a virtual environment, is discouraged on the FASRC clusters.

Mamba is a package manager that is a drop-in replacement for Conda, and is generally faster and better at resolving dependencies:

  • Speed: Mamba is written in C++, which makes it faster than Conda. Mamba uses parallel processing and efficient code to install packages faster.
  • Compatibility: Mamba is fully compatible with Conda, so it can use the same commands, packages, and environment configurations.
  • Cross-platform support: Mamba works on Mac, Linux and Windows.
  • Dependency resolution: Mamba is better at resolving dependencies than Conda.
  • Environment creation: Mamba is faster at creating environments, especially large ones.
  • Package repository: Mamba uses Mambaforge (a.k.a. the conda-forge channel), which provides the most up-to-date packages available.

Important:
Anaconda is currently reviewing its Terms of Service for Academia and Research and is expected to conclude the update by the end of 2024. There is a possibility that Conda may no longer be free for non-profit academic research use at institutions with more than 200 employees, and downloading packages through Anaconda’s Main channel may incur costs. Hence, we recommend that our users switch to the open-source conda-forge channel for package distribution when possible. Our python module is built with the Miniforge3 distribution, which has conda-forge set as its default channel.

Mamba is a drop-in replacement for Conda and uses the same commands and configuration options as conda; you can swap almost all commands between conda and mamba. By default, mamba uses conda-forge, the free package repository used by Mambaforge. (In this doc, we will generally only refer to mamba.)

Usage

mamba is available on the FASRC cluster as a software module, either as Mambaforge/Miniforge or as python/3*, which is aliased to mamba. One can access it by loading one of these modules, e.g.:

$ module load python

To see Python’s version

$ python --version

Environments

You can create a virtual environment with mamba in the same way as with conda. However, it is important to start an interactive session prior to creating an environment and installing desired packages in the following manner:

$ salloc --partition test --nodes=1 --cpus-per-task=2 --mem=4GB --time=0-02:00:00
$ module load python

By default, Python environments and their packages are installed in your home directory, in ~/.conda/envs. If you would like to locate them elsewhere, such as a lab shared directory, specify the absolute file paths:

export CONDA_PKGS_DIRS=/<FILEPATH>/conda/pkgs
export CONDA_ENVS_PATH=/<FILEPATH>/conda/envs
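
For example, using a hypothetical jharvard_lab directory on holylabs (substitute your own lab path):

export CONDA_PKGS_DIRS=/n/holylabs/jharvard_lab/Lab/conda/pkgs
export CONDA_ENVS_PATH=/n/holylabs/jharvard_lab/Lab/conda/envs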

Create an environment using mamba

$ mamba create -n <ENV_NAME>

Alternatively, you can create an environment and install packages at the same time. This ensures package dependencies are met and could also speed up your setup time significantly. The general syntax to create an environment and install packages is:

$ mamba create -n <ENV_NAME> <PACKAGES>

For example,

$ mamba create -n python_env1 python={PYTHON_VERS} pip wheel

You must activate an environment to use it or install packages within it. To activate and use an environment:

$ source activate python_env1

To deactivate an active environment:

$ source deactivate

To list packages inside the environment:

$ mamba list

To install new packages in the environment (optional: -y is to proceed with installation):

$ mamba install -y <PACKAGE>

For example, to install numpy:

$ mamba install -y numpy

To install a package from a specific channel, add --channel (or -c) argument:

$ mamba install --channel <CHANNEL-NAME> -y <PACKAGE>

For example, to install the package boltons from the conda-forge channel:

$ mamba install --channel conda-forge boltons

To uninstall packages:

$ mamba uninstall <PACKAGE>

To delete an environment:

$ conda remove -n <ENV_NAME> --all -y

For additional features, please refer to the Mamba documentation.

Pip Installs

Avoid using pip outside of a mamba environment on any FASRC cluster. If you run pip install outside of a mamba environment, the installed packages will be placed in your $HOME/.local directory, which can lead to package conflicts and may cause some packages to fail to install or load correctly via mamba.

For example, if your environment name is python_env1:

$ module load python
$ source activate python_env1
$ pip install <package_name>

Best Practices

Use mamba environment in Jupyter Notebooks

If you would like to use a mamba environment as a kernel in a Jupyter Notebook on Open OnDemand (Cannon OOD or FASSE OOD), you have to install packages, ipykernel and nb_conda_kernels. These packages will allow Jupyter to detect mamba environments that you created from the command line.

For example, if your environment name is python_env1:

$ module load python
$ source activate python_env1
$ mamba install ipykernel nb_conda_kernels

After these packages are installed, launch a new Jupyter Notebook job (existing Jupyter Notebook jobs will fail to “see” this environment). Then:
  1. Open a Jupyter Notebook (a .ipynb file)
  2. On the top menu, click Kernel -> Change kernel -> select the conda environment

Mamba environments in a desired location

With mamba, use the -p or --prefix option to write environment files to a desired location, such as holylabs. Don’t use your home directory, as it has very low performance due to filesystem latency. Using a lab share location also lets you share your conda environment with other people on the cluster. Keep in mind that you will need to create the destination directory and specify the Python version to use. For example:

$ mamba create --prefix /n/holylabs/{YOUR_LAB}/Lab/envs python={PYTHON_VERS}

$ source activate /n/holylabs/{YOUR_LAB}/Lab/envs

To delete an environment at that desired location:

$ conda remove -p /n/holylabs/{YOUR_LAB}/Lab/envs --all -y

Troubleshooting

Interactive vs. batch jobs

If your code works in an interactive job but fails in a Slurm batch job, check the following:

  1. You are submitting your jobs from within a mamba environment.
    Solution 1: Deactivate your environment with the command mamba deactivate and submit the job or
    Solution 2: Open another terminal and submit the job from outside the environment.

  2. Check whether your ~/.bashrc or ~/.bash_profile files have a conda initialize section or a source activate command. The conda initialize section is known to create issues on the FASRC clusters.
    Solution: Delete the section between the two conda initialize statements (see the sketch of these marker lines just after this list). If you have source activate in those files, delete it or comment it out.
    For more information on ~/.bashrc files, see https://docs.rc.fas.harvard.edu/kb/editing-your-bashrc/
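
For reference, the conda initialize section in ~/.bashrc is normally delimited by marker comments, so the block to delete typically looks roughly like this (the exact lines between the markers vary by installation):

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
#    (installation-specific shell setup lines appear here)
# <<< conda initialize <<<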

Jupyter Notebook or JupyterLab on Open OnDemand/VDI problems

See Jupyter Notebook or JupyterLab on Open OnDemand/VDI troubleshooting section.

Unable to install packages

If you are not able to install packages, or the package installation is taking a significantly long time, check your ~/.condarc file. As stated in the Conda docs, this is an optional runtime configuration file. One can use this file to configure conda/mamba to search specific channels for package installation.

We recommend users not have this file, or keep it empty. This allows users to install packages in their conda/mamba environments using the defaults provided by the open-source distribution, Miniforge, that we have made available via our newer Python modules.

If, for any reason, ~/.condarc exists in your cluster profile, check its contents. If the default channel is showing up as conda, edit it to conda-forge so that your ~/.condarc uses this open-source channel for package installation.

Similarly, if you created an environment a long time ago using the Anaconda distribution and it is no longer working, it is best to create a new environment using the open-source distribution as described above, while ensuring that ~/.condarc, if it exists, points to conda-forge as its default channel.

For example, if you created a conda environment using one of our older Python modules, say Anaconda2/2019.10-fasrc01, you can see that conda is configured to use repo.anaconda.com for package installation.

$ module load Anaconda2 
$ conda info 
... 
channel URLs : 
https://repo.anaconda.com/pkgs/main/linux-64 
https://repo.anaconda.com/pkgs/main/noarch 
https://repo.anaconda.com/pkgs/r/linux-64 
https://repo.anaconda.com/pkgs/r/noarch 
...

In order to change this configuration, you can execute the conda config command to ensure that conda now points to conda-forge. This would also create a .condarc file in your $HOME, if it already doesn’t exist:

$ conda config --add default_channels https://conda.anaconda.org/conda-forge/ 

$ cat ~/.condarc 
default_channels:
  - https://conda.anaconda.org/conda-forge/

$ conda info 
... 
channel URLs : 
https://conda.anaconda.org/conda-forge/linux-64 
https://conda.anaconda.org/conda-forge/noarch
...
]]>
27591
R and RStudio https://docs.rc.fas.harvard.edu/kb/r-and-rstudio/ Fri, 07 Jun 2024 20:46:42 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=27082

Description

R is a language and environment for statistical computing and graphics. There are several options for using R on the FASRC clusters.

Of those options, the FASRC recommended method is the RStudio Server stand-alone app on Open OnDemand.

We recommend using RStudio Server on Open OnDemand because it is the simplest way to install R packages (see RStudio Server). We only recommend the R module and RStudio Desktop if you:

  • plan to run mpi/multi-node jobs
  • need to choose specific compilers for R package installation
  • are an experienced user, who
    • knows how to compile software from source
    • has too much time on their hands
    • likes to take risks that often don’t pay off ( Hey, there is always RStudio on Open OnDemand, right? )

Should you require it, we offer ( “at your own risk” ):

Usage

RStudio on Open OnDemand

Use RStudio on Open OnDemand to reduce your stress and that of those around you.  Here is a short 5 minute video to get you started:

RStudio Server

RStudio Server is our go-to RStudio app because it contains a wide range of precompiled R packages from Bioconductor and rocker/tidyverse. This means that installing R packages in RStudio Server is pretty straightforward. Most of the time, it is sufficient to simply run:

> install.packages("package_name")

This simplicity is possible because RStudio Server runs inside a Singularity container, meaning that it does not depend on the host operating system (OS). For more information on Singularity, refer to our Singularity on the cluster docs.

Important notes:

  • User-installed R libraries will be installed in ~/R/ifxrstudio/<IMAGE_TAG>
  • This app contains many pre-compiled packages from bioconductor and rocker/tidyverse.
  • FAS RC environment modules (e.g. module load) and Slurm (e.g. sbatch) are not accessible from this app.
  • For the RStudio with environment module and Slurm support, see RStudio Desktop

This app is useful for most applications, including multi-core jobs. However, it is not suitable for multi-node jobs. For multi-node jobs, the recommended app is RStudio Desktop.

FASSE cluster additional settings

If you are using FASSE Open OnDemand and need to install R packages in RStudio Server, you will likely need to set the proxies as explained in our Proxy Settings documentation. Before installing packages, execute these two commands in RStudio Server:

> Sys.setenv(http_proxy="http://rcproxy.rc.fas.harvard.edu:3128")
> Sys.setenv(https_proxy="http://rcproxy.rc.fas.harvard.edu:3128")

Package Seurat

In RStudio Server Release 3.18, the default version for umap-learn is 0.5.5. However, this version contains a bug. To resolve this issue, downgrade to umap-learn version 0.5.4:

> install.packages("Seurat")
> reticulate::py_install(packages = c("umap-learn==0.5.4","numpy<2"))

And test with

> library(Seurat)
> data("pbmc_small")
> pbmc_small <- RunUMAP(object = pbmc_small, dims = 1:5, metric='correlation', umap.method='umap-learn')
UMAP(angular_rp_forest=True, local_connectivity=1, metric='correlation', min_dist=0.3, n_neighbors=30, random_state=RandomState(MT19937) at 0x14F205B9E240, verbose=True)
Wed Jul 3 17:22:55 2024 Construct fuzzy simplicial set
Wed Jul 3 17:22:56 2024 Finding Nearest Neighbors
Wed Jul 3 17:22:58 2024 Finished Nearest Neighbor Search
Wed Jul 3 17:23:00 2024 Construct embedding
Epochs completed: 100%| ██████████ 500/500 [00:00]
Wed Jul 3 17:23:01 2024 Finished embedding

R, CRAN, and RStudio Server pinned versions

FASRC RStudio server pins R, CRAN, and RStudio Server versions to a specific date to ensure R package compatibility. Therefore, we strongly recommend using > install.packages("package_name") with no repos argument specified.

For example, in Release 3.20:

> install.packages("parallelly")
Installing package into ‘/n/home12/jharvard/R/ifxrstudio/RELEASE_3_20’
(as ‘lib’ is unspecified)
trying URL 'https://p3m.dev/cran/__linux__/noble/2025-02-27/src/contrib/parallelly_1.42.0.tar.gz'
Content type 'binary/octet-stream' length 537560 bytes (524 KB)
==================================================
downloaded 524 KB

* installing *binary* package ‘parallelly’ ...
* DONE (parallelly)

The downloaded source packages are in
‘/tmp/Rtmp1AiMaa/downloaded_packages’

Above, the package is downloaded from https://p3m.dev/cran/__linux__/noble/2025-02-27/src/contrib/parallelly_1.42.0.tar.gz. Note the date 2025-02-27, not latest.

For more details see Rocker project which is the base image for FASRC’s RStudio Server.

Advanced Installation: the R package latest version ( not recommended )

The following approach is not recommended, but if you need to build the latest version of a package from source for some reason, you may specify the repo or version, or install from GitHub. Do note that this approach is very tricky and will likely break R package dependencies. Please do not do this. Kittens will explode.

repos example:

install.packages("rstan", repos = "https://cloud.r-project.org")

install_github example:

> require(remotes)
> install_github("wadpac/GGIR", ref = "3.2-10")

Use R packages from RStudio Server in a batch job

The RStudio Server OOD app hosted on Cannon at rcood.rc.fas.harvard.edu and FASSE at fasseood.rc.fas.harvard.edu runs RStudio Server in a Singularity container (see Singularity on the cluster). The path to the Singularity image on both Cannon and FASSE clusters is the same:

/n/singularity_images/informatics/ifxrstudio/ifxrstudio:RELEASE_<VERSION>.sif

Where <VERSION> corresponds to the Bioconductor version listed in the “R version” dropdown menu. For example:

R 4.2.3 (Bioconductor 3.16, RStudio 2023.03.0)

uses the Singularity image:

/n/singularity_images/informatics/ifxrstudio/ifxrstudio:RELEASE_3_16.sif
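
To see which releases are available on the cluster, you can list that directory directly (the path is the same one shown above):

ls /n/singularity_images/informatics/ifxrstudio/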

As mentioned above, when using the RStudio Server OOD app, user-installed R packages by default go in:

~/R/ifxrstudio/RELEASE_<VERSION>

This is an example of a batch script named runscript.sh that executes R script myscript.R inside the Singularity container RELEASE_3_16:

#!/bin/bash
#SBATCH -c 1 # Number of cores (-c)
#SBATCH -t 0-01:00 # Runtime in D-HH:MM
#SBATCH -p test # Partition to submit to
#SBATCH --mem=1G # Memory pool for all cores (see also --mem-per-cpu)
#SBATCH -o myoutput_%j.out # File to which STDOUT will be written, %j inserts jobid
#SBATCH -e myerrors_%j.err # File to which STDERR will be written, %j inserts jobid

# set R packages and rstudio server singularity image locations
my_packages=${HOME}/R/ifxrstudio/RELEASE_3_16
rstudio_singularity_image="/n/singularity_images/informatics/ifxrstudio/ifxrstudio:RELEASE_3_16.sif"

# run myscript.R using RStudio Server signularity image
singularity exec --cleanenv --env R_LIBS_USER=${my_packages} ${rstudio_singularity_image} Rscript myscript.R

To submit the job, execute the command:

sbatch runscript.sh

R and RStudio on Windows

See our R and RStudio on Windows page.

Advanced Usage: Not Better, Not Faster and Not Recommended *

( * Fine, it could be faster if you really know what you are doing. Still not recommended.)

These options are for users familiar with software installation from the source, where you choose compilers and set your environmental variables. If you are not familiar with these concepts, we highly recommend using RStudio Server instead.

WARNING: If you got really good at using RStudio on Open OnDemand and now think you are an expert, an advanced user even: you are not. Go back to Open OnDemand, where you were well supported and experienced near-effortless success.

R module

To use the R module, you should first have taken our Introduction to the FASRC training and be familiar with running jobs on the cluster. R modules come with some basic R packages. If you use a module, you will likely have to install most of the R packages that you need.

To use R on the FASRC clusters, load R via our module system. For example, this command will load the latest R version:

module load R

If you need a specific version of R, you can search with the command

module spider R

To load a specific version

module load R/4.2.2-fasrc01

For more information on modules, see the Lmod Modules page.

To use R from the command line, you can use an R shell for interactive work. For batch jobs, you can use the R CMD BATCH and Rscript commands. Note that these commands have different behaviors (example invocations are shown after the list below):

  • R CMD BATCH
    • output will be directed to a .Rout file unless you specify otherwise
    • prints out input statements
    • cannot output to STDOUT
  • Rscript
    • output and errors are directed to STDOUT and STDERR, respectively, as with many other programs
    • does not print input statements
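
For example, with a script named myscript.R (the file name here is just a placeholder):

# R CMD BATCH: input statements and output are written to myscript.Rout by default
R CMD BATCH myscript.R

# Rscript: output and errors go to STDOUT/STDERR, which you can redirect yourself
Rscript myscript.R > myscript.log 2> myscript.err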

For Slurm batch examples, refer to the FASRC User_Codes GitHub repository:

Examples and details of how to run R from the command line can be found at:

R Module + RStudio Desktop

RStudio Desktop depends on an R module. Although it has some precompiled R packages that come with the R module, the list is much more limited than in the RStudio Server app.

RStudio Desktop runs on the host operating system (OS), the same environment as when you ssh to Cannon or FASSE.

This app is particularly useful for running multi-node/MPI applications because you can specify the exact modules, compilers, and packages that you need to load.

See the documentation on how to launch RStudio Desktop.

R in Jupyter

To use R in Jupyter, you will need to create a conda/mamba virtual environment and install the packages jupyter and rpy2, which together allow you to run R from Jupyter.

Step 1:  Request an interactive job

salloc --partition test --time 02:00:00 --ntasks=1 --mem 10000

Step 2: Load python module, set environmental variables, and create an environment with the necessary packages:

module load python/3.10.13-fasrc01
export PYTHONNOUSERSITE=yes
mamba create -n rpy2_env jupyter numpy matplotlib pandas scikit-learn scipy rpy2 r-ggplot2 -c conda-forge -y

See Python instructions for more details on Python and mamba/conda environments.

After creating the mamba/conda environment, you will need to load that environment by selecting the corresponding kernel on the Jupyter Notebook to start using R in the notebook.

Step 3: Launch the Jupyter app on the OpenOnDemand VDI portal using these instructions.

You may need to load certain modules for package installations. For example, the R package lme requires cmake. You can load cmake by adding the module name in the “Extra Modules” field:

Step 4: Open your Jupyter notebook. On the top right corner, click on “Python 3” (typically, it has “Python 3”, but it may be different on your Notebook). Select the created conda environment “Python [conda env:conda-rpy2_env]”:

Alternatively, you can use the top menu: Kernel -> Change Kernel -> Python [conda env:conda-rpy2_env]

Step 5: Install R packages using a Jupyter Notebooks

Refer to the example Jupyter Notebook on FASRC User_Codes Github.

R with Spack

Step 1: Install Spack by following our Spack Install and Setup instructions.

Step 2: Install the R packages with Spack from the command line. For all R package installations with Spack, ensure you are on a compute node by requesting an interactive job (if you are already in an interactive job, there is no need to request another one):

[jharvard@holylogin spack]$ salloc --partition test --time 4:00:00 --mem 16G -c 8

Installing R packages with Spack is fairly simple. The main steps are:

[jharvard@holy2c02302 spack]$ spack install package_name  # install software
[jharvard@holy2c02302 spack]$ spack load package_name     # load software to your environment
[jharvard@holy2c02302 spack]$ R                           # launch R
> library(package_name)                                   # load package within R
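
For instance, a minimal end-to-end run with the Spack package r-ggplot2 (substitute whichever r-* package you actually need):

[jharvard@holy2c02302 spack]$ spack install r-ggplot2
[jharvard@holy2c02302 spack]$ spack load r-ggplot2
[jharvard@holy2c02302 spack]$ R
> library(ggplot2)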
For specific examples, refer to the FASRC User_Codes GitHub repository:

R Parallel

This is covered at R Parallel here on User Docs, and EXTENSIVELY in User Docs/Parallel_Computing/R.

Troubleshooting

Files that may configure R package installations

  • ~/.Rprofile
  • ~/.Renviron
  • ~/.bashrc
  • ~/.bash_profile
  • ~/.profile
  • ~/.config/rstudio/rstudio-prefs.json
  • ~/.R/Makevars

Examples

We offer a wealth of examples, see R in our User Codes git repository.

References

]]>
27082
Macaulay2 https://docs.rc.fas.harvard.edu/kb/macaulay2/ Wed, 01 May 2024 20:37:38 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=27049 Description

Macaulay2 is a software system for algebraic geometry and commutative algebra. Its creation and development have been funded by the National Science Foundation since 1992.

Macaulay2 on the cluster

Macaulay2 is available on the cluster via Singularity containers. We recommend working on a compute node. You can get to a compute node by requesting an interactive job. For example

salloc --partition test --time 01:00:00 --cpus-per-task 4 --mem-per-cpu 2G

You can pull (i.e. download) a container with the command

singularity pull docker://unlhcc/macaulay2:latest

Start a shell inside the Singularity container

singularity shell macaulay2_latest.sif

The prompt will change to Singularity>. Then, type M2 to start Macaulay2. You should see a prompt with i1:

Singularity> M2
Macaulay2, version 1.15
--storing configuration for package FourTiTwo in /n/home01/jharvard/.Macaulay2/init-FourTiTwo.m2
--storing configuration for package Topcom in /n/home01/jharvard/.Macaulay2/init-Topcom.m2
with packages: ConwayPolynomials, Elimination, IntegralClosure, InverseSystems, LLLBases, PrimaryDecomposition, ReesAlgebra, TangentCone,
Truncations

i1 :

For examples, we recommend visiting Macaulay2 documentation.

Resources

]]>
27049
Mathematica https://docs.rc.fas.harvard.edu/kb/mathematica/ Tue, 30 Apr 2024 14:33:09 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=27043 Description

Mathematica is a powerful computational software system that provides a comprehensive environment for technical computing. Developed by Wolfram Research, it offers a wide range of capabilities spanning symbolic and numerical computation, visualization, and programming. Mathematica’s symbolic engine allows for the manipulation of mathematical expressions, equations, and functions, making it particularly useful for tasks such as calculus, algebra, and symbolic integration. Its vast library of built-in functions covers various areas of mathematics, science, and engineering, enabling users to tackle diverse problems efficiently. Moreover, Mathematica’s interactive interface and high-level programming language facilitate the creation of custom algorithms and applications, making it an indispensable tool for researchers, educators, and professionals in countless fields.

Mathematica is available on the FASRC Cannon cluster as software modules. Currently, the following modules/versions are available:

mathematica/12.1.1-fasrc01 and mathematica/13.3.0-fasrc01

Usage: Command Line

module load mathematica/13.3.0-fasrc01
sbatch run.sbatch
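
The run.sbatch script itself is not shown in this article; a minimal sketch of one, assuming a Wolfram Language script named myscript.wl run with the math kernel's -script option, could look like this (adjust the resources and partition to your needs):

#!/bin/bash
#SBATCH -J my_mathematica # job name
#SBATCH -c 1 # number of cores
#SBATCH -t 01:00:00 # time in HH:MM:SS
#SBATCH -p serial_requeue # partition
#SBATCH --mem=4G # memory
#SBATCH -o mathematica_%j.out # standard output file
#SBATCH -e mathematica_%j.err # standard error file

module load mathematica/13.3.0-fasrc01

# run the Wolfram Language script non-interactively (myscript.wl is a placeholder)
math -script myscript.wl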

Usage: Interactive GUI via Open On Demand

Within your OOD session, in the terminal, type the following commands to load the module and launch Mathematica:

[jharvard@holy7c24102 ~]$ module load mathematica
[jharvard@holy7c24102 ~]$ mathematica

You can see different versions of Mathematica in our modules page.

Here is our doc on Open On Demand on Cannon.

Examples

To start using Mathematica on the FASRC cluster, please look at the examples on our Users Code repository.

Resources

]]>
27043
Gaussian https://docs.rc.fas.harvard.edu/kb/gaussian/ Tue, 27 Feb 2024 18:54:06 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=26828 Access

Please contact us if you require Gaussian access. It is controlled on a case-by-case basis and requires membership in a security group.

If you are not a member of this security group, you can still load the module, but you will not be able to run Gaussian.

FASRC provides the module and basic instructions on how to launch Gaussian, but we do not provide specific support on how to run Gaussian. For that, refer to the Gaussian documentation or your department.

Running Gaussian

Example batch file runscript.sh:

#!/bin/bash
#SBATCH -J my_gaussian # job name
#SBATCH -c 1 # number of cores
#SBATCH -t 01:00:00 # time in HH:MM:SS
#SBATCH -p serial_requeue # partition
#SBATCH --mem-per-cpu=800 # memory per core
#SBATCH -o rchelp.out # standard output file
#SBATCH -e rchelp.err # standard error file

module load gaussian/16-fasrc04

g16 CH4_s.gjf

To submit the job:

sbatch runscript.sh

Versions

You can search for gaussian modules with the command module spider gaussian:

[jharvard@boslogin02 ~]$ module spider gaussian

-----------------------------------------------------------------------------------------------------------------------------------------
gaussian:
-----------------------------------------------------------------------------------------------------------------------------------------
Description:
Gaussian, a computational chemistry software program

Versions:
gaussian/16-fasrc01
gaussian/16-fasrc02
gaussian/16-fasrc03
gaussian/16-fasrc04

And to see the details about a particular module, use commands module spider or module display:

[jharvard@boslogin02 ~]$ module spider gaussian/16-fasrc04

-----------------------------------------------------------------------------------------------------------------------------------------
gaussian: gaussian/16-fasrc04
-----------------------------------------------------------------------------------------------------------------------------------------
Description:
Gaussian, a computational chemistry software program

This module can be loaded directly: module load gaussian/16-fasrc04

Help:
gaussian-16-fasrc04
Gaussian, a computational chemistry software program

[jharvard@boslogin02 ~]$ module display gaussian/16-fasrc04
-----------------------------------------------------------------------------------------------------------------------------------------
/n/sw/helmod-rocky8/modulefiles/Core/gaussian/16-fasrc04.lua:
-----------------------------------------------------------------------------------------------------------------------------------------
help([[gaussian-16-fasrc04
Gaussian, a computational chemistry software program
]], [[
]])
whatis("Name: gaussian")
whatis("Version: 16-fasrc04")
whatis("Description: Gaussian, a computational chemistry software program")
setenv("groot","/n/sw/g16_sandybridge")
setenv("GAUSS_ARCHDIR","/n/sw/g16_sandybridge/g16/arch")
setenv("G09BASIS","/n/sw/g16_sandybridge/g16/basis")
setenv("GAUSS_SCRDIR","/scratch")
setenv("GAUSS_EXEDIR","/n/sw/g16_sandybridge/g16/bsd:/n/sw/g16_sandybridge/g16/local:/n/sw/g16_sandybridge/g16/extras:/n/sw/g16_sandybridge/g16")
setenv("GAUSS_LEXEDIR","/n/sw/g16_sandybridge/g16/linda-exe")
prepend_path("PATH","/n/sw/g16_sandybridge/g16/bsd:/n/sw/g16_sandybridge/g16/local:/n/sw/g16_sandybridge/g16/extras:/n/sw/g16_sandybridge/g16")
prepend_path("PATH","/n/sw/g16_sandybridge/nbo6_x64_64/nbo6/bin")

GaussView

RC users can download these clients from our Downloads page. You must be connected to the FASRC VPN to access this page. Your FASRC username and password are required to log in.

On MacOS: Move the downloaded file to the ‘Applications’ folder, unarchive it, and double click on the gview icon located in gaussview16_A03_macOS_64bit.

On Windows: Unarchive the file in the Downloads folder itself.

A pop up will appear saying “Gaussian is not installed”.

Click on OK. This will open the gview interface.

Troubleshooting

Failed to locate data directory

On your MacOS, if you see a message similar to what is shown on the image here:

you can safely remove the data folder by executing this command in your terminal: rm -rf /private/var/<path-to-d/data>

GaussView doesn’t open

In the case GaussView doesn’t open on MacOS, do the following:

Go to the Applications folder > gaussview16 folder > Right click on gview and choose “Show Package Contents” (see below)

Go to the Contents folder of gview > MacOS folder > Right click on the gview executable and choose “Open”

 

A pop-up will appear saying “Gaussian is not installed”. Ignore it and click on OK. This will open the gview interface.

Note: We do not have a license for GaussView on the cluster. It needs to be run locally.

]]>
26828