KNIME on the FASRC clusters

Description

KNIME is an open-source data analytics, reporting, and integration platform that is meant to perform various aspects of machine-learning & data mining through its modular data pipelining concept. The platform offers a way to integrate various tasks ranging from developing analytic models to deploying them and sharing insights with your team. The KNIME Analytics Platform offers users 300+ connectors to multiple data sources and integrations to all popular machine learning libraries.

The software’s key capabilities include Data Access & Transformation, Data Analytics, Visualization & Reporting, Statistics & Machine Learning, Generative AI, Collaboration, Governance, Data Apps, Automation, and AI Agents.

Given KNIME’s wide-scale use and applicability, we have converted it into a system-wide module that can be loaded from anywhere on either of the FASRC clusters, Cannon or FASSE. Additionally, we have packaged it as an app that can be launched from the cluster web interface, Open OnDemand (OOD).

KNIME as a module

KNIME is available as a module on the FASRC clusters. To learn more about the module, including the available versions and how to load one of them, execute the following from a terminal on the cluster: module spider knime

This pulls up information on the versions of KNIME that are available to load. For example, for the user jharvard on a compute node, the module spider command produces the following output:


[jharvard@holy8a26602 ~]$ module spider knime/
knime:
Description:
An open-source data analytics, reporting, and integration platform meant to perform various aspects of machine-learning & data mining through its modular data pipelining concept.

Versions:
knime/5.4.3-fasrc01
knime/5.4.4-fasrc01

For detailed information about a specific "knime" package (including how to load the modules) use the module's full name.

Note that names that have a trailing (E) are extensions provided by other modules.

For example:
$ module spider knime/5.4.4-fasrc01


To load a specific module, one can execute: module load knime/5.4.3-fasrc01

Or, to load the default (typically the latest) module, one can simply run module load knime. This results in, e.g.:

[jharvard@holy8a26602 ~]$ module load knime
[jharvard@holy8a26602 ~]$ module list
Currently Loaded Modules:
  1) knime/5.4.4-fasrc01

Once the knime module is loaded, one can launch the GUI by running the knime executable in the terminal, provided you ssh into the cluster with X11 forwarding enabled (preferably with the -Y option) and have XQuartz (macOS) or MobaXterm (Windows) installed on the local device you use to log in to the cluster. For example:

ssh -Y jharvard@login.rc.fas.harvard.edu

[jharvard@holylogin05 ~]$ salloc -p test --x11 --time=2:00:00 --mem=4g
[jharvard@holy8a26602 ~]$ module load knime
[jharvard@holy8a26602 ~]$ knime

One can ignore the following libGL errors and should expect to see a GUI appear as shown in the screenshot below.
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast

Knime GUI launched directly on Cannon

Note: While you can launch KNIME directly on the cluster using X11 forwarding, it is laggy and does not lend itself to the faster execution that some KNIME workflows require. To avoid issues associated with X11 forwarding, we recommend launching KNIME using OOD.
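If a workflow does not need the GUI at all, KNIME also provides a headless batch application that can be wrapped in an ordinary sbatch script. The sketch below is only an illustration: the workflow directory path is a placeholder, and the options and resource requests may need adjusting for your KNIME version and workflow.

#!/bin/bash
#SBATCH -J knime_batch
#SBATCH -p test
#SBATCH -c 4
#SBATCH --mem=8G
#SBATCH -t 0-02:00
#SBATCH -o knime_batch.out
#SBATCH -e knime_batch.err

module load knime

# Run a workflow headless; the workflow directory below is a placeholder
knime -nosplash -consoleLog -reset \
      -application org.knime.product.KNIME_BATCH_APPLICATION \
      -workflowDir="$HOME/knime-workspace/MyWorkflow"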

Both of these modules are also available via the KNIME OOD app, as explained below.

KNIME on OOD

KNIME can be run from Open OnDemand (OOD, formerly known as VDI) by choosing it from the Interactive Apps menu, and specifying your resource needs. Hit Launch, wait for the session to start, and click the “Launch Knime” button.

You can also launch KNIME from the Remote Desktop app on OOD.

Pre-installed Extensions

Both KNIME modules come with the following pre-installed extensions:

  1. For GIS: Geospatial Analytics Extension for KNIME
  2. For Programming:
    KNIME Python Integration
    KNIME Interactive R Statistics Integration
  3. For Machine Learning:
    KNIME H2O Machine Learning Integration
    KNIME XGBoost Integration
    KNIME Machine Learning Interpretability Extension
  4. For OpenAI, Hugging Face, and other LLMs: KNIME AI Extension
  5. For AI Assistant Coding: KNIME AI Assistant (Labs)
  6. For Google Drive Integration: KNIME Google Connectors

Note: New extensions cannot be installed by the users on the fly as modules don’t come with write permissions.

KNIME Tutorial

The link here takes you to a KNIME tutorial prepared by Lingbo Liu from Harvard’s Center for Geographic Analysis (CGA). This tutorial is best executed by launching the KNIME app on OOD.

Extracting compressed .zip or .7z archives with 7-Zip

p7zip/7-Zip is installed on the clusters, making it easy to create compressed/zipped archives or unzip/extract compressed archives.

Use 7-Zip from the command line. If you are using the Open OnDemand interface, you can start a Terminal Emulator window in the Remote Desktop application to extract the contents of a .zip or .7z archive file.

Some examples are included below:

Extracting archives

To list the contents of the file readme_docs.7z:
7z l readme_docs.7z

To extract an archive called readme_docs.7z to a new folder in your current directory called “extracted”, you would type
7z x readme_docs.7z -o./extracted

  • x is for “eXtract” (this command is useful if you are uncompressing a data source with its own directories – it preserves the directory structure. If you just want to extract everything in the archive without preserving directory structure within it, you can use the e command instead of x)
  • -o sets the Output directory
  • . means your current directory
  • note there is no space between the -o and the path for your extracted content.

To simply extract to the current folder:
7z x readme_docs.7z

Creating and adding to archives

To create an archive file, use the a command to Add files to an archive.
Specify the name of the archive, then the files that should be added to the archive.

This command adds the files doc1.txt and doc2.docx to an archive readme_docs.7z using the 7z compression format (default)
7z a readme_docs.7z doc1.txt doc2.docx

This command adds the contents of the directory “docs” to an archive readme_docs.7z using the 7z compression format (default)
7z a readme_docs.7z docs/

This command adds all files ending in “.txt” in the current directory to an archive readme_docs.7z using the 7z compression format (default)
7z a readme_docs.7z *.txt

This command adds the files doc1.txt and doc2.docx to an archive readme_docs.zip using the ZIP format.
Other options include -t7z (default), -tgzip, -tzip, -tbzip2, -tudf, -ttar.
7z a -tzip readme_docs.zip doc1.txt doc2.docx

While 7-Zip supports archiving using TAR, the tar command is also available. Please see our tips for using tar to archive data.
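For comparison, creating and extracting a gzip-compressed tar archive looks like the following (the file and directory names are placeholders):

# create readme_docs.tar.gz from the docs/ directory
tar -czvf readme_docs.tar.gz docs/

# extract it into the current directory
tar -xzvf readme_docs.tar.gz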

More information

In Terminal, you can type 7z for a list of commands and switches, or man 7z for detailed descriptions and examples.

Podman

Introduction

Podman is an Open Containers Initiative (OCI) container toolchain developed by Red Hat. Unlike its popular OCI cousin Docker, it is daemonless, making it easier to use with resource schedulers like Slurm. Podman maintains a command-line interface (CLI) that is very similar to Docker’s. On the FASRC cluster the docker command runs podman under the hood, and many docker commands just work with podman, though with some exceptions. Note that this document uses the term container to mean OCI container. Besides Podman containers, FASRC also supports Singularity.

Normally, podman requires privileged access. However, on the FASRC clusters we have enabled rootless podman, alleviating that requirement. We recommend reading our document on rootless containers before proceeding further so you understand how it works and its limitations.

Podman Documentation

The official Podman Documentation provides the latest information on how to use Podman. On this page we merely highlight specific useful commands and features/quirks specific to the FASRC cluster. You can get command-line help by running man podman or podman --help.

Working with Podman

To start working with podman, first get an interactive session, either via salloc or via Open OnDemand. Once you have that session, you can start working with your container image. The basic commands we will cover here are:

  • pull: Download a container image from a container registry
  • images: List downloaded images
  • run: Run a command in a new container
  • build: Create a container image from a Dockerfile/Containerfile
  • push: Push a container image to a container registry

For these examples we will use the lolcow and ubuntu images from DockerHub.

pull

podman pull fetches the specified container image and extracts it into node-local storage (/tmp/container-user-<uid> by default on the FASRC cluster). This step is optional, as podman will automatically download an image specified in a podman run or podman build command.

[jharvard@holy8a26601 ~]$ podman pull docker://godlovedc/lolcow
Trying to pull docker.io/godlovedc/lolcow:latest...
Getting image source signatures
Copying blob 8e860504ff1e done | 
Copying blob 9fb6c798fa41 done | 
Copying blob 3b61febd4aef done | 
Copying blob 9d99b9777eb0 done | 
Copying blob d010c8cf75d7 done | 
Copying blob 7fac07fb303e done | 
Copying config 577c1fe8e6 done | 
Writing manifest to image destination
577c1fe8e6d84360932b51767b65567550141af0801ff6d24ad10963e40472c5

images

podman images lists the images that are already available on the node (in /tmp/container-user-<uid>)

[jharvard@holy8a26601 ~]$ podman images
REPOSITORY                  TAG         IMAGE ID      CREATED      SIZE
docker.io/godlovedc/lolcow  latest      577c1fe8e6d8  7 years ago  248 MB

run

Podman containers may contain an entrypoint script that will execute when the container is run. To run the container:

[jharvard@holy8a26601 ~]$ podman run -it docker://godlovedc/lolcow
_______________________________________
/ Your society will be sought by people \
\ of taste and refinement. /
---------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||

To view the entrypoint script for a podman container:

[jharvard@holy8a26601 ~]$ podman inspect -f 'Entrypoint: {{.Config.Entrypoint}}\nCommand: {{.Config.Cmd}}' lolcow
Entrypoint: [/bin/sh -c fortune | cowsay | lolcat]
Command: []

shell

To start a shell inside a new container, specify the podman run -it --entrypoint bash options. -it effectively provides an interactive session, while --entrypoint bash invokes the bash shell (bash can be substituted with another shell program that exists in the container image).

[jharvard@holy8a26601 ~]$ podman run -it --entrypoint bash docker://godlovedc/lolcow
root@holy8a26601:/#

GPU Example

First, start an interactive job on a gpu partition. Then invoke podman run with the --device nvidia.com/gpu=all option:

[jharvard@holygpu7c26306 ~]$ podman run --rm --device nvidia.com/gpu=all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Wed Jan 22 15:41:58 2025 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:CA:00.0 Off | On |
| N/A 27C P0 66W / 400W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 2 0 0 | 37MiB / 19968MiB | 42 0 | 3 0 2 0 0 |
| | 0MiB / 32767MiB | | |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
WARN[0001] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus

Batch Jobs

Podman containers can also be executed as part of a normal batch job as you would any other command. Simply include the command as part of the sbatch script. As an example here is a sample podman.sbatch:

#!/bin/bash
#SBATCH -J podman_test
#SBATCH -o podman_test.out
#SBATCH -e podman_test.err
#SBATCH -p test
#SBATCH -t 0-00:10
#SBATCH -c 1
#SBATCH --mem=4G

# Podman command line options
podman run docker://godlovedc/lolcow

When submitted to the cluster as a batch job:

[jharvard@holylogin08 ~]$ sbatch podman.sbatch

Generates the podman_test.out which contains:

[jharvard@holylogin08 ~]$ cat podman_test.out
____________________________________
< Don't read everything you believe. >
------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||

Accessing Files

Each podman container operates within its own isolated filesystem tree in /tmp/container-user-<uid>/storage. However, host filesystems can be explicitly shared with containers by using the --volume option when starting a container (unlike Singularity, which is set up to automatically bind several default filesystems). This option bind-mounts a directory or file from the host into the container, granting the container access to that path. For instance, to access netscratch from the container:

[jharvard@holy8a26602 ~]$ podman run -it --entrypoint bash --volume /n/netscratch:/n/netscratch docker://ubuntu
root@holy8a26602:/# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 397G 6.5G 391G 2% /
tmpfs 64M 0 64M 0% /dev
netscratch-ib01.rc.fas.harvard.edu:/netscratch/C 3.6P 1.8P 1.9P 49% /n/netscratch
/dev/mapper/vg_root-lv_scratch 397G 6.5G 391G 2% /run/secrets
shm 63M 0 63M 0% /dev/shm
devtmpfs 504G 0 504G 0% /dev/tty

Ownership of files as seen from the host that are created by a process in the container depend on the user ID (UID) of the creating process in the container, either:

  • The host (cluster) user, if:
    • the container user is root (UID 0) – this is often the default
    • podman run --userns=keep-id is specified, so the host user and primary group ID are used for the container user (similar to SingularityCE in the default native mode)
    • podman run --userns=keep-id:uid=<container-uid>,gid=<container-gid> is specified, using the specified UID/GID for the container user and mapping it to the host/cluster user’s UID/GID. E.g., in the following example, the “node” user in the container (UID=1000, GID=1000) creates a file that is (as seen from the host) owned by the host user:
      $ podman run -it --rm --user=node --entrypoint=id docker.io/library/node:22
      uid=1000(node) gid=1000(node) groups=1000(node)
      $ podman run -it --rm --volume /n/netscratch:/n/netscratch --userns=keep-id:uid=1000,gid=1000 --entrypoint=bash docker.io/library/node:22
      node@host:/$ touch /n/netscratch/jharvard_lab/Lab/jharvard/myfile
      node@host:/$ ls -l /n/netscratch/jharvard_lab/Lab/jharvard/myfile
      -rw-r--r--. 1 node node 0 Apr 7 16:05 /n/netscratch/jharvard_lab/Lab/jharvard/myfile
      node@host:/$ exit
      $ ls -ld /n/netscratch/jharvard_lab/Lab/jharvard/myfile
      -rw-r--r--. 1 jharvard jharvard_lab 0 Apr 7 12:05 /n/netscratch/jharvard_lab/Lab/jharvard/myfile
  • Otherwise, the subuid/subgid associated with the container-uid/container-gid is used (see rootless containers). Only filesystems that can resolve your subuids can be written to from a podman container (e.g. NFS filesystems like /n/netscratch and home directories, or node-local filesystems like /scratch or /tmp; but not Lustre filesystems like holylabs), and only locations with “other” read/write/execute permissions can be utilized (e.g. the Everyone directory).

Environment Variables

A Podman container does not inherit environment variables from the host environment. Any environment variables that are not defined by the container image must be explicitly set with the --env option:

[jharvard@holy8a26602 ~]$ podman run -it --rm --env MY_VAR=test python:3.13-alpine python3 -c 'import os; print(os.environ["MY_VAR"])'
test

Building Your Own Podman Container

You can build or import a Podman container in several different ways. Common methods include:

  1. Download an existing OCI container image located in Docker Hub or another OCI container registry (e.g., quay.io, NVIDIA NGC Catalog, GitHub Container Registry).
  2. Build a Podman image from a Containerfile/Dockerfile.

Images are stored by default at /tmp/container-user-<uid>/storage. You can find out more about the specific paths by running the podman info command.

Since the default path is in /tmp, containers will only exist for the duration of the job, after which the system will clean up the space. If you want to keep images for longer, you will need to override the default configuration by putting settings in $HOME/.config/containers/storage.conf. Note that due to subuid you will need to select a storage location that your subuids can access. It should also be noted that the version of NFS the cluster runs does not currently support xattrs, meaning that NFS storage mounts will not work for this purpose. This, plus the subuid restrictions, means that the vast majority of network storage will not work for this purpose. Documentation for storage.conf can be found here.
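As a rough illustration only, a minimal storage.conf override might look like the sketch below. The paths are placeholders and must point to a location that satisfies the xattr and subuid constraints just described.

# $HOME/.config/containers/storage.conf -- illustrative sketch only
[storage]
driver = "overlay"
# Placeholder paths; replace with a location your subuids can access
# and that supports xattrs (see the restrictions above).
runroot = "/some/suitable/location/containers/run"
graphroot = "/some/suitable/location/containers/storage"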

Downloading OCI Container Image From Registry

To download an OCI container image from a registry, simply use the pull command:

[jharvard@holy8a26602 ~]$ podman pull docker://godlovedc/lolcow
Trying to pull docker.io/godlovedc/lolcow:latest...
Getting image source signatures
Copying blob 8e860504ff1e done | 
Copying blob 9fb6c798fa41 done | 
Copying blob 3b61febd4aef done | 
Copying blob 9d99b9777eb0 done | 
Copying blob d010c8cf75d7 done | 
Copying blob 7fac07fb303e done | 
Copying config 577c1fe8e6 done | 
Writing manifest to image destination
577c1fe8e6d84360932b51767b65567550141af0801ff6d24ad10963e40472c5
WARN[0006] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus 
[jharvard@holy8a26602 ~]$ podman image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/godlovedc/lolcow latest 577c1fe8e6d8 7 years ago 248 MB

Build Podman Image From Dockerfile/Containerfile

Podman can build container images from a Dockerfile/Containerfile (podman prefers the generic term Containerfile, and podman build without -f <file> will first check for the existence of Containerfile in the current working directory, falling back to Dockerfile if one doesn’t exist). To build, first write your Containerfile:

FROM ubuntu:22.04

RUN apt-get -y update \
  && apt-get -y install cowsay lolcat\
  && rm -rf /var/lib/apt/lists/*
ENV LC_ALL=C PATH=/usr/games:$PATH 

ENTRYPOINT ["/bin/sh", "-c", "date | cowsay | lolcat"]

Then run the build command (assuming Dockerfile or Containerfile in the current working directory):

[jharvard@holy8a26602 ~]$ podman build -t localhost/lolcow
STEP 1/4: FROM ubuntu:22.04
Resolved "ubuntu" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/ubuntu:22.04...
Getting image source signatures
Copying blob 6414378b6477 done | 
Copying config 97271d29cb done | 
Writing manifest to image destination
STEP 2/4: RUN apt-get -y update && apt-get -y install cowsay lolcat

... omitted output ...

Running hooks in /etc/ca-certificates/update.d...
done.
--> a41765f5337a
STEP 3/4: ENV LC_ALL=C PATH=/usr/games:$PATH
--> e9eead916e20
STEP 4/4: ENTRYPOINT ["/bin/sh", "-c", "date | cowsay | lolcat"]
COMMIT
--> 51e919dd571f
51e919dd571f1c8a760ef54c746dcb190659bdd353cbdaa1d261ba8d50694d24

Saving/loading a container image on another compute node

Podman container images are stored on a node-local filesystem (/tmp/container-user-<uid>). Any container images built or pulled on one node that are needed on another node must be saved to a data storage location that is accessible to all compute nodes on the FASRC cluster. The podman save command can be used to accomplish this.

[jharvard@holy8a26602 ~]$ podman save --format oci-archive -o lolcow.tar localhost/lolcow
[jharvard@holy8a26602 ~]$ ls -lh lolcow.tar 
-rw-r--r--. 1 jharvard jharvard_lab 57M Jan 27 11:37 lolcow.tar

Note: omitting --format oci-archive saves the file in the docker-archive format, which is uncompressed, and thus faster to save/load though larger in size.

From another compute node, podman load extracts the docker- or oci-archive to the node-local /tmp/container-user-<uid>, where it can be used by podman:

[jharvard@holy8a26603 ~]$ podman images
REPOSITORY  TAG         IMAGE ID    CREATED     SIZE
[jharvard@holy8a26603 ~]$ podman load -i lolcow.tar
Getting image source signatures
Copying blob 163070f105c3 done   | 
Copying blob f88085971e43 done   | 
Copying config e9749e43bc done   | 
Writing manifest to image destination
Loaded image: localhost/lolcow:latest
[jharvard@holy8a26603 ~]$ podman images
REPOSITORY        TAG         IMAGE ID      CREATED        SIZE
localhost/lolcow  latest      e9749e43bc74  6 minutes ago  172 MB

Pushing a container image to a container registry

To make a container image built on the FASRC cluster available outside the FASRC cluster, the container image can be pushed to a container registry. Popular container registries with a free tier include Docker Hub and the GitHub Container Registry.

This example illustrates the use of the GitHub Container Registry, and assumes a GitHub account.

Note: The GitHub Container Registry is a part of the GitHub Packages ecosystem

  1. Create a Personal access token (classic) with write:packages scope (this implicitly adds read:packages for pulling private container images):
    https://github.com/settings/tokens/new?scopes=write:packages
  2. Authenticate to ghcr.io, using the authentication token generated in step 1 as the “password” (replace “<GITHUB_USERNAME>” with your GitHub username):
    [jharvard@holy8a26603 ~]$ podman login -u <GITHUB_USERNAME> ghcr.io
    Password: <paste authentication token>
    Login succeeded!
  3. Ensure the image has been named ghcr.io/<OWNER>/<image>:<tag>, where “<OWNER>” is either your GitHub username or an organization that you are a member of and have permission to publish packages to. Use the podman tag command to add a name to an existing local image if needed (Note: the GitHub owner must be all lower-case (e.g., jharvard instead of JHarvard)):

    [jharvard@holy8a26603 ~]$ podman tag localhost/lolcow:latest ghcr.io/<OWNER>/lolcow:latest
  4. Push the image to the container registry:
[jharvard@holy8a26603 ~]$ podman push ghcr.io/<OWNER>/lolcow:latest
Getting image source signatures
Copying blob 2573e0d81582 done   |
… 
Writing manifest to image destination

By default, the container image will be private. To change the visibility to “public”, access the package from the list at https://github.com/GITHUB_OWNER?tab=packages and configure the package settings (see Configuring a package’s access control and visibility).

Online Training Materials

References

Containers

Introduction

Containers have become the industry-standard method for managing complex software environments, especially ones with bespoke configuration options. In brief, a container is a self-contained environment and software stack that runs on the host operating system (OS). Containers allow users to work with a variety of base operating systems (e.g., Ubuntu) and their software packages, independent of the host OS the cluster nodes run (Rocky Linux). One can even impersonate root inside the container to allow for highly customized builds and a high level of control over the environment.

While containers allow for the management of sophisticated software stacks, they are not a panacea. As light as containers are, they still incur a performance penalty, as the more layers you put between the code and the hardware, the more inefficiencies pile up. In addition, host filesystem access and various other permission issues can be tricky. Other incompatibilities can arise between the OS of the container and the OS of the compute node.

Still, with these provisos in mind, containers are an excellent tool for software management. Containers exist for many software packages, making software installation faster and more trouble-free. Containers also make it easy to record and share the exact stack of software required for a workflow, allowing other researchers to more easily reproduce research results in order to validate and extend them.

Types of Containers

There are two main types of containers. The first is the industry standard OCI (Open Container Initiative) container, popularized by Docker. Docker uses a client-server architecture, with one (usually) privileged background server process (or “daemon” process, called “dockerd”) per host. If run on a multi-tenant system (e.g., HPC cluster such as Cannon), this results in a security issue in that users who interact with the privileged daemon process could access files owned by other users. Additionally, on an HPC cluster, the docker daemon process does not integrate with Slurm resource allocation facilities.

Podman, a daemonless OCI container toolchain developed by Red Hat to address these issues, is installed on the FASRC cluster. The Podman CLI (command-line interface) was designed to be largely compatible with the Docker CLI, and on the FASRC cluster, the docker command runs podman under the hood. Many docker commands will just work with podman, though there are some differences.

The second is Singularity. Singularity grew out of the need for additional security in shared user contexts (like you find on a cluster). Since Docker normally requires the user to run as root, Singularity was created to alleviate this requirement and bring the advantages of containerization to a broader context. There are a couple of implementations of Singularity, and on the cluster, we use SingularityCE (the other implementation is Apptainer). Singularity has the ability to convert OCI (docker) images into Singularity Image Format (SIF) files. Singularity images have the advantage of being distributable as a single read-only file; on an HPC cluster, this can be located on a shared filesystem, which can be easily launched by processes on different nodes. Additionally, Singularity containers can run as the user who launched them without elevated privileges.
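For instance, a SIF file can be created once from a Docker Hub image and then reused from shared storage; a minimal sketch (the lolcow image is just an example):

[jharvard@holy8a26602 ~]$ singularity pull lolcow_latest.sif docker://godlovedc/lolcow
[jharvard@holy8a26602 ~]$ singularity run lolcow_latest.sif

The resulting .sif file can be placed on a shared filesystem and run from any compute node.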

Rootless

Normally, building a container requires root permissions, and in the case of Podman/Docker, the containers themselves would ordinarily be launched by the root user. While this may be fine in a cloud context, it is not in a shared-resource context like an HPC cluster. Rootless is the solution to this problem.

Rootless essentially allows the user to spoof being root inside the container. It does this via a Linux feature called subuid (short for Subordinate User ID) and subgid (Subordinate Group ID). This feature allows a range of uids (a uid is a unique integer assigned to each user name, used for permissions identification) and gids (unique integers for groups) to be subordinated to another uid. An example is illustrative. Let’s say you are userA with a uid of 20000. You are assigned the subuid range of 1020001-1021000. When you run your container, the following mapping happens:

In the Container [username(uid)]    Outside the Container [username(uid)]
root(0)                             userA(20000)
apache(48)                          1020048
ubuntu(1000)                        1021000

Thus, you can see that while you are inside the container, you pretend to be another user and have all the privileges of that user in the container. Outside the container, though, you are acting as your own user and the uids subordinated to your user.

A few notes are important here:

  1. The subuid/subgid range assigned to each user does not overlap the uid/gid or subuid/subgid range assigned to any other user or group.
  2. While you may be spoofing a specific user inside of the container, the process outside the container sees you as your normal uid or subuid. Thus, if you use normal Linux tools like top or ps outside the container, you will notice that the IDs that show up are your uid and subuid.
  3. Filesystems, since they are external, also see you as your normal uid/gid and subuid/subgid. So files created as root in the container will show up on the storage as owned by your uid/gid. Files created by other users in the container will show up as their mapped subuid/subgid.

Rootless is very powerful and allows you both to build containers on the cluster and to run Podman/Docker containers right out of the box. If you want to see what your subuid mapping is, you can find the mappings in /etc/subuid and /etc/subgid. You can find your uid by running the id command, which you can then use to look up your map (e.g., with the command: grep "^$(id -u):" /etc/subuid).
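A hedged illustration of that lookup, using the hypothetical uid and subuid range from the userA example above (the actual values on the cluster will differ):

[userA@holy8a26602 ~]$ id -u
20000
[userA@holy8a26602 ~]$ grep "^$(id -u):" /etc/subuid
20000:1020001:1000

The last field is the count of subordinate IDs, so the range here starts at 1020001 and runs for 1000 uids.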

Rootless and Filesystems

Two more crucial notes about filesystems. The first is that since subuids are not part of our normal authentication system, filesystems that cannot resolve subuids will not permit them access. In particular, Lustre (e.g., /n/holylabs) does not recognize subuids, and since it cannot resolve them, it denies them access. NFS filesystems (e.g., /n/netscratch) do not have this problem.

The second is that even if you can get into the filesystem, you may not be able to traverse into locations that do not have world access (o+rx) enabled. This is because the filesystem cannot resolve your user group or user name, does not see you as a valid member of the group, and thus will reject you. As such, it is imperative to test and validate filesystem access for any filesystem you intend to map into the container. A simple way to ensure this is to utilize the Everyone directory, which exists for most filesystems on the cluster. Note that your home directory is not world-accessible for security reasons and thus cannot be used.
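One hedged way to run such a test is to bind the path in question into a throwaway container and try to list and write to it (the lab share path below is just an example):

[jharvard@holy8a26602 ~]$ podman run --rm --volume /n/netscratch/jharvard_lab/Everyone:/data docker://ubuntu ls /data
[jharvard@holy8a26602 ~]$ podman run --rm --volume /n/netscratch/jharvard_lab/Everyone:/data docker://ubuntu touch /data/access_test

If either command fails with a permission error, that location is not usable from a rootless container.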

Getting Started

The first step in utilizing a container on the cluster is to submit a job. Login nodes are not appropriate places for development. If you are just beginning, the easiest method is to either get a command line interactive session via salloc, or launch an OOD session.

Once you have a session, you can then launch your container:

Singularity

[jharvard@holy8a26602 ~]$ singularity run docker://godlovedc/lolcow
INFO: Downloading library image to tmp cache: /scratch/sbuild-tmp-cache-701047440
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
INFO: Fetching OCI image...
45.3MiB / 45.3MiB [============================================================================================================================] 100 % 21.5 MiB/s 0s
53.7MiB / 53.7MiB [============================================================================================================================] 100 % 21.5 MiB/s 0s
INFO: Extracting OCI image...
2025/01/09 10:49:52 warn rootless{dev/agpgart} creating empty file in place of device 10:175
2025/01/09 10:49:52 warn rootless{dev/audio} creating empty file in place of device 14:4
2025/01/09 10:49:52 warn rootless{dev/audio1} creating empty file in place of device 14:20
INFO: Inserting Singularity configuration...
INFO: Creating SIF file...
_________________________________________
/ Q: What do you call a principal female \
| opera singer whose high C |
| |
| is lower than those of other principal |
\ female opera singers? A: A deep C diva. /
-----------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||

Podman

[jharvard@holy8a24601 ~]$ podman run docker://godlovedc/lolcow
Trying to pull docker.io/godlovedc/lolcow:latest...
Getting image source signatures
Copying blob 8e860504ff1e done |
Copying blob 9fb6c798fa41 done |
Copying blob 3b61febd4aef done |
Copying blob 9d99b9777eb0 done |
Copying blob d010c8cf75d7 done |
Copying blob 7fac07fb303e done |
Copying config 577c1fe8e6 done |
Writing manifest to image destination
_____________________________
< Give him an evasive answer. >
-----------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||

Shell

If you want to get a shell prompt in a container do:

Singularity

[jharvard@holy8a26602 ~]$ singularity shell docker://godlovedc/lolcow
Singularity>

Podman

[jharvard@holy8a26601 ~]$ podman run --rm -it --entrypoint bash docker://godlovedc/lolcow
root@holy8a26601:/#

GPU

If you want to use a GPU in a container, first start a job reserving a GPU on a gpu node. Then do the following:

Singularity

You will want to add the --nv flag for singularity:

[jharvard@holygpu7c26306 ~]$ singularity exec --nv docker://godlovedc/lolcow /bin/bash
Singularity> nvidia-smi
Fri Jan 10 15:50:20 2025 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:4B:00.0 Off | On |
| N/A 24C P0 43W / 400W | 74MiB / 40960MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 1 0 0 | 37MiB / 19968MiB | 42 0 | 3 0 2 0 0 |
| | 0MiB / 32767MiB | | |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Singularity>

Podman

For podman you need to add --device nvidia.com/gpu=all:

[jharvard@holygpu7c26305 ~]$ podman run --rm --device nvidia.com/gpu=all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Fri Jan 10 20:26:57 2025 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:31:00.0 Off | On |
| N/A 25C P0 47W / 400W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 2 0 0 | 37MiB / 19968MiB | 42 0 | 3 0 2 0 0 |
| | 0MiB / 32767MiB | | |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
WARN[0005] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus

Docker Rate Limiting

Docker Hub limits the number of pulls anonymous accounts can make. If you hit either of the following errors:

ERROR: toomanyrequests: Too Many Requests.

or

You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limits.

you will need to create a Docker account to increase your limit. See the Docker documentation for more details.

Once you have a Docker account, you can authenticate to Docker Hub using your Docker Hub account (not your FASRC account) and then run a Docker container.

Singularity

singularity remote login --username <dockerhub_username> docker://docker.io

Podman

podman login docker.io

Advanced Usage

For advanced usage tips such as how to build your own containers, see our specific container software pages:

Python Package Installation

Description

Python packages on the cluster are primarily managed with Mamba.  Direct use of pip, outside of a virtual environment, is discouraged on the FASRC clusters.

Mamba is a package manager that is a drop-in replacement for Conda, and is generally faster and better at resolving dependencies:

  • Speed: Mamba is written in C++, which makes it faster than Conda. Mamba uses parallel processing and efficient code to install packages faster.
  • Compatibility: Mamba is fully compatible with Conda, so it can use the same commands, packages, and environment configurations.
  • Cross-platform support: Mamba works on Mac, Linux and Windows.
  • Dependency resolution: Mamba is better at resolving dependencies than Conda.
  • Environment creation: Mamba is faster at creating environments, especially large ones.
  • Package repository: Mamba uses the conda-forge channel (via Mambaforge), which provides the most up-to-date packages available.

Important:
Anaconda is currently reviewing its Terms of Service for Academia and Research and is expected to conclude the update by the end of 2024. There is a possibility that Conda may no longer be free for non-profit academic research use at institutions with more than 200 employees, and downloading packages through Anaconda’s Main channel may incur costs. Hence, we recommend that users switch to the open-source conda-forge channel for package distribution when possible. Our python module is built with the Miniforge3 distribution, which has conda-forge set as its default channel.

Mamba is a drop-in replacement for Conda and uses the same commands and configuration options as conda. You can swap almost all commands between conda and mamba. By default, mamba uses conda-forge, the free community package repository. (In this doc, we will generally refer only to mamba.)

Usage

mamba is available on the FASRC cluster as a software module, either as Mambaforge or as python/3*, which is aliased to mamba. One can access it by loading either of the following modules:

$ module load python/{PYTHON_VERS}-fasrc01
$ python -V
Python {PYTHON_VERS}

Environments

You can create virtual environments with mamba in the same way as with conda. However, it is important to start an interactive session prior to creating an environment and installing the desired packages, in the following manner:

$ salloc --partition test --nodes=1 --cpus-per-task=2 --mem=4GB --time=0-02:00:00

$ module load python/{PYTHON_VERS}-fasrc01

Exporting these two variables before creating your mamba environment is optional; the values below simply set the package location to the standard place.

export CONDA_PKGS_DIRS=~/conda/pkgs
export CONDA_ENVS_PATH=~/conda/envs

However, if you need to place the packages elsewhere, such as in a shared directory, specify the absolute file path.

export CONDA_PKGS_DIRS=/<FILEPATH>/conda/pkgs
export CONDA_ENVS_PATH=/<FILEPATH>/conda/envs

Create an environment using mamba: $ mamba create -n <ENV_NAME>

You can also install packages with the create command, which can speed up your setup significantly. For example:

$ mamba create -n <ENV_NAME> <PACKAGES> 
$ mamba create -n python_env1 python={PYTHON_VERS} pip wheel

You must activate an environment in order to use it or install any packages in it. To activate and use an environment: $ mamba activate python_env1

To deactivate an active environment: $ mamba deactivate

You can list the packages currently installed in the mamba or  conda environment with: $ mamba list

To install new packages in the environment with mamba using the default channel:

 $ mamba install -y <PACKAGES>

For example: $ mamba install -y numpy 

To install a package from a specific channel, instead:

$ mamba install --channel <CHANNEL-NAME> -y <PACKAGE>

For example: $ mamba install --channel conda-forge boltons

To uninstall packages: $ mamba uninstall PACKAGE

To delete an environment: $ conda remove -n <ENV_NAME> --all -y

For additional features, please refer to the Mamba documentation.

Pip Installs

Avoid using pip outside of a mamba environment on any FASRC cluster. If you run pip install outside of a mamba environment, the installed packages will be placed in your $HOME/.local directory, which can lead to package conflicts and may cause some packages to fail to install or load correctly via mamba.

For example, if your environment name is python_env1:

$ module load python
$ mamba activate python_env1
$ pip install <package_name>

Best Practices

Use mamba environment in Jupyter Notebooks

If you would like to use a mamba environment as a kernel in a Jupyter Notebook on Open OnDemand (Cannon OOD or FASSE OOD), you have to install the packages ipykernel and nb_conda_kernels. These packages allow Jupyter to detect mamba environments that you created from the command line.

For example, if your environment name is python_env1:

$ module load python
$ mamba activate python_env1
$ mamba install ipykernel nb_conda_kernels
After these packages are installed, launch a new Jupyter Notebook job (existing Jupyter Notebook jobs will fail to “see” this environment). Then:
  1. Open a Jupyter Notebook (a .ipynb file)
  2. On the top menu, click Kernel -> Change kernel -> select the conda environment

Mamba environments in a desired location

With mamba, use the -p or --prefix option to write environment files to a desired location, such as holylabs. Don’t use your home directory, as it has very low performance due to filesystem latency. Using a lab share location also lets you share your conda environment with other people on the cluster. Keep in mind that you will need to make the destination directory and specify the python version to use. For example:

$ mamba create --prefix /n/holylabs/LABS/{YOUR_LAB}/Lab/envs python={PYTHON_VERS}

$ mamba activate /n/holylabs/LABS/{YOUR_LAB}/Lab/envs

To delete an environment at that desired location: $ conda remove -p /n/holylabs/LABS/{YOUR_LAB}/Lab/envs --all -y

Troubleshooting

Interactive vs. batch jobs

If your code works in an interactive job but fails in a Slurm batch job, check the following:

  1. You are submitting your jobs from within a mamba environment.
    Solution 1: Deactivate your environment with the command mamba deactivate and submit the job or
    Solution 2: Open another terminal and submit the job from outside the environment.

  2. Check if your ~/.bashrc or ~/.bash_profile files have a section of conda initialize or a source activate command. The conda initialize section is known to create issues on the FASRC clusters.
    Solution: Delete the section between the two conda initialize marker lines (see the sketch after this list). If you have source activate in those files, delete it or comment it out.
    For more information on ~/.bashrc files, see https://docs.rc.fas.harvard.edu/kb/editing-your-bashrc/
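For reference, the block added by conda init is delimited by marker comments similar to the sketch below (the exact contents between the markers vary by installation); the whole block, markers included, can be removed:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
#    ... shell setup lines inserted by conda init ...
# <<< conda initialize <<<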

Jupyter Notebook or JupyterLab on Open OnDemand/VDI problems

See Jupyter Notebook or JupyterLab on Open OnDemand/VDI troubleshooting section.

R and RStudio on the FASRC clusters

What is R?

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

There are several options to use R on the FASRC clusters:

We recommend using RStudio Server on Open OnDemand because it is the simplest way to install R packages (see RStudio Server). We only recommend the R module and RStudio Desktop if you:

  • plan to run MPI/multi-node jobs
  • need to choose specific compilers for R package installation
  • are an experienced user and know how to install software

RStudio Server

RStudio Server is our go-to RStudio app because it contains a wide range of precompiled R packages from Bioconductor and rocker/tidyverse. This means that installing R packages in RStudio Server is pretty straightforward. Most times, it will be sufficient to simply run:

> install.packages("package_name")

This simplicity is possible because RStudio Server runs inside a Singularity container, meaning that it does not use the host operating system (OS). For more information on Singularity, refer to our Singularity on the cluster docs.

Important notes:

  • User-installed R libraries will be installed in ~/R/ifxrstudio/<IMAGE_TAG>
  • This app contains many pre-compiled packages from bioconductor and rocker/tidyverse.
  • FAS RC environment modules (e.g. module load) and Slurm (e.g. sbatch) are not accessible from this app.
  • For the RStudio with environment module and Slurm support, see RStudio Desktop

This app is useful for most applications, including multi-core jobs. However, it is not suitable for multi-node jobs. For multi-node jobs, the recommended app is RStudio Desktop.


FASSE cluster additional settings

If you are using FASSE Open OnDemand and need to install R packages in RStudio Server, you will likely need to set the proxies as explained in our Proxy Settings documentation. Before installing packages, execute these two commands in RStudio Server:

> Sys.setenv(http_proxy="http://rcproxy.rc.fas.harvard.edu:3128")
> Sys.setenv(https_proxy="http://rcproxy.rc.fas.harvard.edu:3128")

Package Seurat

In RStudio Server Release 3.18, the default version for umap-learn is 0.5.5. However, this version contains a bug. To resolve this issue, downgrade to umap-learn version 0.5.4:

> install.packages("Seurat")
> reticulate::py_install(packages = c("umap-learn==0.5.4","numpy<2"))

And test with

> library(Seurat)
> data("pbmc_small")
> pbmc_small <- RunUMAP(object = pbmc_small, dims = 1:5, metric='correlation', umap.method='umap-learn')
UMAP(angular_rp_forest=True, local_connectivity=1, metric='correlation', min_dist=0.3, n_neighbors=30, random_state=RandomState(MT19937) at 0x14F205B9E240, verbose=True)
Wed Jul 3 17:22:55 2024 Construct fuzzy simplicial set
Wed Jul 3 17:22:56 2024 Finding Nearest Neighbors
Wed Jul 3 17:22:58 2024 Finished Nearest Neighbor Search
Wed Jul 3 17:23:00 2024 Construct embedding
Epochs completed: 100%| ██████████ 500/500 [00:00]
Wed Jul 3 17:23:01 2024 Finished embedding

R, CRAN, and RStudio Server pinned versions

To ensure R packages compatibility, R, CRAN, and RStudio Server versions are pinned to a specific date. For more details see Rocker project which is the base image for FASRC’s RStudio Server.

Use R packages from RStudio Server in a batch job

The RStudio Server OOD app hosted on Cannon at rcood.rc.fas.harvard.edu and FASSE at fasseood.rc.fas.harvard.edu runs RStudio Server in a Singularity container (see Singularity on the cluster). The path to the Singularity image on both Cannon and FASSE clusters is the same:

/n/singularity_images/informatics/ifxrstudio/ifxrstudio:RELEASE_<VERSION>.sif

Where <VERSION> corresponds to the Bioconductor version listed in the “R version” dropdown menu. For example:

R 4.2.3 (Bioconductor 3.16, RStudio 2023.03.0)

uses the Singularity image:

/n/singularity_images/informatics/ifxrstudio/ifxrstudio:RELEASE_3_16.sif

As mentioned above, when using the RStudio Server OOD app, user-installed R packages by default go in:

~/R/ifxrstudio/RELEASE_<VERSION>

This is an example of a batch script named runscript.sh that executes R script myscript.R inside the Singularity container RELEASE_3_16:

#!/bin/bash
#SBATCH -c 1 # Number of cores (-c)
#SBATCH -t 0-01:00 # Runtime in D-HH:MM
#SBATCH -p test # Partition to submit to
#SBATCH --mem=1G # Memory pool for all cores (see also --mem-per-cpu)
#SBATCH -o myoutput_%j.out # File to which STDOUT will be written, %j inserts jobid
#SBATCH -e myerrors_%j.err # File to which STDERR will be written, %j inserts jobid

# set R packages and rstudio server singularity image locations
my_packages=${HOME}/R/ifxrstudio/RELEASE_3_16
rstudio_singularity_image="/n/singularity_images/informatics/ifxrstudio/ifxrstudio:RELEASE_3_16.sif"

# run myscript.R using RStudio Server singularity image
singularity exec --cleanenv --env R_LIBS_USER=${my_packages} ${rstudio_singularity_image} Rscript myscript.R

To submit the job, execute the command:

sbatch runscript.sh

Advanced Users

These options are for users familiar with software installation from source, where you choose compilers and set your environment variables. If you are not familiar with these concepts, we highly recommend using RStudio Server instead.

R module

To use the R module, you should first have taken our Introduction to the FASRC training and be familiar with running jobs on the cluster. R modules come with some basic R packages. If you use a module, you will likely have to install most of the R packages that you need.

To use R on the FASRC clusters, load R via our module system. For example, this command will load the latest R version:

module load R

If you need a specific version of R, you can search with the command

module spider R

To load a specific version

module load R/4.2.2-fasrc01

For more information on modules, see the Lmod Modules page.

To use R from the command line, you can use an R shell for interactive work. For batch jobs, you can use the R CMD BATCH and Rscript commands. Note that these commands have different behaviors (see the example invocations after this list):

  • R CMD BATCH
    • output will be directed to a .Rout file unless you specify otherwise
    • prints out input statements
    • cannot output to STDOUT
  • Rscript
    • output and errors are directed to STDOUT and STDERR, respectively, as with many other programs
    • does not print input statements
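As an illustration of the difference (the script and file names are placeholders):

# Output, including echoed input statements, goes to myscript.Rout by default
R CMD BATCH myscript.R

# Output and errors go to STDOUT/STDERR, which a batch job captures in its -o/-e files
Rscript myscript.R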

For slurm batch examples, refer to FASRC User_Codes Github repository:

Examples and details of how to run R from the command line can be found at:

R Module + RStudio Desktop

RStudio Desktop depends on an R module. Although it comes with some precompiled R packages provided by the R module, the list is much more limited than in the RStudio Server app.

RStudio Desktop runs on the host operating system (OS), the same environment as when you ssh to Cannon or FASSE.

This app is particularly useful for running multi-node/MPI applications because you can specify the exact modules, compilers, and packages that you need to load.

See the how to launch RStudio Desktop documentation.

R in Jupyter

To use R in Jupyter, you will need to create a conda/mamba virtual environment and install the packages jupyter and rpy2, which allow you to use R in Jupyter notebooks.

Step 1:  Request an interactive job

salloc --partition test --time 02:00:00 --ntasks=1 --mem 10000

Step 2: Load python module, set environmental variables, and create an environment with the necessary packages:

module load python/3.10.13-fasrc01
export PYTHONNOUSERSITE=yes
mamba create -n rpy2_env jupyter numpy matplotlib pandas scikit-learn scipy rpy2 r-ggplot2 -c conda-forge -y

See Python instructions for more details on Python and mamba/conda environments.

After creating the mamba/conda environment, you will need to load that environment by selecting the corresponding kernel on the Jupyter Notebook to start using R in the notebook.

Step 3: Launch the Jupyter app on the OpenOnDemand VDI portal using these instructions.

You may need to load certain modules for package installations. For example, the R package lme requires cmake. You can load cmake by adding the module name in the “Extra Modules” field:

Step 4: Open your Jupyter notebook. On the top right corner, click on “Python 3” (typically, it has “Python 3”, but it may be different on your Notebook). Select the created conda environment “Python [conda env:conda-rpy2_env]”:

Alternatively, you can use the top menu: Kernel -> Change Kernel -> Python [conda env:conda-rpy2_env]

Step 5: Install R packages using a Jupyter Notebooks

Refer to the example Jupyter Notebook on FASRC User_Codes Github.

R with Spack

Step 1: Install Spack by following our Spack Install and Setup instructions.

Step 2: Install the R packages with Spack from the command line. For all R package installations with Spack, ensure you are on a compute node by requesting an interactive job (if you are already in an interactive job, there is no need to request another one):

[jharvard@holylogin spack]$ salloc --partition test --time 4:00:00 --mem 16G -c 8

Installing R packages with Spack is fairly simple. The main steps are:

[jharvard@holy2c02302 spack]$ spack install package_name  # install software
[jharvard@holy2c02302 spack]$ spack load package_name     # load software to your environment
[jharvard@holy2c02302 spack]$ R                           # launch R
> library(package_name)                                   # load package within R

For specific examples, refer to the FASRC User_Codes GitHub repository:

R and RStudio on Windows

See our R and RStudio on Windows page.

Troubleshooting

Files that may configure R package installations

  • ~/.Rprofile
  • ~/.Renviron
  • ~/.bashrc
  • ~/.bash_profile
  • ~/.profile
  • ~/.config/rstudio/rstudio-prefs.json
  • ~/.R/Makevars

References

Macaulay2

Description

Macaulay2 is an algebraic geometry and commutative algebra software system. Its creation and development have been funded by the National Science Foundation since 1992.

Macaulay2 on the cluster

Macaulay2 is available on the cluster via Singularity containers. We recommend working on a compute node, which you can get to by requesting an interactive job. For example:

salloc --partition test --time 01:00:00 --cpus-per-task 4 --mem-per-cpu 2G

You can pull (i.e. download) a container with the command

singularity pull docker://unlhcc/macaulay2:latest

Start a shell inside the Singularity container

singularity shell macaulay2_latest.sif

The prompt will change to Singularity>. Then, type M2 to start Macaulay2. You should see a prompt with i1:

Singularity> M2
Macaulay2, version 1.15
--storing configuration for package FourTiTwo in /n/home01/jharvard/.Macaulay2/init-FourTiTwo.m2
--storing configuration for package Topcom in /n/home01/jharvard/.Macaulay2/init-Topcom.m2
with packages: ConwayPolynomials, Elimination, IntegralClosure, InverseSystems, LLLBases, PrimaryDecomposition, ReesAlgebra, TangentCone,
Truncations

i1 :

For examples, we recommend visiting Macaulay2 documentation.
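As a quick sanity check at the i1 prompt, you can define a polynomial ring and compute a Gröbner basis (a minimal illustrative session, not an official example):

i1 : R = QQ[x,y,z]
i2 : I = ideal(x^2 - y, x*y - z)
i3 : gens gb I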

Resources

Mathematica https://docs.rc.fas.harvard.edu/kb/mathematica/ Tue, 30 Apr 2024 14:33:09 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=27043 Description

Mathematica is a powerful computational software system that provides a comprehensive environment for technical computing. Developed by Wolfram Research, it offers a wide range of capabilities spanning symbolic and numerical computation, visualization, and programming. Mathematica’s symbolic engine allows for the manipulation of mathematical expressions, equations, and functions, making it particularly useful for tasks such as calculus, algebra, and symbolic integration. Its vast library of built-in functions covers various areas of mathematics, science, and engineering, enabling users to tackle diverse problems efficiently. Moreover, Mathematica’s interactive interface and high-level programming language facilitate the creation of custom algorithms and applications, making it an indispensable tool for researchers, educators, and professionals in countless fields.

Mathematica is available on the FASRC Cannon cluster as software modules. Currently, the following modules/versions are available:

mathematica/12.1.1-fasrc01 and mathematica/13.3.0-fasrc01

Examples

To start using Mathematica on the FASRC cluster, please look at the examples on our Users Code repository.
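As a minimal illustration of command-line use on a compute node (a hedged sketch; the script name is arbitrary and the command-line kernel is assumed to be available as math once the module is loaded), you could create a one-line Wolfram Language script and run it:

[jharvard@holy8a26602 ~]$ module load mathematica/13.3.0-fasrc01
[jharvard@holy8a26602 ~]$ echo 'Print[Integrate[Sin[x]^2, {x, 0, Pi}]]' > hello.wl
[jharvard@holy8a26602 ~]$ math -script hello.wl
Pi/2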

Resources

Gaussian https://docs.rc.fas.harvard.edu/kb/gaussian/ Tue, 27 Feb 2024 18:54:06 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=26828 Access

Please contact us if you require Gaussian access. It is controlled on a case-by-case basis and requires membership in a security group.

If you are not a member of this security group, you can still load the module, but you will not be able to run Gaussian.

FASRC provides the module and basic instructions on how to launch Gaussian, but we do not provide support on the specifics of running Gaussian. For those details, refer to the Gaussian documentation or your department.

Running Gaussian

Example batch file runscript.sh:

#!/bin/bash
#SBATCH -J my_gaussian # job name
#SBATCH -c 1 # number of cores
#SBATCH -t 01:00:00 # time in HH:MM:SS
#SBATCH -p serial_requeue # partition
#SBATCH --mem-per-cpu=800 # memory per core
#SBATCH -o rchelp.out # standard output file
#SBATCH -e rchelp.err # standard error file

module load gaussian/16-fasrc04

g16 CH4_s.gjf

To submit the job:

sbatch runscript.sh
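The batch script above references an input file CH4_s.gjf. For illustration only (the route section and geometry below are generic, not FASRC-specific; consult the Gaussian documentation for input syntax), a minimal methane input could look like:

%NProcShared=1
%Mem=800MB
#P HF/6-31G(d) Opt

Methane geometry optimization

0 1
C   0.000000   0.000000   0.000000
H   0.629118   0.629118   0.629118
H  -0.629118  -0.629118   0.629118
H  -0.629118   0.629118  -0.629118
H   0.629118  -0.629118  -0.629118

Note that Gaussian input files must end with a blank line.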

Versions

You can search for gaussian modules with the command module spider gaussian:

[jharvard@boslogin02 ~]$ module spider gaussian

-----------------------------------------------------------------------------------------------------------------------------------------
gaussian:
-----------------------------------------------------------------------------------------------------------------------------------------
Description:
Gaussian, a computational chemistry software program

Versions:
gaussian/16-fasrc01
gaussian/16-fasrc02
gaussian/16-fasrc03
gaussian/16-fasrc04

And to see the details about a particular module, use commands module spider or module display:

[jharvard@boslogin02 ~]$ module spider gaussian/16-fasrc04

-----------------------------------------------------------------------------------------------------------------------------------------
gaussian: gaussian/16-fasrc04
-----------------------------------------------------------------------------------------------------------------------------------------
Description:
Gaussian, a computational chemistry software program

This module can be loaded directly: module load gaussian/16-fasrc04

Help:
gaussian-16-fasrc04
Gaussian, a computational chemistry software program

[jharvard@boslogin02 ~]$ module display gaussian/16-fasrc04
-----------------------------------------------------------------------------------------------------------------------------------------
/n/sw/helmod-rocky8/modulefiles/Core/gaussian/16-fasrc04.lua:
-----------------------------------------------------------------------------------------------------------------------------------------
help([[gaussian-16-fasrc04
Gaussian, a computational chemistry software program
]], [[
]])
whatis("Name: gaussian")
whatis("Version: 16-fasrc04")
whatis("Description: Gaussian, a computational chemistry software program")
setenv("groot","/n/sw/g16_sandybridge")
setenv("GAUSS_ARCHDIR","/n/sw/g16_sandybridge/g16/arch")
setenv("G09BASIS","/n/sw/g16_sandybridge/g16/basis")
setenv("GAUSS_SCRDIR","/scratch")
setenv("GAUSS_EXEDIR","/n/sw/g16_sandybridge/g16/bsd:/n/sw/g16_sandybridge/g16/local:/n/sw/g16_sandybridge/g16/extras:/n/sw/g16_sandybridge/g16")
setenv("GAUSS_LEXEDIR","/n/sw/g16_sandybridge/g16/linda-exe")
prepend_path("PATH","/n/sw/g16_sandybridge/g16/bsd:/n/sw/g16_sandybridge/g16/local:/n/sw/g16_sandybridge/g16/extras:/n/sw/g16_sandybridge/g16")
prepend_path("PATH","/n/sw/g16_sandybridge/nbo6_x64_64/nbo6/bin")

GaussView

RC users can download these clients from our Downloads page. You must be connected to the FASRC VPN to access this page. Your FASRC username and password are required to log in.

On MacOS: Move the downloaded file to the ‘Applications’ folder, unarchive it, and double click on the gview icon located in gaussview16_A03_macOS_64bit.

On Windows: Unarchive the file in the Downloads folder itself.

A pop up will appear saying “Gaussian is not installed”.

Click OK. This will then open the gview interface.

Troubleshooting

Failed to locate data directory

On MacOS, if you see a message similar to the one shown in the image here:

you can safely remove the data folder by executing this command in your terminal: rm -rf /private/var/<path-to-d/data>

GaussView doesn’t open

In the case GaussView doesn’t open on MacOS, do the following:

Go to the Applications folder > gaussview16 folder > Right click on gview and choose “Show Package Contents” (see below)

Go to the Contents folder of gview > MacOS folder > Right click on the gview executable and choose “Open”


A pop-up will appear saying “Gaussian is not installed”. Ignore it and click OK. This will then open the gview interface.

Note: We do not have a license for GaussView on the cluster. It needs to be run locally.

Running Singularity image with CentOS 7 on Rocky 8 https://docs.rc.fas.harvard.edu/kb/centos7-singularity/ Fri, 05 May 2023 14:30:52 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=26245 If you absolutely need to still use CentOS 7 after the OS upgrade to Rocky 8, you can use it with SingularityCE. For more information on SingularityCE, see our Singularity documentation and GitHub User Codes.

We have a Singularity image with CentOS7 and the same environment of compute nodes (as of March 29th, 2023). In addition, you can access CentOS 7 modules from within the Singularity container. The image is stored at:

/n/singularity_images/FAS/centos7/compute-el7-noslurm-2023-03-29.sif

You can execute this image and/or copy it, but you cannot modify it in its original location. See below for how you can modify this image.

Run Singularity with CentOS 7

To get a bash shell on CentOS 7 environment, you can run:

[jharvard@holy7c12102 ~]$ singularity run /n/singularity_images/FAS/centos7/compute-el7-noslurm-2023-03-29.sif
Singularity>

or

[jharvard@holy7c12102 ~]$ singularity exec /n/singularity_images/FAS/centos7/compute-el7-noslurm-2023-03-29.sif /bin/bash
Singularity>

NOTE: The command singularity shell is also an option. However, it does not allow you to access modules as explained in Load modules.

Load modules

You can still load modules that were available on CentOS 7 from inside the Singularity container:

[jharvard@holy7c12102 ~]$ singularity exec /n/singularity_images/FAS/centos7/compute-el7-noslurm-2023-03-29.sif /bin/bash
Singularity> module load gcc
Singularity> module load matlab
Singularity> module list

Currently Loaded Modules:
  1) gmp/6.2.1-fasrc01   2) mpfr/4.1.0-fasrc01   3) mpc/1.2.1-fasrc01   4) gcc/12.1.0-fasrc01   5) matlab/R2022b-fasrc01

Note that the modules are from the CentOS 7 environment:

Singularity> module display matlab/R2022b-fasrc01
-----------------------------------------------------------------------------------------------------------------------------------------------------------
   /n/helmod/modulefiles/centos7/Core/matlab/R2022b-fasrc01.lua:
-----------------------------------------------------------------------------------------------------------------------------------------------------------
help([[matlab-R2022b-fasrc01
a high-level language and interactive environment for numerical computation, visualization, and programming

]], [[
]])
whatis("Name: matlab")
whatis("Version: R2022b-fasrc01")
whatis("Description: a high-level language and interactive environment for numerical computation, visualization, and programming")
setenv("MATLAB_HOME","/n/helmod/apps/centos7/Core/matlab/R2022b-fasrc01")
prepend_path("PATH","/n/helmod/apps/centos7/Core/matlab/R2022b-fasrc01/bin")
setenv("MLM_LICENSE_FILE","27000@rclic1")
setenv("ZIENA_LICENSE_NETWORK_ADDR","10.242.113.134:8349")

Submit slurm jobs

If you need to submit a job rather than getting to a shell, you have to do the following steps in the appropriate order:

  1. launch the singularity image
  2. load modules
  3. (optional) compile code
  4. execute code

If you try to load modules before launching the image, it will try to load modules from the Rocky 8 host system.

To ensure that steps 2-4 run within the singularity container, they are placed between the END markers of a heredoc (see the slurm batch script below).

NOTE: You cannot submit slurm jobs from inside the container, but you can submit a slurm job that will execute the container.

Example with a simple hello_world.f90 fortran code:

program hello
  print *, 'Hello, World!'
end program hello

Slurm batch script run_singularity_centos7.sh:

#!/bin/bash
#SBATCH -J sing_hello           # Job name
#SBATCH -p rocky                # Partition(s) (separate with commas if using multiple)
#SBATCH -c 1                    # Number of cores
#SBATCH -t 0-00:10:00           # Time (D-HH:MM:SS)
#SBATCH --mem=500M              # Memory
#SBATCH -o sing_hello_%j.out    # Name of standard output file
#SBATCH -e sing_hello_%j.err    # Name of standard error file

# start a bash shell inside singularity image
singularity run /n/singularity_images/FAS/centos7/compute-el7-noslurm-2023-03-29.sif <<END

# load modules
module load gcc/12.1.0-fasrc01
module list

# compile code
gfortran hello_world.f90 -o hello.exe

# execute code
./hello.exe
END

As noted above, the commands are placed between the END markers so that they run within the singularity container.

To submit the slurm batch script:

sbatch run_singularity_centos7.sh

Another option is to have a bash script with steps 2-4 and then use singularity run to execute the script. For example, script_inside_container.sh:

#!/bin/bash

# load modules
module load gcc/12.1.0-fasrc01
module list

# compile code
gfortran hello_world.f90 -o hello.exe

# execute code
./hello.exe

And the slurm batch script run_singularity_centos7_script.sh becomes:

#!/bin/bash
#SBATCH -J sing_hello           # Job name
#SBATCH -p rocky                # Partition(s) (separate with commas if using multiple)
#SBATCH -c 1                    # Number of cores
#SBATCH -t 0-00:10:00           # Time (D-HH:MM:SS)
#SBATCH --mem=500M              # Memory
#SBATCH -o sing_hello_%j.out    # Name of standard output file
#SBATCH -e sing_hello_%j.err	# Name of standard error file

# start a bash shell inside singularity image
singularity run /n/singularity_images/FAS/centos7/compute-el7-noslurm-2023-03-29.sif script_inside_container.sh

You can submit a batch job with:

sbatch run_singularity_centos7_script.sh

Modify SingularityCE image with CentOS 7

If you need to run your codes in the former operating system (pre June 2023), you can build a custom SingularityCE image with proot. The base image is the FASRC CentOS 7 compute node image, and you can add your own software/libraries/packages under the %post header in the Singularity definition file.

Step 1: Follow steps 1 and 2 in setup proot

Step 2: Copy the CentOS 7 compute image to your holylabs (or home directory). The base image file needs to be copied to a directory to which you have read/write access; otherwise, the build of your custom image will fail.

[jharvard@holy2c02302 ~]$ cd /n/holylabs/LABS/jharvard_lab/Users/jharvard
[jharvard@holy2c02302 jharvard]$ cp /n/holystore01/SINGULARITY/FAS/centos7/compute-el7-noslurm-2023-02-15.sif .

Step 3: In the definition file centos7_custom.def, set Bootstrap: localimage and put the path of the existing image that you copied in the From: field. Then, add the packages/software/libraries that you need. Here, we add cowsay:

Bootstrap: localimage
From: compute-el7-noslurm-2023-02-15.sif

%help
    This is CentOS 7 Singularity container based on the Cannon compute node with my added programs.

%post
    yum -y update
    yum -y install cowsay

Step 4: Build the SingularityCE image

[jharvard@holy2c02302 jharvard]$ singularity build centos7_custom.sif centos7_custom.def
INFO:    Using proot to build unprivileged. Not all builds are supported. If build fails, use --remote or --fakeroot.
INFO:    Starting build...
INFO:    Verifying bootstrap image compute-el7-noslurm-2023-02-15.sif
WARNING: integrity: signature not found for object group 1
WARNING: Bootstrap image could not be verified, but build will continue.
INFO:    Running post scriptlet
+ yum -y update

... omitted output ...

Running transaction
  Installing : cowsay-3.04-4.el7.noarch                   1/1
  Verifying  : cowsay-3.04-4.el7.noarch                   1/1

Installed:
  cowsay.noarch 0:3.04-4.el7

Complete!
INFO:    Adding help info
INFO:    Creating SIF file...
INFO:    Build complete: centos7_custom.sif
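Once the build finishes, you can quickly check the added package from the new image (the message text is arbitrary):

[jharvard@holy2c02302 jharvard]$ singularity exec centos7_custom.sif cowsay "Hello from CentOS 7"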