Open On Demand (OOD) – FASRC DOCS

KNIME on the FASRC clusters

RC Admin — Wed, 21 May 2025 15:23:14 +0000

Description

KNIME is an open-source data analytics, reporting, and integration platform that is meant to perform various aspects of machine-learning & data mining through its modular data pipelining concept. The platform offers a way to integrate various tasks ranging from developing analytic models to deploying them and sharing insights with your team. The KNIME Analytics Platform offers users 300+ connectors to multiple data sources and integrations to all popular machine learning libraries.

The software’s key capabilities include Data Access & Transformation, Data Analytics, Visualization & Reporting, Statistics & Machine Learning, Generative AI, Collaboration, Governance, Data Apps, Automation, AI Agents.

Given KNIME’s wide-scale use and applicability, we have made it available as a system-wide module that can be loaded from anywhere on either of the FASRC clusters, Cannon or FASSE. Additionally, we have packaged it as an app that can be launched from the cluster web interface, Open OnDemand (OOD).

KNIME as a module

KNIME is available as a module on the FASRC clusters. To see which versions are available and how to load one of them, run the following from a terminal on the cluster: module spider knime

This lists the versions of KNIME that are available to load. For example, for a user jharvard on a compute node, the module spider command produces the following output:

[jharvard@holy8a26602 ~]$ module spider knime

knime:
Description: An open-source data analytics, reporting, and integration platform meant to perform various aspects of machine-learning & data mining through its modular data pipelining concept.
Versions:
knime/5.4.3-fasrc01
knime/5.4.4-fasrc01

For detailed information about a specific "knime" package (including how to load the modules) use the module's full name.
Note that names that have a trailing (E) are extensions provided by other modules.
For example:
$ module spider knime/5.4.4-fasrc01

To load a specific module, one can execute: module load knime/5.4.3-fasrc01

Or, to load the default module, which is typically the latest version, one can run: module load knime. This would result in, e.g.:

[jharvard@holy8a26602 ~]$ module load knime
[jharvard@holy8a26602 ~]$ module list
Currently Loaded Modules:
1) knime/5.4.4-fasrc01

Once the knime module is loaded, you can launch the GUI by running the knime executable in the terminal. This requires that you ssh into the cluster with X11 forwarding enabled, preferably with the -Y option, and that XQuartz (macOS) or MobaXterm (Windows) is installed on the local device you use to log in to the cluster. For example:

ssh -Y jharvard@login.rc.fas.harvard.edu

[jharvard@holylogin05 ~]$ salloc -p test --x11 --time=2:00:00 --mem=4g
[jharvard@holy8a26602 ~]$ module load knime
[jharvard@holy8a26602 ~]$ knime

You can ignore the following libGL errors; the GUI should still appear as shown in the screenshot below.

libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast

Note: While you can launch KNIME directly on the cluster using X11 forwarding, it is laggy and doesn’t lend itself well to the faster execution that certain KNIME workflows might need. To avoid issues associated with X11 forwarding, we recommend launching KNIME using OOD.
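Before starting the GUI over X11, it can help to confirm that forwarding is actually active in the current shell. The following is a minimal sketch rather than part of the FASRC instructions: the function name check_display is hypothetical, and it only tests whether the standard DISPLAY variable is set.

```shell
# check_display (hypothetical helper): report whether an X11 display is available.
# Connecting with 'ssh -Y' sets DISPLAY; if it is empty, GUI programs cannot open a window.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: reconnect with 'ssh -Y' or use Open OnDemand" >&2
        return 1
    fi
    echo "X11 display available: ${DISPLAY}"
}

# Example (on a compute node, after 'module load knime'):
#   check_display && knime
```

A set DISPLAY does not guarantee good performance; it only rules out the most common reason the window never appears.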

Both these modules are also available to use via the Knime OOD app, as explained below.

KNIME on OOD

KNIME can be run from Open OnDemand (OOD, formerly known as VDI) by choosing it from the Interactive Apps menu, and specifying your resource needs. Hit Launch, wait for the session to start, and click the “Launch Knime” button.

You can also launch KNIME from the Remote Desktop app on OOD.

Pre-installed Extensions

Both KNIME modules come with the following pre-installed extensions:

For GIS: Geospatial Analytics Extension for KNIME
For Programming:
KNIME Python Integration
KNIME Interactive R Statistics Integration
For Machine Learning:
KNIME H2O Machine Learning Integration
KNIME XGBoost Integration
KNIME Machine Learning Interpretability Extension
For OpenAI, Hugging Face, and other LLMs: KNIME AI Extension
For AI Assistant Coding: KNIME AI Assistant (Labs)
For Google Drive Integration: KNIME Google Connectors

Note: Users cannot install new extensions on the fly because the modules don’t come with write permissions.

KNIME Tutorial

The link here takes you to the KNIME tutorial that has been prepared by Lingbo Liu from Harvard’s Center for Geographic Analysis (CGA). This tutorial is best executed by launching the Knime app on OOD.

RStudio Server vs. RStudio Desktop OOD apps

RC Admin — Mon, 07 Nov 2022 18:42:56 +0000

Disclaimer: The differences presented here are specifically applicable to RStudio in the FASRC Open OnDemand environment and not for the general RStudio Desktop vs. RStudio Server.

FASRC has implemented two different Open OnDemand (OOD, formerly called VDI) applications for RStudio:

RStudio Server through the OOD app “RStudio Server”
RStudio Desktop through the OOD app “Remote Desktop” then launching RStudio Desktop

In this doc, we explain the major differences between the two.

RStudio Server

RStudio Server is our go-to RStudio app because it contains a wide range of precompiled R packages from bioconductor and rocker/tidyverse. This means that installing R packages in RStudio Server is pretty straightforward. Most times, it will be sufficient to simply:

> install.packages("package_name")

This simplicity is possible because RStudio Server runs inside a Singularity container, meaning that it does not use the host operating system (OS). For more information on Singularity, refer to our Singularity on the cluster docs.

Important notes:

User-installed R libraries will be installed in ~/R/ifxrstudio/\
This app contains many pre-compiled packages from bioconductor and rocker/tidyverse.
FAS RC environment modules (e.g. module load) and Slurm (e.g. sbatch) are not accessible from this app.
For RStudio with environment module and Slurm support, go to our Open OnDemand page, select Interactive Apps > Remote Desktop, and refer to Open OnDemand Remote Desktop: How to open software.

This app is useful for most applications, including multi-core jobs. However, it is not suitable for multi-node jobs. For multi-node jobs, the recommended app is RStudio Desktop.

Installing R packages in RStudio Server in the FASSE cluster

If you are using FASSE Open OnDemand and need to install R packages in RStudio Server, you will likely need to set the proxies as explained in our Proxy Settings documentation. Before installing packages, execute these two commands in RStudio Server:

> Sys.setenv(http_proxy="http://rcproxy.rc.fas.harvard.edu:3128")
> Sys.setenv(https_proxy="http://rcproxy.rc.fas.harvard.edu:3128")

Running as a batch or interactive job

The RStudio Server OOD app hosted on Cannon at rcood.rc.fas.harvard.edu and FASSE at fasseood.rc.fas.harvard.edu runs RStudio Server in a Singularity container (see Singularity on the cluster). The path to the Singularity image on both Cannon and FASSE clusters is the same:

/n/singularity_images/informatics/ifxrstudio/ifxrstudio:RELEASE_.sif

Where corresponds to the Bioconductor version listed in the “R version” dropdown menu. For example:

R 4.2.3 (Bioconductor 3.16, RStudio 2023.03.0)

uses the Singularity image:

/n/singularity_images/informatics/ifxrstudio/ifxrstudio:RELEASE_3_16.sif

As mentioned above, when using the RStudio Server OOD app, user-installed R packages by default go in:

~/R/ifxrstudio/RELEASE_
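The mapping from the Bioconductor version in the dropdown to the image path and package directory can be sketched in shell. This is an illustration based only on the RELEASE_3_16 example above: the function names are hypothetical, and it assumes that other versions follow the same naming pattern.

```shell
# Hypothetical helpers; they assume every release follows the RELEASE_3_16 pattern shown above.

# Convert a Bioconductor version such as "3.16" into the tag "RELEASE_3_16".
bioc_release_tag() {
    printf 'RELEASE_%s\n' "$(printf '%s' "$1" | tr '.' '_')"
}

# Path of the Singularity image for a given Bioconductor version.
rstudio_image_path() {
    printf '/n/singularity_images/informatics/ifxrstudio/ifxrstudio:%s.sif\n' "$(bioc_release_tag "$1")"
}

# Default directory for user-installed R packages for that version.
rstudio_user_lib() {
    printf '%s/R/ifxrstudio/%s\n' "$HOME" "$(bioc_release_tag "$1")"
}

rstudio_image_path 3.16
rstudio_user_lib 3.16
```

Checking that the printed image path exists (for example with ls) before using it in a job is a cheap way to catch a mistyped version.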

Batch job

The following example batch script, named runscript.sh, executes the R script myscript.R inside the Singularity container RELEASE_3_16:

#!/bin/bash
#SBATCH -c 1 # Number of cores (-c)
#SBATCH -t 0-01:00 # Runtime in D-HH:MM
#SBATCH -p test # Partition to submit to
#SBATCH --mem=1G # Memory pool for all cores (see also --mem-per-cpu)
#SBATCH -o myoutput_%j.out # File to which STDOUT will be written, %j inserts jobid
#SBATCH -e myerrors_%j.err # File to which STDERR will be written, %j inserts jobid

# set R packages and rstudio server singularity image locations
my_packages=${HOME}/R/ifxrstudio/RELEASE_3_16
rstudio_singularity_image="/n/singularity_images/informatics/ifxrstudio/ifxrstudio:RELEASE_3_16.sif"

# run myscript.R using the RStudio Server Singularity image
singularity exec --cleanenv --env R_LIBS_USER=${my_packages} ${rstudio_singularity_image} Rscript myscript.R

To submit the job, execute the command:

sbatch runscript.sh
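The batch script expects a file named myscript.R in the submission directory, but its contents are not shown here. As a placeholder for testing the setup, a trivial script could be created from the shell as sketched below. The R code is purely illustrative: it reports the library paths and the R version that the container sees.

```shell
# Write a small, illustrative myscript.R into the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the R code.
cat > myscript.R <<'EOF'
# Show the library paths that R searches for packages
print(.libPaths())
# Report the R version running inside the container
print(R.version.string)
EOF

# Confirm the file was written before submitting the job with sbatch
ls -l myscript.R
```

Once this placeholder runs cleanly, replace its contents with your actual analysis.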

Interactive job

To run R interactively instead, use the command below, which launches an R shell that you can interact with. This is not applicable to FASSE, where interactive jobs are not allowed:

singularity exec --cleanenv --env R_LIBS_USER=$HOME/R/ifxrstudio/RELEASE_3_16 /n/singularity_images/informatics/ifxrstudio/ifxrstudio:RELEASE_3_16.sif R

RStudio Desktop

RStudio Desktop is a “bare” version of RStudio. Although it has some precompiled R packages, it is a much more limited list than the RStudio Server app.

RStudio Desktop runs on the host operating system (OS), the same environment as when you ssh to Cannon or FASSE.

This app is particularly useful for running multi-node applications because you can specify the exact modules and packages that you need to load.

Open OnDemand (OOD/VDI) Remote Desktop: How to open software

RC Admin — Mon, 07 Nov 2022 18:39:37 +0000

Introduction

This document shows how to launch different software in the Open OnDemand (OOD) Remote Desktop app.

You can launch the Remote Desktop app on the Cannon cluster from rcood.rc.fas.harvard.edu and on the FASSE cluster from fasseood.rc.fas.harvard.edu.

When the Remote Desktop app opens, click on the terminal icon to launch a terminal (or click on Applications -> Terminal Emulator). Then follow the instructions below to launch various software.

Keep in mind that, for the most part, the terminal window needs to stay open. If the terminal window is closed, the software that you launched via terminal will be closed too.

Remote Desktop login

To comply with Harvard’s security policy, if the Remote Desktop session becomes idle, the Remote Desktop session will lock. You need to enter your FASRC password to log back in.

Abaqus

In the terminal, type the commands to load the modules and launch Abaqus:

[jharvard@holy7c24102 ~]$ module load abaqus
[jharvard@holy7c24102 ~]$ export LANG=en_US
[jharvard@holy7c24102 ~]$ abaqus cae -mesa cpus=$SLURM_CPUS_PER_TASK &

You can see different versions of Abaqus in our modules page.

The Abaqus license is restricted to SEAS. For more information, see our Abaqus docs.
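The Abaqus launch command above passes $SLURM_CPUS_PER_TASK, which Slurm sets inside a job when a number of CPUs per task was requested. If the variable is empty, the cpus= argument is empty too. A minimal sketch of a fallback to a single CPU follows; the helper name cpus_for_job is hypothetical and not part of the FASRC setup.

```shell
# cpus_for_job (hypothetical helper): print the CPU count Slurm assigned to each task,
# falling back to 1 when SLURM_CPUS_PER_TASK is unset or empty.
cpus_for_job() {
    printf '%s\n' "${SLURM_CPUS_PER_TASK:-1}"
}

echo "Using $(cpus_for_job) CPU(s)"

# Example use with the Abaqus command above (requires 'module load abaqus'):
#   abaqus cae -mesa cpus="$(cpus_for_job)" &
```

The same substitution applies to any other command on this page that reads $SLURM_CPUS_PER_TASK.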

Comsol

In the terminal, type the commands to load the modules and launch Comsol:

[jharvard@holy7c24102 ~]$ module load comsol
[jharvard@holy7c24102 ~]$ comsol -3drend sw -np $SLURM_CPUS_PER_TASK &

You can see different versions of Comsol in our modules page.

The Comsol license is restricted to SEAS. For more information, see our Comsol docs.

For how to set Comsol temporary directory, see our Comsol Troubleshooting doc.

Jupyter Notebook

(optional) Creating and loading a mamba/conda environment

Note: this is a one-time setup to ensure that your conda environment can be loaded in Jupyter Notebook.

See our Python documentation on how to create a conda environment.

Then, in order to see your conda environment in Jupyter Notebook, ensure that you have installed the packages ipykernel and nb_conda_kernels. To do so, launch a terminal in the Remote Desktop and type the commands:

[jharvard@holy7c24102 ~]$ module load python
[jharvard@holy7c24102 ~]$ source activate my_conda_environment
[jharvard@holy7c24102 ~]$ mamba install ipykernel
[jharvard@holy7c24102 ~]$ mamba install nb_conda_kernels

For more information on creating conda environments for TensorFlow and PyTorch, see our GitHub documentation:

You can see different versions of Mambaforge or Python in our modules page.

Launching Jupyter Notebook

In the Remote Desktop terminal type the commands to load the modules and launch Jupyter Notebook:

[jharvard@holy7c24102 ~]$ module load python
# (optional) load conda environment
[jharvard@holy7c24102 ~]$ source activate my_conda_environment
# launch jupyter notebook
[jharvard@holy7c24102 ~]$ jupyter notebook

After you run the jupyter notebook command, it may hang for a few seconds. Be patient; a Firefox window will open soon after.

To select my_conda_environment as the kernel, go to Kernel -> Change kernel, and select the kernel (i.e. conda environment) of your choice.

Note: if you prefer to launch Jupyter Lab, be aware that conda environments cannot be loaded when using Jupyter Lab. Only the base environment is available.

Cleanly close Jupyter Notebook

These instructions stop your Jupyter server so that you can exit the job cleanly.

First, close each Jupyter Notebook you have open: click on File -> Close and Halt.

Then, from the Jupyter Notebook Home Page (where you can browse files and folders), on the top right corner, click on “Quit”. Close the Firefox window.

LibreOffice

LibreOffice is a free and open source suite that is compatible with a wide range of formats, including those from Microsoft Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx) and Publisher.

LibreOffice is available in the FASRC cluster (both Cannon and FASSE) through a Singularity image. Therefore, LibreOffice is only available through the Remote Desktop app. LibreOffice does not work in the Containerized Remote Desktop app.

In the terminal type the commands to pull and create a singularity image with LibreOffice installed within the container. This command is only needed once.

[jharvard@holy7c24102 ~]$ singularity pull docker://linuxserver/libreoffice

To launch LibreOffice, in the terminal, run the command

[jharvard@holy7c24102 ~]$ singularity exec --cleanenv --env DISPLAY=$DISPLAY libreoffice_latest.sif soffice
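Because the pull is only needed once, it can be skipped when the image file is already present. The sketch below wraps that check in a hypothetical function, pull_libreoffice_once. It assumes the image is stored in the current directory under the name libreoffice_latest.sif, as in the launch command above.

```shell
# pull_libreoffice_once (hypothetical helper): pull the LibreOffice image only if
# libreoffice_latest.sif is not already in the current directory.
pull_libreoffice_once() {
    if [ -f libreoffice_latest.sif ]; then
        echo "Found libreoffice_latest.sif: skipping pull"
    elif command -v singularity >/dev/null 2>&1; then
        singularity pull docker://linuxserver/libreoffice
    else
        echo "singularity is not available in this shell" >&2
        return 1
    fi
}

# Example (in the Remote Desktop terminal):
#   pull_libreoffice_once && singularity exec --cleanenv --env DISPLAY=$DISPLAY libreoffice_latest.sif soffice
```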

Lumerical

In the terminal, type the commands to load the modules and launch Lumerical:

[jharvard@holy7c24102 ~]$ module load lumerical-seas
[jharvard@holy7c24102 ~]$ launcher

The Lumerical license is restricted to SEAS. For more information, see our Lumerical docs.

Matlab

In the terminal, type the commands to load the modules and launch Matlab:

[jharvard@holy7c24102 ~]$ module load matlab
[jharvard@holy7c24102 ~]$ matlab -desktop -softwareopengl

You can see different versions of Matlab in our modules page.

RStudio Desktop

In the terminal, type the commands to load the modules:

[jharvard@holy7c24102 ~]$ module load R
[jharvard@holy7c24102 ~]$ module load rstudio

Set environmental variables

[jharvard@holy7c24102 ~]$ unset R_LIBS_SITE
[jharvard@holy7c24102 ~]$ mkdir -p $HOME/apps/R_version
[jharvard@holy7c24102 ~]$ export R_LIBS_USER=$HOME/apps/R_version:$R_LIBS_USER

Launch RStudio Desktop

[jharvard@holy7c24102 ~]$ rstudio

# vanilla option (combines --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ)
[jharvard@holy7c24102 ~]$ rstudio --vanilla

You can see different versions of R and RStudio in our modules page.
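In the environment-variable step above, the export appends :$R_LIBS_USER even when that variable is empty, which leaves a trailing colon in the value. Whether this matters depends on how R parses the list, so the following is only a cautious sketch of a way to avoid the empty entry. The helper name prepend_r_lib is hypothetical.

```shell
# prepend_r_lib (hypothetical helper): put a directory at the front of R_LIBS_USER
# without leaving a trailing ':' when R_LIBS_USER starts out empty or unset.
prepend_r_lib() {
    R_LIBS_USER="$1${R_LIBS_USER:+:$R_LIBS_USER}"
    export R_LIBS_USER
}

# Example, using the same directory as in the commands above:
#   mkdir -p "$HOME/apps/R_version"
#   prepend_r_lib "$HOME/apps/R_version"
#   echo "$R_LIBS_USER"
```

The ${VAR:+text} expansion produces text only when VAR is set and non-empty, so the colon is added only when there is an existing value to separate.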

Remoteviz Partition

If you have used the “FAS-RC Remote Visualization” Open OnDemand (or VDI) app, we have decommissioned that

SageMath

You can use Sage either in an interactive shell through the command-line interface or by launching a Jupyter Notebook with the SageMath kernel. To launch a Jupyter Notebook, in the terminal, type the commands to load the modules and launch Jupyter:

[jharvard@holy7c24102 ~]$ module load sage
[jharvard@holy7c24102 ~]$ sage -n jupyter

Ensure that you have “SageMath” kernel selected. If not, go to Kernel -> Change kernel, and select SageMath.

For examples, see Sage documentation:

SAS

In the terminal, type the commands to load the modules and launch SAS:

[jharvard@holy7c24102 ~]$ module load sas
[jharvard@holy7c24102 ~]$ sas &

You can see different versions of SAS in our modules page.

Stata

In the terminal, type the commands to load the module and launch Stata:

[jharvard@holy7c24102 ~]$ module load stata/17.0-fasrc01

# if you are using single-core jobs
[jharvard@holy7c24102 ~]$ xstata-se

# if you are using multi-core jobs
[jharvard@holy7c24102 ~]$ xstata-mp "set processors $SLURM_CPUS_PER_TASK"

KNIME

In the terminal, type the following commands to load the module and launch KNIME:

[jharvard@holy7c24102 ~]$ module load knime
[jharvard@holy7c24102 ~]$ knime &

TensorBoard

For TensorBoard, you will first need to create a conda environment (Step 1). You only need to create a conda environment once. If you have already created one, you can skip to Step 2. If you have your own environment, make sure the tensorboard package is installed in it, and then skip to Step 2.

Step 1: Create conda environment

In a terminal, load Mambaforge or Python module, create a mamba environment, activate it, and install TensorBoard inside the mamba environment

[jharvard@holy7c24102 ~]$ module load python
[jharvard@holy7c24102 ~]$ module load cuda/11.7.1-fasrc01
[jharvard@holy7c24102 ~]$ module load cudnn/8.5.0.96_cuda11-fasrc01
[jharvard@holy7c24102 ~]$ conda create -n tb_tf2.10_cuda11 python=3.10 pip numpy six wheel scipy pandas matplotlib seaborn h5py jupyterlab
[jharvard@holy7c24102 ~]$ source activate tb_tf2.10_cuda11
[jharvard@holy7c24102 ~]$ conda install -c conda-forge tensorboard
[jharvard@holy7c24102 ~]$ conda install -c conda-forge tensorflow

You can see different versions of Mambaforge or Python in our modules page.

Step 2: Activate conda environment and launch TensorBoard

In a terminal, set up the variables for TensorBoard. Make sure that the data you need to visualize in TensorBoard is located in the log directory MY_TB_LOGDIR. You can either use the suggested path below or choose another location that better suits your workflow.

# Find available port to run server on (does not output anything to screen)
[jharvard@holy7c24102 ~]$ for myport in {6818..11845}; do ! nc -z localhost ${myport} && break; done

# setup tensorboard environmental variables
[jharvard@holy7c24102 ~]$ export MY_TB_PORT=${myport}
[jharvard@holy7c24102 ~]$ export MY_TB_BASEURL=/node/${host}/${myport}/
[jharvard@holy7c24102 ~]$ export MY_TB_LOGDIR=$HOME/.tensorboard/log/$SLURM_JOBID
[jharvard@holy7c24102 ~]$ mkdir -p $MY_TB_LOGDIR

# load module, activate conda environment, and launch tensorboard
[jharvard@holy7c24102 ~]$ module load python
[jharvard@holy7c24102 ~]$ module load cuda/11.7.1-fasrc01
[jharvard@holy7c24102 ~]$ module load cudnn/8.5.0.96_cuda11-fasrc01
[jharvard@holy7c24102 ~]$ source activate tb_tf2.10_cuda11 
(tb_tf2.10_cuda11) tensorboard --host localhost --port ${MY_TB_PORT} --logdir ${MY_TB_LOGDIR} --path_prefix ${MY_TB_BASEURL}

You can see different versions of Mambaforge or Python in our modules page.

Right click on the link that starts with “http://localhost” and click on “Open Link”. This will open a Firefox browser where you will be able to see your results.
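The port-search loop in Step 2 stops at the first port where nc -z finds nothing listening. If every port in the range were busy, it would silently leave myport at the last value tried. The sketch below wraps the same probe in a hypothetical function, find_free_port, that reports failure instead. It assumes nc is installed, as the original loop does.

```shell
# find_free_port (hypothetical helper): print the first port in [start, end] that nothing
# on localhost is listening on, or return 1 if every port in the range is busy.
# Like the loop in Step 2, it relies on 'nc -z' to probe each port.
find_free_port() {
    candidate="$1"
    while [ "$candidate" -le "$2" ]; do
        if ! nc -z localhost "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
        candidate=$((candidate + 1))
    done
    echo "no free port between $1 and $2" >&2
    return 1
}

# Same range as in Step 2; report the problem if nc is not installed
if command -v nc >/dev/null 2>&1; then
    myport="$(find_free_port 6818 11845)" && echo "Using port ${myport}"
else
    echo "nc is not available on this machine" >&2
fi
```

The resulting myport can then be exported as MY_TB_PORT exactly as in Step 2.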

Example

Using the environment created in Step 1, run the small program tb_test.py in a directory of your choice and visualize its results.

Source code of tb_test.py:

import os
import tensorflow as tf
import datetime

def create_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

logdir = os.getenv('MY_TB_LOGDIR')
print(logdir)

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir, histogram_freq=1)
model.fit(x=x_train, 
          y=y_train, 
          epochs=5, 
          validation_data=(x_test, y_test), 
          callbacks=[tensorboard_callback])

Setup variables and run tb_test.py

# Find available port to run server on (does not output anything to screen)
[jharvard@holy7c24102 ~]$ for myport in {6818..11845}; do ! nc -z localhost ${myport} && break; done

# go to the directory that contains your tb_test.py file
[jharvard@holy7c24102 ~]$ cd tb_example

# setup tensorboard environmental variables
[jharvard@holy7c24102 tb_example]$ export MY_TB_PORT=${myport}
[jharvard@holy7c24102 tb_example]$ export MY_TB_BASEURL=/node/${host}/${myport}/

# this command will set MY_TB_LOGDIR to your current working directory
[jharvard@holy7c24102 tb_example]$ export MY_TB_LOGDIR=$PWD

# load modules and activate conda environment
[jharvard@holy7c24102 tb_example]$ module load python
[jharvard@holy7c24102 tb_example]$ module load cuda/11.7.1-fasrc01
[jharvard@holy7c24102 tb_example]$ module load cudnn/8.5.0.96_cuda11-fasrc01
[jharvard@holy7c24102 tb_example]$ source activate tb_tf2.10_cuda11

# run python code
(tb_tf2.10_cuda11) python tb_test.py

# launch tensorboard
(tb_tf2.10_cuda11) tensorboard --host localhost --port ${MY_TB_PORT} --logdir ${MY_TB_LOGDIR} --path_prefix ${MY_TB_BASEURL}

Right click on the link that starts with “http://localhost” and click on “Open Link”. This will open a Firefox browser where you will be able to see your results.

TotalView

TotalView is a debugging tool particularly suitable for parallel applications. The modules you need to load depend on the compilers used to build the code you are trying to debug. Because of this compiler dependency, we refer you to our more detailed TotalView documentation.

Visual Studio Code

In the terminal, type the commands to load the modules and launch Visual Studio Code:

[jharvard@holy7c24102 ~]$ module load vscode
[jharvard@holy7c24102 ~]$ code --user-data-dir $HOME/.vscode/data/ &

You can see different versions of Visual Studio Code in our modules page.

Mathematica

In the terminal, type the commands to load the modules and launch Mathematica:

[jharvard@holy7c24102 ~]$ module load mathematica
[jharvard@holy7c24102 ~]$ mathematica

You can see different versions of Mathematica in our modules page.

Developing your own app using Open OnDemand

RC Admin — Wed, 23 Mar 2022 20:24:07 +0000

FASRC provides applications on Open OnDemand based on software usage/user requests.
If you want to create your own Open OnDemand app, you can! OnDemand provides a way to develop OOD apps in a sandbox development environment. You’ll need to enable this environment before you can begin testing applications.

To enable your development environment:

Get to a shell interface: you could SSH to a login node, or connect to Open OnDemand on your cluster and use the Clusters menu item to get shell access, or start a Remote Desktop session and use the Terminal Emulator.
Create the dev directory
1. Cannon: mkdir $HOME/.fasrcood/dev
2. FASSE: mkdir $HOME/.fasseood/dev
This will make the Develop menu item appear in the upper right when you view the Open OnDemand interface.
Now you can access your sandbox development environment: either click on the item in the Develop menu in the dashboard, or navigate in the shell to $HOME/.fasrcood/dev (on FASSE, $HOME/.fasseood/dev), create an app folder there, and start your development.

Once your dev path is created, you can start working on applications there.
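The choice between the two sandbox paths listed above can be sketched as a small shell function. The name ood_dev_dir and the cluster labels cannon and fasse are hypothetical conveniences; only the two directory paths come from the steps above.

```shell
# ood_dev_dir (hypothetical helper): print the sandbox directory for a cluster name,
# using the two paths listed in the steps above.
ood_dev_dir() {
    case "$1" in
        cannon) printf '%s\n' "$HOME/.fasrcood/dev" ;;
        fasse)  printf '%s\n' "$HOME/.fasseood/dev" ;;
        *)      echo "unknown cluster: $1 (expected cannon or fasse)" >&2; return 1 ;;
    esac
}

echo "Cannon sandbox: $(ood_dev_dir cannon)"
echo "FASSE sandbox:  $(ood_dev_dir fasse)"

# Example: create the directory on Cannon. With -p, mkdir also creates the parent
# directory and does not fail if the directory already exists.
#   mkdir -p "$(ood_dev_dir cannon)"
```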

To get started, FASRC has some example apps, which (at time of writing) run on the Cannon cluster without any modification needed.

Remote Desktop
SAS
Matlab
RStudio
Jupyter

Have a look at these to get a sense of how the app repositories are structured and how they refer to resources.
SAS and Matlab run within a Remote Desktop session, while RStudio uses Singularity containers.

For examples of additional apps, OSC provides links to applications that are implemented in Open OnDemand there and at other contributing institutions: https://osc.github.io/ood-documentation/latest/install-ihpc-apps.html

To get a copy of an application to work with, navigate to your dev folder, then clone the Github repo of the app.
cd $HOME/.fasrcood/dev
git clone https://github.com/fasrc/ood-rstudio-rocker

The new app will appear when you go to the menu item Develop->My Sandbox Apps (Development).

If you’re on the FASSE cluster and want to be able to run your app, you’ll want to update form.yml.erb to use
cluster: "fasse"
and
bc_queue:
  value: "fasse"
or another appropriate partition: https://docs.rc.fas.harvard.edu/kb/fasse/#SLURM_and_Partitions

As an example of additional ways to specify container locations:
A previous version of the RStudio app listed the Singularity container location directly in the script file.
https://github.com/fasrc/fas-ondemand-rstudio/blob/master/template/script.sh.erb
This lists a location for the Singularity images that is not accessible on the Cannon or FASSE cluster. You will need to update the location to use the path for the directory of the container you have created.

Feel free to reach out to rchelp@rc.fas.harvard.edu or visit our office hours for help deploying a container or for answers to further questions.