FAS RC Research Data Retention and Deletion Policy

Purpose: 

This policy defines FAS RC standards and procedures for the retention and deletion of research data, outputs, temporary files, and associated digital resources managed by the FAS RC in support of research activities. 

Scope: 

This policy applies to all research data stored, processed, or managed on servers, workstations, cloud resources, storage systems, or backup media provisioned by the FAS Research Computing Service Group.

Data Retention: 

Following the departure of faculty from the University, the associated primary department will assume responsibility for the maintenance, storage, and cost of housing the remaining research data.

Home Directories:

Aligning with the University Research Data Security Policy and the Retention and Maintenance of Research Records and Data Frequently Asked Questions (“FAQs”), home directories will be retained for no more than 7 years following a researcher’s departure from the University or the deactivation of their FASRC account. The researcher’s last login to their FASRC account will be used to track compliance. 

Project Data:

Principal Investigators (PIs) should notify FAS RC 60 days prior to their departure from the University, stating the duration of any appointments (courtesy or associate) and providing instructions and next steps for remaining datasets.

For research data associated with completed or inactive research projects and/or departed faculty where no notice has been given to FAS RC as to where the research data should be stored:

  1. The PI’s Harvard-affiliated primary department becomes responsible for the storage and cost of the research data. Closure of the PI’s group and project in FAS RC will be used to track compliance.
  2. The research data will be retained in the source storage directory for 2 years following project completion or inactivity. Completion of a project occurs after: 
    1. final reporting to the research sponsor 
    2. final financial close-out of a sponsored research award segment 
    3. final publication of research results 
    4. cessation of academic or scientific activity on a specific research project, regardless of whether its results are published, whichever is later.
  3. Following 2 years of inactivity, data will be migrated to FASRC Long-Term Storage. The data will be retained for an additional 5 years to meet the University Data Retention guidelines. Following the completion of 5 years, the data can be deleted. Departments will be notified via email prior to the deletion.
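
The timeline above is simple date arithmetic: 2 years in the source storage directory, then 5 more years in Long-Term Storage. The sketch below works one example through. It is illustrative only: the names `add_years` and `retention_schedule` are hypothetical, and it assumes a single known completion-or-inactivity date rather than reflecting any actual FASRC tooling.

```python
from datetime import date

ACTIVE_RETENTION_YEARS = 2      # retained in the source storage directory
LONG_TERM_RETENTION_YEARS = 5   # retained in FASRC Long-Term Storage


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def retention_schedule(completion: date) -> dict:
    """Hypothetical helper: key dates when no instructions were left by the PI."""
    migrate_on = add_years(completion, ACTIVE_RETENTION_YEARS)
    deletable_on = add_years(migrate_on, LONG_TERM_RETENTION_YEARS)
    return {
        "migrate_to_long_term": migrate_on,
        "eligible_for_deletion": deletable_on,
    }


schedule = retention_schedule(date(2025, 6, 30))
print(schedule["migrate_to_long_term"])   # 2027-06-30
print(schedule["eligible_for_deletion"])  # 2032-06-30
```

Under these assumptions, data from a project completed on 2025-06-30 would remain in place until mid-2027 and could be deleted, after department notification, from mid-2032.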

Temporary and Scratch Storage:

Data stored in scratch or temporary directories may be deleted after 90 days without notice to maximize available resources. 

Deletion Procedures: 

  • Faculty and/or departments will be notified in advance of research data being deleted, per the timelines above. If PIs or Faculty are no longer associated with the University, the relevant department leadership will be notified via email. 
  • Data will be deleted using secure erasure methods in accordance with institutional IT security standards. 
  • Requests for retention extension can be made in writing and are subject to approval by FASRC and the department; individuals requesting the extension will be responsible for all associated storage costs. 

Ownership and Roles: 

  • University: Harvard University owns all research data generated through projects conducted under its authority or using its resources. While PIs and researchers manage and safeguard the data, the University is ultimately responsible for compliance with legal and sponsor requirements, ensuring confidentiality and security. 
  • Principal Investigators: Principal Investigators (PIs) are stewards of research data. If PIs choose to delegate responsibility within their research groups, the PI remains accountable to the University for stewardship of the data. Principal Investigators are responsible for ensuring proper data management, storage, and accessibility, meeting all University, legal, and sponsor requirements. This involves setting up procedures for data retention, confidentiality, and sharing while respecting data use agreements. 
  • Departments: In the case that a PI has left the University without delegating responsibility for data, the associated primary department of the departed PI takes on the role of steward. 
  • Researchers: Harvard community members who assist with management of data created, analyzed, and stored on FAS RC systems.
  • FAS RC: Responsible for executing deletions as outlined, maintaining logs of deletion actions, and responding to extension or exception requests. 

Policy Review: 

This policy will be reviewed and updated annually or as required by regulatory or operational changes. 

Last modification date: 2025-12-02

FASRC Cluster Storage Policy

Cluster storage offered and maintained by FASRC should only be used for research taking place on FASRC clusters.

Examples of data that can be stored on FASRC storage are:

  • Datasets
  • Code
  • Scientific software
  • Research results

Examples of data that should not be stored on FASRC storage include:

  • Clerical or lab administrative data
  • Data related to personnel, grant proposals, business operations, or general lab management
  • Data with personally identifiable or financial information 

FASRC storage filesystems on the Cannon cluster are only approved for Data Security Level 1 (DSL1) and DSL2 research data. DSL3 data must be stored in the approved FASSE cluster project. Research data containing information classified as DSL4 must be stored on an appropriate storage solution that is approved for DSL4 sensitive data.*

*A limited number of DSL4 projects exist in their own isolated environments

If it comes to the attention of FASRC staff that non-research-related data is being stored on FASRC systems, we will alert the lab’s PI.

To view alternative storage options for administrative data, please refer to the FASRC website.  Additional information is also provided on the Harvard Security website regarding Data Security levels.

Data Security Levels

What is a Data Security Level (DSL)?

Harvard groups data into 5 data security levels depending on the sensitivity of the data.  The DSL for data determines how that data must be managed.

DISCLAIMER: The information on this page relates only to the FASRC clusters and our current understanding of Harvard policy. Please refer to the Harvard Security Data Security Levels page for up-to-date university policies and information.

Cluster Data Security Level Ratings

  • Public information (Level 1/DSL1): The FASRC Cannon cluster is rated only for DSL 1 and DSL 2 data.
  • Low Risk (Level 2/DSL2): The FASRC Cannon cluster is rated only for DSL 1 and DSL 2 data.
  • Medium Risk (Level 3/DSL3): Only the FASRC FASSE (FAS Secure Environment) cluster is rated for DSL3 data.
  • High Risk (Level 4/DSL4): For DSL4 projects, please contact University RC (URC) for options.
  • Extreme Risk (Level 5/DSL5): FASRC has no systems rated for DSL5 data.
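
The ratings above amount to a small lookup table. The following sketch transcribes them into a dictionary so that a script could check where a dataset belongs before copying it. The names `CLUSTER_RATINGS` and `fasrc_cluster_for` are hypothetical and not part of any FASRC tooling, and the table only reflects the ratings as stated on this page.

```python
# Ratings transcribed from the list above; names are illustrative only.
CLUSTER_RATINGS = {
    1: "Cannon",
    2: "Cannon",
    3: "FASSE",
    4: None,  # contact University RC (URC) for options
    5: None,  # FASRC has no systems rated for DSL5 data
}


def fasrc_cluster_for(dsl: int):
    """Return the FASRC cluster named for this DSL above, or None if there is none."""
    if dsl not in CLUSTER_RATINGS:
        raise ValueError(f"unknown data security level: {dsl}")
    return CLUSTER_RATINGS[dsl]


for level in range(1, 6):
    print(level, fasrc_cluster_for(level) or "no FASRC cluster rated")
```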

Web Scraping Policy

Web scraping is a contentious issue within research. While it is true that fair use provides for many uses of data gleaned from the Internet, in general this is applied to human information gathering, not programmatic machine scraping. That distinction makes the act of brute-force scraping an issue separate from fair use.

You, as a representative of Harvard, are not just using the source’s data, but also their servers, bandwidth, etc. in a way the source may not approve. This can lead to IP blacklisting and even legal action. So please tread carefully as your actions could negatively affect others.

If in doubt or in need of more authoritative guidance, please contact the Harvard Office of the General Counsel or Office of the Vice Provost for Research.

If you are scraping for the purpose of training a GAI model, contact the Harvard Office of the General Counsel or Office of the Vice Provost for Research.

Please be aware that merely being involved in academic pursuits does not exempt you from the usage policies of social media and other Internet platforms like Facebook, Twitter, etc.

Sensitive Data

If the data you are acquiring is considered sensitive, confidential, or contains human data, you will need to have this data reviewed for compliance before placing it on the FASRC cluster. If in doubt, you should always err on the side of caution and contact the Office of the Vice Provost for Research.

Scraping data for use on the FASRC Cluster

If your research requires you to scrape content from the web, please review the following guidelines and suggestions.

We highly discourage using the cluster itself to scrape data. Due to its size and ease of parallelization of processes, the cluster is easily weaponized and your actions could have consequences for other researchers. Please seek another avenue for data acquisition first.

You should contact FASRC before commencing any scraping activity using the FASRC cluster.

It is highly preferable that you do the scraping elsewhere and then bring the data to the FASRC cluster for processing. If the data is sensitive, confidential, contains human data, or it is unclear, then this is a requirement. See ‘Sensitive Data’ above.

Also, if you are scraping for the purpose of training a GAI/LLM model, you should respect that site’s policies on this practice (this may be posted on the site, contained in a robots.txt file, or explicitly stated in their ToS). Even if you are doing the scraping manually, you should consider yourself the same as a bot and, if a site excludes GAI/AI bots, this also applies to you. Merely being an academic does not exempt you from following the wishes of a site and/or its members; your exfiltrated data could end up in other models thereby nullifying the source’s right to exclusivity/ownership. Please contact the Harvard Office of the General Counsel or Office of the Vice Provost for Research for further guidance.
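
One of the places a site may state its wishes, as noted above, is its robots.txt file. The sketch below shows how such a file could be checked with Python's standard `urllib.robotparser` before any request is made. The robots.txt content, the user-agent strings, and the helper name `is_allowed` are all hypothetical examples; a real check would read the site's own file, and it does not replace reading the site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would use the site's own file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The AI crawler is excluded from the whole site.
print(is_allowed(EXAMPLE_ROBOTS_TXT, "GPTBot", "https://example.org/articles/1"))  # False
# Other agents may read articles but not /private/.
print(is_allowed(EXAMPLE_ROBOTS_TXT, "my-research-bot", "https://example.org/articles/1"))  # True
print(is_allowed(EXAMPLE_ROBOTS_TXT, "my-research-bot", "https://example.org/private/x"))  # False
```

Note that, per the paragraph above, if your purpose is GAI/LLM training you should treat an exclusion of AI bots as applying to you, whatever user agent your tool reports.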

Source Permission

If you are in doubt or have questions, please contact the Harvard Office of the Vice Provost for Research.

Data on the Internet should not be programmatically (or ‘brute-force’) scraped using FASRC computing resources, even for academic research purposes, unless FASRC has given permission to proceed using the cluster or some system tied to the cluster, and:

A) The source provides an API for this purpose and any requirements they impose have been met.

B) The source allows/does not prohibit scraping in their terms of service or other public notice.

C) The source is the United States government and the data in question was generated with public funds and is publicly available without encumbrance. Even so, the site must not be scraped using brute-force means if an API is provided.

D) The source has given you explicit permission in writing or via a secondary document spelling out that permission.

E) The source does not exclude/forbid your use-case, such as GAI or LLM training.

Data cannot be programmatically scraped using FASRC computing resources if the source has explicitly forbidden scraping in their terms of service and written permission to do so cannot be obtained. In such a case, you should investigate other options for acquiring this or similar data.

Throttling and Blacklisting

Scraping content from websites using highly parallelized processes, even with unfettered permission from the source, should be avoided. Doing so runs the risk of having the cluster’s, or even the university’s, IP range blacklisted. This could have an undesirable effect on other network and cluster users. Please ensure your processes pull data at a reasonable rate unless you explicitly have written approval from the data source to download more aggressively and assurance that this will not lead to blacklisting from them or their upstream provider.
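
As one way to keep a process pulling data "at a reasonable rate", the sketch below enforces a minimum delay between successive requests in a single, non-parallel loop. The class name `Throttle` and the interval value are hypothetical; what counts as reasonable depends on the source's stated limits, and no network request is actually made here.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Hypothetical rate for the demo; follow the limits the source publishes.
throttle = Throttle(min_interval_s=0.05)
for url in ["https://example.org/a", "https://example.org/b", "https://example.org/c"]:
    throttle.wait()
    print("would fetch", url)  # placeholder: no request is sent
```

The clock and sleep functions are injectable only so the behavior can be checked without real waiting; the defaults are what a script would normally use.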

Related:

Harvard Office of the Vice Provost for Research

US Data.gov Data Harvesting Information

Archive.org Scraping

Onboarding Policies and Procedures

This document outlines FAS Research Computing’s policies and procedures related to the onboarding of researchers and PIs. The document is structured as a checklist, to be used by researchers and PIs as they enter the university or join a new lab. The document also notes differences between the onboarding of researchers and faculty (PIs).

Onboarding Checklist: Faculty

Onboarding Checklist: Researchers

Virtual Machines & Virtual Hosting

As of December 2024, FASRC does not provide a general virtual machine service as part of its core services. In the past it attempted to fill this gap when no other options were available, but (1) there was no funding for hardware or support for this service, and its infrastructure is old and being retired; and (2) other options, within and outside Harvard, now exist.

If you require a VM for web hosting or other needs or for hosting or sharing data sets, please see the following options.

Harvard-based options:

Self-service, pay as you go, managed by you:

Please note that PIs and other data owners are responsible for following Harvard Information Security Policy and all other applicable Harvard policies and requirements. This includes knowing your data and following the requirements for Data Security Level for servers and Research Data Management Security and Ownership Policies.

FASSE / Protected Data Transfers

To preface this: You are responsible for knowing and complying with the applicable Harvard Information Security Policy (controls that apply to DSL3 and lower), the Harvard Research Data Security Policy, and any applicable contracts / data use agreements.

FASSE data transfers generally work the same as transfers for other environments.  For example:

  • When connected to the FASSE VPN realm, you can copy files to and from the FASSE cluster, assuming this meets policy/DUA compliance requirements.
  • While on FASSE nodes (compute, login, etc.) and the FASSE VPN, you have full access to the Internet through a proxy.
    • Generally, this means that you can push to or pull from any HTTPS, SFTP, or other service that supports a proxy.
    • For example, this means you should be able to pull data from data providers that provide an HTTPS, SFTP, or other service.  You may need to adjust certain configurations and workflows to use the proxy – Some details on this here
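
As an illustration of adjusting a workflow to use a proxy, the sketch below builds a Python `urllib` opener from proxy settings. It assumes the proxy is advertised through the conventional `http_proxy` / `https_proxy` environment variables, which this page does not confirm; the proxy address, the data URL, and the helper name `build_proxy_opener` are hypothetical placeholders.

```python
import os
import urllib.request


def build_proxy_opener(env=os.environ):
    """Build a urllib opener that routes HTTP(S) traffic through the proxy named in env."""
    proxies = {}
    for scheme in ("http", "https"):
        value = env.get(f"{scheme}_proxy") or env.get(f"{scheme.upper()}_PROXY")
        if value:
            proxies[scheme] = value
    if not proxies:
        raise RuntimeError("no http_proxy/https_proxy variable is set")
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


# Hypothetical proxy address for illustration only; use the documented value.
opener = build_proxy_opener({"https_proxy": "http://proxy.example.org:3128"})
# opener.open("https://data.example.org/file.csv") would then go through the proxy.
```

Tools that do not honor proxy settings at all are the cases the troubleshooting steps below are meant for.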

With that said, given that FASSE is rated for data security level (DSL) 3 data:

  • Do not store DSL 3 / FASSE data in your home directory.
  • If you have a DUA that requires encryption at rest, you must not use scratch for any data that the DUA applies to.  Neither local scratch, nor our global scratch, support encryption at rest.
  • FASSE VPN, login, compute, and VDI environments use a proxy.  Some transfer solutions do not work through a proxy.  If you run into this:
    • Please ensure you have tried to use a proxy, and if you still run into trouble,
    • Open a ticket with rchelp@rc.fas.harvard.edu indicating
      • What you have tried
      • What you expected to happen
      • What actually happened
      • Include specific commands, where these ran, and output messages including all errors.
  • Data security level 3 / FASSE storage is intentionally not included in Globus by default.  If you would like your FASSE project to be exposed through Globus, consider the following:
    • If any data in this project is governed by a contract / data use agreement (DUA), please review the DUA to ensure Globus is compliant.  You might consult your School Security Officer for this.
      • An example scenario where Globus would not be compliant:  DUAs indicating that a VPN or private network must be used for all access to the data.  Globus makes data available over the Internet without a VPN or private network
    • Please submit a ticket to rchelp@rc.fas.harvard.edu as follows:
      • This must include the path to the project to add to Globus (e.g. “/n/piname_project_l3”)
      • This must indicate that the PI attests to Globus being compliant with any contracts/DUAs governing the data in this project storage
      • This must be from, or receive a reply directly from the PI for this project confirming this information
  • FASSE storage is intentionally not exposed through SMB shares by default.  If you need your FASSE project exposed through an SMB share, consider the following:
    • Please submit a ticket to rchelp@rc.fas.harvard.edu as follows:
      • This must include the path to the project (e.g. “/n/piname_project_l3”)
      • This must indicate that the PI attests to understanding and accepting the risks of enabling SMB access to this data, given that any system or network that can talk to this tiered storage, could access this data if the credentials from an account in the project were used.  Some example scenarios:
        • Someone with access to your storage accesses it / copies data down to an unmanaged lab computer without data security level controls
        • Someone with access to your storage accidentally clicks the wrong link on a computer with access to this storage. Their computer is compromised, malware identifies SMB access to your data, and compromises the confidentiality, integrity, and/or availability of your data – maybe ransomware, stealing the data, etc.
      • This must include a brief explanation of why SMB access is needed, and from where you will use this SMB access
      • This must be from, or receive a reply directly from the PI for this project confirming this information

If you have any questions or concerns, please do not hesitate to consult us at security@rc.fas.harvard.edu, although in some cases we may end up pulling in or pointing you to your school privsec officer.

PI Responsibilities at FAS RC

Overview

PIs have a variety of responsibilities at Harvard University.  This document will cover the responsibilities specific to FAS Research Computing, especially around information security and risk.

PIs are individuals given continuous or limited PI rights by the university who control their own funding in a school that FAS RC supports. Co-Investigators are not considered PIs.

Responsibilities

  • PIs are responsible for following all applicable Harvard University policies, including but not limited to Harvard Research Data Security Policy and Harvard Information Security Policy, as well as any requirements in data use agreements (DUAs) or contracts that impact them.
  • PIs are responsible for creating and maintaining accurate data documentation in the Harvard Compliance System, as required by University policies, and complying with approved data security and management plans.  Guidance on which applications are needed for your data.
  • PIs are responsible for submitting FASSE project requests for any data security level (DSL) 3 data they plan to use at FAS RC and keeping associated data in the specific FASSE storage provided for these projects.
  • PIs are responsible for informing FAS RC of any changes to Research Administration applications (e.g. DAT12-1234, DUA12-1234, IRB12-1234) governing data they plan to use for their FASSE projects, before moving new data to FAS RC storage for these projects.  This includes informing FASRC before adding data from a new application (e.g. DUA12-1234) to an existing FASSE project.
  • PIs are responsible for ensuring that any access they approve complies with all applicable Harvard University policies and DUA or compliance regimes.  For example, among many other scenarios:
    • If a DUA requires informing or obtaining approval from the data provider before providing access to the data, the PI must ensure this is done before they approve the associated FAS RC access
    • If a DUA states that only Harvard staff may have access to the data, the PI is responsible for ensuring they never approve access to non-Harvard members to that data (e.g. external collaborators)
  • PIs are responsible for informing FAS RC when an account they have sponsored should be disabled (i.e. if they sponsor the account and the person has left or the account should otherwise be disabled)
  • PIs are responsible for informing FAS RC when any accounts should be removed from groups they manage
  • PIs are responsible for informing FAS RC if and when data needs secure disposal/sanitization, either as required by Harvard University policy or a DUA

Upcoming Responsibilities

  • Coming soon: PIs are responsible for reviewing accounts they sponsor on an annual basis [1]
  • Coming soon: PIs are responsible for reviewing access to groups they manage on an annual basis [1]
[1] If you would like to review spreadsheets of accounts you sponsor and group memberships for groups you approve, please contact rchelp@rc.fas.harvard.edu and ask for account and access review spreadsheets.

Open OnDemand (OOD/VDI) Remote Desktop: How to open software

Introduction

This document shows how to launch different software in the Open OnDemand (OOD) Remote Desktop app (available at rcood.rc.fas.harvard.edu).

Step 1: Connect to the FASRC VPN (see VPN setup documentation)

Step 2: Launch the Remote Desktop app

Step 3: When the Remote Desktop app opens, click the terminal icon to launch a terminal (or click Applications -> Terminal Emulator).

Step 4: Below, you can follow the instructions to launch various software.

Keep in mind that, for the most part, the terminal window must remain open. If the terminal window is closed, the software launched via the terminal will also be closed.

Training Session: FASRC Open On Demand Users Training

Remote Desktop login

To comply with Harvard’s security policy, if the Remote Desktop session becomes idle, the Remote Desktop session will lock. You need to enter your FASRC password to log back in.

Abaqus

In the terminal, type the commands to load the modules and launch Abaqus

[jharvard@holy7c24102 ~]$ module load abaqus
[jharvard@holy7c24102 ~]$ export LANG=en_US
[jharvard@holy7c24102 ~]$ abaqus cae -mesa cpus=$SLURM_CPUS_PER_TASK &

You can see all versions of Abaqus with module spider abaqus. For more details, see the modules page.

The Abaqus license is restricted to SEAS. For more information, see our Abaqus docs.

Comsol

In the terminal, type the commands to load the modules and launch Comsol

[jharvard@holy7c24102 ~]$ module load comsol
[jharvard@holy7c24102 ~]$ export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
[jharvard@holy7c24102 ~]$ comsol -3drend sw -np $SLURM_CPUS_PER_TASK &

You can see all versions of Comsol with module spider comsol. For more details, see the modules page.

The Comsol license is restricted to SEAS. For more information, see our Comsol docs.

For how to set the Comsol temporary directory, see our Comsol Troubleshooting doc.

Jupyter Notebook

(optional) Creating and loading a mamba/conda environment

Note: this is a one-time setup to ensure that your conda environment can be loaded in Jupyter Notebook.

See our Python documentation on how to create a conda environment.

Then, in order to see your conda environment in Jupyter Notebook, ensure that you have installed the packages ipykernel and nb_conda_kernels. To do so, launch a terminal in the Remote Desktop and type the commands:

[jharvard@holy7c24102 ~]$ module load python
[jharvard@holy7c24102 ~]$ source activate my_conda_environment
[jharvard@holy7c24102 ~]$ mamba install ipykernel
[jharvard@holy7c24102 ~]$ mamba install nb_conda_kernels

For more information on creating conda environments for TensorFlow and PyTorch, see our GitHub documentation:

You can see all versions of Python with module spider python. For more details, see the modules page.

Launching Jupyter Notebook

In the Remote Desktop terminal, type the commands to load the modules and launch Jupyter Notebook:

[jharvard@holy7c24102 ~]$ module load python
# (optional) load conda environment
[jharvard@holy7c24102 ~]$ source activate my_conda_environment
# launch jupyter notebook
[jharvard@holy7c24102 ~]$ jupyter notebook

After you run the jupyter notebook command, the terminal may appear to hang for a few seconds. Be patient; a Firefox window will open soon after.

To select my_conda_environment as the kernel, go to Kernel -> Change kernel, and select the kernel (i.e. conda environment) of your choice.

Note: If you prefer to launch Jupyter Lab, be aware that conda environments cannot be loaded when using Jupyter Lab. Only the base environment is available.

Cleanly close Jupyter Notebook

These instructions show how to shut down your Jupyter server so that you can exit the job cleanly.

First, close each Jupyter Notebook you have open: click on File -> Close and Halt.

Then, from the Jupyter Notebook Home Page (where you can browse files and folders), on the top right corner, click on “Quit”. Close the Firefox window.

KNIME

In the terminal, type the following commands to load the module and launch KNIME.

[jharvard@holy7c24102 ~]$ module load knime
[jharvard@holy7c24102 ~]$ knime &

You can see all versions of KNIME with module spider knime. For more details, see the modules page.

LibreOffice

LibreOffice is a free and open source suite that is compatible with a wide range of formats, including those from Microsoft Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx) and Publisher.

LibreOffice is available on the FASRC clusters (both Cannon and FASSE) through a Singularity image. Therefore, LibreOffice is only available through the Remote Desktop app. LibreOffice does not work in the Containerized Remote Desktop app.

In the terminal, type the following command to pull the LibreOffice container and create a Singularity image from it. This command is only needed once.

[jharvard@holy7c24102 ~]$ singularity pull docker://linuxserver/libreoffice

To launch LibreOffice, in the terminal, run the command

[jharvard@holy7c24102 ~]$ singularity exec --cleanenv --env DISPLAY=$DISPLAY libreoffice_latest.sif soffice

Lumerical

In the terminal, type the commands to load the modules and launch Lumerical

[jharvard@holy7c24102 ~]$ module load lumerical-seas
[jharvard@holy7c24102 ~]$ launcher

The Lumerical license is restricted to SEAS. For more information, see our Lumerical docs.

You can see all versions of Lumerical with module spider lumerical. For more details, see the modules page.

Mathematica

In the terminal, type the commands to load the modules and launch Mathematica

[jharvard@holy7c24102 ~]$ module load mathematica
[jharvard@holy7c24102 ~]$ mathematica

You can see all versions of Mathematica with module spider mathematica. For more details, see the modules page.

Matlab

In the terminal, type the commands to load the modules and launch Matlab

[jharvard@holy7c24102 ~]$ module load matlab
[jharvard@holy7c24102 ~]$ matlab -desktop -softwareopengl

You can see all versions of Matlab with module spider matlab. For more details, see the modules page.

MOE

In the terminal, type the commands to load the modules and launch MOE

[jharvard@holy7c24102 ~]$ module load moe
[jharvard@holy7c24102 ~]$ moe

You can see all versions of MOE with module spider moe. For more details, see the modules page.

MOE databases

FASRC has MOE databases available in two locations:

  1. Most of the MOE Auxiliary Databases are available to everyone with cluster access in /n/holylabs/rc_admin/Everyone/moe_databases:
  2. Databases are also available in the $MOE/project folder. You can open them in File -> Open -> Type in the address bar $MOE/project.

RStudio Desktop

In the terminal, type the commands to load modules

[jharvard@holy7c24102 ~]$ module load R
[jharvard@holy7c24102 ~]$ module load rstudio

Set environmental variables

[jharvard@holy7c24102 ~]$ unset R_LIBS_SITE
[jharvard@holy7c24102 ~]$ mkdir -p $HOME/apps/R_version
[jharvard@holy7c24102 ~]$ export R_LIBS_USER=$HOME/apps/R_version:$R_LIBS_USER

Launch RStudio Desktop

[jharvard@holy7c24102 ~]$ rstudio

# vanilla option (combines --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ)
[jharvard@holy7c24102 ~]$ rstudio --vanilla

You can see all versions of R and RStudio with module spider R and module spider rstudio, respectively. For more details, see the modules page.

Remoteviz Partition

The “FAS-RC Remote Visualization” Open OnDemand (or VDI) app, which you may have used in the past, has been decommissioned.

SageMath

You can use Sage either in an interactive shell using the command line interface or by launching a Jupyter Notebook with the SageMath kernel. To launch a Jupyter Notebook, in the terminal, type the commands to load the module and launch Jupyter

[jharvard@holy7c24102 ~]$ module load sage
[jharvard@holy7c24102 ~]$ sage -n jupyter

Ensure that you have “SageMath” kernel selected. If not, go to Kernel -> Change kernel, and select SageMath.

For example, see Sage documentation:

You can see all versions of SageMath with module spider sage. For more details, see the modules page.

SAS

In the terminal, type the commands to load the modules and launch SAS

[jharvard@holy7c24102 ~]$ module load sas
[jharvard@holy7c24102 ~]$ sas &

Stata

In the terminal, type the commands to load the module and launch Stata

[jharvard@holy7c24102 ~]$ module load stata/17.0-fasrc01

# if you are using single-core jobs
[jharvard@holy7c24102 ~]$ xstata-se

# if you are using multi-core jobs
[jharvard@holy7c24102 ~]$ xstata-mp "set processors $SLURM_CPUS_PER_TASK"

TensorBoard

For TensorBoard, you will first need to create a conda environment (Step 1). You only need to create a conda environment once. If you have created one, you can skip to Step 2. Or, if you have your own environment, make sure you install the TensorBoard package, and then you can skip to Step 2.

Step 1: Create conda environment

In a terminal, load the Mambaforge or Python module, create a conda environment, activate it, and install TensorBoard inside the environment

[jharvard@holy7c24102 ~]$ module load python
[jharvard@holy7c24102 ~]$ module load cuda/11.7.1-fasrc01
[jharvard@holy7c24102 ~]$ module load cudnn/8.5.0.96_cuda11-fasrc01
[jharvard@holy7c24102 ~]$ conda create -n tb_tf2.10_cuda11 python=3.10 pip numpy six wheel scipy pandas matplotlib seaborn h5py jupyterlab
[jharvard@holy7c24102 ~]$ source activate tb_tf2.10_cuda11
[jharvard@holy7c24102 ~]$ conda install -c conda-forge tensorboard
[jharvard@holy7c24102 ~]$ conda install -c conda-forge tensorflow

You can see different versions of Mambaforge or Python in our modules page.

Step 2: Activate conda environment and launch TensorBoard

In a terminal, set up variables for TensorBoard. Make sure that the data you need to visualize in TensorBoard is located in the log directory MY_TB_LOGDIR. You can either use the suggested path below or use somewhere else that better suits your workflow.

# Find available port to run server on (does not output anything to screen)
[jharvard@holy7c24102 ~]$ for myport in {6818..11845}; do ! nc -z localhost ${myport} && break; done

# setup tensorboard environmental variables
[jharvard@holy7c24102 ~]$ export MY_TB_PORT=${myport}
[jharvard@holy7c24102 ~]$ export MY_TB_BASEURL=/node/${host}/${myport}/
[jharvard@holy7c24102 ~]$ export MY_TB_LOGDIR=$HOME/.tensorboard/log/$SLURM_JOBID
[jharvard@holy7c24102 ~]$ mkdir -p $MY_TB_LOGDIR

# load module, activate conda environment, and launch tensorboard
[jharvard@holy7c24102 ~]$ module load python
[jharvard@holy7c24102 ~]$ module load cuda/11.7.1-fasrc01
[jharvard@holy7c24102 ~]$ module load cudnn/8.5.0.96_cuda11-fasrc01
[jharvard@holy7c24102 ~]$ source activate tb_tf2.10_cuda11 
(tb_tf2.10_cuda11) tensorboard --host localhost --port ${MY_TB_PORT} --logdir ${MY_TB_LOGDIR} --path_prefix ${MY_TB_BASEURL}

You can see different versions of Mambaforge or Python in our modules page.

Right-click on the link that starts with “http://localhost” and click on “Open Link”. This will open a Firefox browser, where you can view your results.
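
For readers who want to see what the port-finding loop in Step 2 is doing, the sketch below expresses the same idea in Python: try each port in the range and stop at the first one where nothing accepts a connection. The function name `first_free_port` is hypothetical, and this is an equivalent illustration rather than a replacement for the shell loop (it probes IPv4 only, for example).

```python
import socket


def first_free_port(start=6818, stop=11845, host="localhost"):
    """Return the first port in [start, stop] with nothing listening on it."""
    for port in range(start, stop + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.settimeout(0.5)
            # connect_ex returns 0 when something accepts the connection,
            # i.e. the port is in use (the case where `nc -z` succeeds).
            if probe.connect_ex((host, port)) != 0:
                return port
    raise RuntimeError(f"no free port between {start} and {stop}")


print(first_free_port())
```

As with the shell loop, a port found free at this moment could still be taken by another process before TensorBoard starts, so rerun the search if TensorBoard reports that the port is in use.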

Example

Using the environment created in Step 1, run the small program tb_test.py in a directory of your choice and visualize its results.

Source code of tb_test.py:

import os
import tensorflow as tf

def create_model():
    # Small fully connected classifier for 28x28 MNIST images
    return tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

# Load MNIST and scale pixel values to [0, 1]
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Write TensorBoard logs to the directory set in MY_TB_LOGDIR
logdir = os.getenv('MY_TB_LOGDIR')
print(logdir)

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir, histogram_freq=1)
model.fit(x=x_train,
          y=y_train,
          epochs=5,
          validation_data=(x_test, y_test),
          callbacks=[tensorboard_callback])

Setup variables and run tb_test.py

# go to the directory that contains your tb_test.py file
[jharvard@holy7c24102 ~]$ cd tb_example

# Find available port to run server on (does not output anything to screen)
[jharvard@holy7c24102 tb_example]$ for myport in {6818..11845}; do ! nc -z localhost ${myport} && break; done

# setup tensorboard environmental variables
[jharvard@holy7c24102 tb_example]$ export MY_TB_PORT=${myport}
[jharvard@holy7c24102 tb_example]$ export MY_TB_BASEURL=/node/${host}/${myport}/

# this command will set MY_TB_LOGDIR to your current working directory
[jharvard@holy7c24102 tb_example]$ export MY_TB_LOGDIR=$PWD

# load modules and activate conda environment
[jharvard@holy7c24102 tb_example]$ module load python
[jharvard@holy7c24102 tb_example]$ module load cuda/11.7.1-fasrc01
[jharvard@holy7c24102 tb_example]$ module load cudnn/8.5.0.96_cuda11-fasrc01
[jharvard@holy7c24102 tb_example]$ source activate tb_tf2.10_cuda11

# run python code
(tb_tf2.10_cuda11) python tb_test.py

# launch tensorboard
(tb_tf2.10_cuda11) tensorboard --host localhost --port ${MY_TB_PORT} --logdir ${MY_TB_LOGDIR} --path_prefix ${MY_TB_BASEURL}

Right-click on the link that starts with “http://localhost” and click on “Open Link”. This will open a Firefox browser where you will be able to see your results.

TotalView

TotalView is a debugging tool particularly suitable for parallel applications. The modules you need to load depend on the compilers used in the code you are trying to debug. Because of this compiler dependency, please refer to our more detailed TotalView documentation.

Visual Studio Code

In the terminal, type the commands to load the modules and launch Visual Studio Code

[jharvard@holy7c24102 ~]$ module load vscode
[jharvard@holy7c24102 ~]$ code --user-data-dir $HOME/.vscode/data/ &

You can see all versions of Visual Studio Code with module spider vscode. For more details, see the modules page.

© The President and Fellows of Harvard College.
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.