PI Responsibilities at FAS RC

Overview

PIs have a variety of responsibilities at Harvard University.  This document will cover the responsibilities specific to FAS Research Computing, especially around information security and risk.

PIs are individuals given continuous or limited PI rights by the university and who control their own funding in a school that FAS RC supports. Co-Investigators are not considered PIs.

Responsibilities

  • PIs are responsible for following all applicable Harvard University policies, including but not limited to Harvard Research Data Security Policy and Harvard Information Security Policy, as well as any requirements in data use agreements (DUAs) or contracts that impact them.
  • PIs are responsible for creating and maintaining accurate data documentation in the Harvard Compliance System, as required by University policies, and complying with approved data security and management plans.  See the guidance on which applications are needed for your data.
  • PIs are responsible for submitting FASSE project requests for any data security level (DSL) 3 data they plan to use at FAS RC and keeping associated data in the specific FASSE storage provided for these projects.
  • PIs are responsible for informing FAS RC of any changes to Research Administration applications (e.g. DAT12-1234, DUA12-1234, IRB12-1234) governing data they plan to use for their FASSE projects, before moving new data to FAS RC storage for these projects.  This includes informing FASRC before adding data from a new application (e.g. DUA12-1234) to an existing FASSE project.
  • PIs are responsible for ensuring that any access they approve complies with all applicable Harvard University policies and DUA or compliance regimes.  For example, among many other scenarios:
    • If a DUA requires informing or obtaining approval from the data provider before providing access to the data, the PI must ensure this is done before they approve the associated FAS RC access
    • If a DUA states that only Harvard staff may have access to the data, the PI is responsible for ensuring they never approve access to non-Harvard members to that data (e.g. external collaborators)
  • PIs are responsible for informing FAS RC when an account they have sponsored should be disabled (i.e. if they sponsor the account and the person has left or should otherwise be disabled)
  • PIs are responsible for informing FAS RC when any accounts should be removed from groups they manage
  • PIs are responsible for informing FAS RC if and when data needs secure disposal/sanitization, either as required by Harvard University policy or a DUA

Upcoming Responsibilities

  • Coming soon: PIs are responsible for reviewing accounts they sponsor on an annual basis
  • Coming soon: PIs are responsible for reviewing access to groups they manage on an annual basis

If you would like to start receiving spreadsheets of accounts you sponsor and group memberships for groups you approve, please contact security@rc.fas.harvard.edu and ask to be set up for account and access review notifications. FAS RC will start rolling these out in stages, starting in Summer 2023 with a focus on PIs who have protected data / FASSE projects, and expanding to cover other PIs (e.g., those that only use Cannon / DSL 2 data) over a period of months to a year.

Glossary

Research Computing has its own body of terms and concepts, many of which are common in High Performance Computing and in Information Technology more generally. Below is a glossary of common nomenclature you will run into, with quick definitions.

Allocation

Used variously.

  1. A block of cores, memory, and possibly GPUs assigned by the scheduler.
  2. A block of storage given to a group for use.
  3. The Fairshare granted to a group.

Archival

FASRC storage, including tape, is not archival storage.  Tier 2 tape should be considered merely an offline cold-storage version of disk-based storage.  The term ‘archival’ has a very specific meaning and criteria in data storage and management, and no FASRC offering meets that definition/criteria.  If you require archival storage of data, please see Dataverse or contact the library system for advice and options.

Cluster

Synonymous with supercomputer.  This is a collection of computers (called nodes) that are tied together by a fast network and a uniform operating system.

Central Processing Unit (CPU)

This is a microprocessor that acts as the main director and calculator for a node.  These are divided into individual subdivisions called cores.

Aside from the strict definition, CPU can also be used synonymously with Core.

Chipset

This is a basic architecture for a CPU.  Chipsets vary depending on manufacturer.

Cloud Computing

Leveraging a shared block of computers owned and managed by a separate entity that has resiliency and scalability.

Code

A method of giving a series of instructions to a computer.

Command Line Interface (CLI)

Also known as terminal or console. The fundamental method of interacting with computers via direct instructions.

Compiler

Used to convert code into an executable which can run on a computer.

Containers

A method of creating an encapsulated environment and software that overlaps the current operating system but does not start up a full independent virtual machine.

Core

A fundamental unit of compute.  Each core runs a single stream of instructions, aka a process.

Aside from the strict definitions of Core and CPU, sometimes CPU is used interchangeably with core.

Datacenter

A location, usually shared, where servers are housed.

Data Management

The art of organizing and handling large amounts of information on computers.

Disaster Recovery 

A copy of an entire file system that can be used internally by FASRC in case of system-wide failure.

Distributed Storage

Storage that uses multiple servers to host data.

Embarrassingly Parallel

The simplest form of parallelism.  This involves leveraging the scheduler to run many jobs at once.
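
As a minimal sketch, a Slurm job array submits many independent tasks from one script (the partition name, program, and input naming here are placeholders, not FASRC-specific values):

#!/bin/bash
#SBATCH -p shared                # example partition name
#SBATCH -c 1                     # one core per task
#SBATCH -t 0-00:30               # 30 minutes per task
#SBATCH --array=1-100            # 100 independent tasks
./myprogram input_${SLURM_ARRAY_TASK_ID}.dat   # hypothetical program and inputs

Submitted once with sbatch, this queues 100 jobs that the scheduler runs as resources become free.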

Executable

Compiled code which can be run on a computer. Also known as application or binary.

Fairshare

The method by which Slurm adjudicates which groups get priority.

Graphics Processing Unit (GPU)

Originally designed for fast rendering of images (especially for games).  Today GPUs are often utilized for machine learning due to their ability to process streams of data swiftly.

Group

A block of users who share something in common, typically a PI.

Graphical User Interface (GUI)

Also known as a desktop. This method of interaction with a computer is mediated through mouse clickable images, menus, and icons.

High Performance Computing (HPC)

Synonymous with Supercomputing.  Numerical work which pushes the limits of what computers can do.  Typically involves datacenters, InfiniBand, water cooling, schedulers, distributed storage, etc.

Host

The node where an application or job is run, or the node the user is currently logged into.

Hypervisor

A server which hosts multiple Virtual Machines.

Infiniband (IB)

A network technology with ultralow latency and high bandwidth.  Used commonly in supercomputing.

Information Technology (IT)

A catchall term for the broad category of things interacting with computers.  Other synonyms include anything with cyber in it.  Research Computing and High Performance Computing are subdisciplines of Information Technology.

Input/Output (I/O or IO)

A term referring to reading in data (input) or writing out data (output) to storage. Covers both how much data is being accessed and how many individual files are being used. Is used to gauge the performance of storage.

Job

An individual allocation for a user by the scheduler.

Job Efficiency

A measure of how well the job allocation parameters match what the job actually uses.
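
If the seff utility that ships with Slurm is available, you can check this for a completed job (the job ID below is a placeholder):

seff 12345678   # reports CPU and memory efficiency relative to what the job requested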

Job Optimization

Work done on a code to ensure that a job runs at the maximum speed possible with the least amount of resource used.

Library

A precompiled set of functions that can be used by external applications.

Login Node

A server or virtual machine that is set up for users as an access point to the cluster.

Machine Learning

A misnomer, often used synonymously with Artificial Intelligence (AI). This is a sophisticated method of deriving correlations from empirical data based on how parts of the brain work.  These correlations are then used to predict future behavior or find common patterns.

Maintenance

When some part of the cluster is taken offline so that work can be done to improve or restore service.

Memory

Also known as RAM (Random Access Memory).  These are volatile locations on the node that hold information while a process is running.

Message Passing Interface (MPI)

The industry standard for processes that need to talk between nodes in a parallel fashion.

Network

A method of connecting various computers together.

Node

Synonymous with Server or Blade. An individual block of compute. Typically made up of a CPU, memory, local storage, and sometimes GPU card.

Operating System (OS)

The basic instructions and environment used by a computer.

Parallel

Executing multiple processes at once.  Typical methods include: Embarrassingly Parallel, Threading, MPI

Partition

A block of compute in Slurm that can be used for scheduling jobs.

Principal Investigator (PI)

Typically professors but can include others who have been designated as such by Harvard University.

Priority

The method by which a scheduler determines which job to schedule next.

Process

A single execution of a code with a singular code path.

Proxy

A method of using a bridge system to access an external network location from a secure network.

Queue

Sometimes used synonymously with Partition.  This is the group of jobs which are waiting to execute on the cluster.

Requeue

A method used by the scheduler to reschedule jobs that are preempted by higher priority jobs.

Research Computing (RC)

Any application of numerical power to investigate how things work.  Generally this is found in academia, though it is used in industry under various names.

Scheduler

An automated process that adjudicates which jobs go where on a cluster.

Scratch

A location on storage that is meant only for temporary data.

Secure Network

A network that is restricted by various methods so that it can handle sensitive data.

Serial

Running a sequence of tasks in order.

Slurm

An open source scheduler.

Snapshots

Copies of a directory taken at a specific moment in time. They offer labs a self-service recovery option for overwritten or deleted files within the specific time period.

Storage

A location where you can permanently host data.

Threading

A method of breaking up a process over multiple cores that share memory for the sake of parallel execution.
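
A minimal sketch for an OpenMP-style threaded job inside a batch script (the application name is hypothetical): request several cores on one node and tell the code how many threads to use:

#SBATCH -c 8                                   # 8 cores on a single node
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK    # match thread count to the allocated cores
./my_threaded_app                              # hypothetical threaded application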

Topology

Can refer to the organization and layout of:

  1. Nodes on a network (i.e. Network Topology).
  2. Cores, CPUs, Memory, and GPU’s on a Node (i.e. Node Topology).
  3. The processes that make up a Job with respect to Network and Node Topology.

User

You!  Also other people using the cluster.  Users can also be created for use by automated processes.

Virtual Desktop Interface (VDI)

A method of exporting graphics and applications to users outside of the normal command line interface.

Virtual Machine

A computer that exists purely in software and is hosted on a hypervisor.

Water Cooling

Uses a liquid medium (usually water) for removing heat from a computer instead of the standard air cooling.

X11

X11 is an older port-forwarding system for displaying graphical applications from one system to another. We do not recommend the use of X11 as it is slow and unreliable. Frequent disconnects are common and window drawing/re-drawing will be very slow. We recommend Open OnDemand (aka OOD or VDI), which provides a more robust interface and is not tied to the quality and speed of your connection. See also: https://docs.rc.fas.harvard.edu/kb/ood-remote-desktop-how-to-open-software/
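
For reference, X11 forwarding is typically requested with ssh's -X flag, though for the reasons above OOD is the better route:

ssh -X jharvard@login.rc.fas.harvard.edu   # forwards graphical windows over SSH; slow on most connections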


Request an FAS Research Computing Account

Before You Sign Up

Please note that due to the shared nature of the FASRC cluster and its associated storage systems, FERPA blocks for specific individual information cannot be accommodated within the cluster and its associated systems. Information such as, but not limited to, name, username, and lab affiliation may be available to other cluster users. Please contact FASRC if you have questions or concerns.

If you are unsure whether you qualify for an RC account, please see Qualifications and Affiliations.

Detailed information and steps for using the signup tool can be found here.

Please Note: You may have only one RC account. If you need to add cluster access, change labs, or gain membership in a different/additional lab group, please submit a help ticket. Do not sign up for a second account. This is unnecessary and against our account policies.
Also please be aware that we periodically disable accounts that have not been used for a long time. If this happens and you later need to use your account again, please have your sponsor contact us to approve re-enabling, or contact us if you need to change to a new sponsor.

 

The Process

To request an account to access resources operated by Research Computing (cluster, storage, software downloads, workstation access, instrument sign-up, etc.), you will use the Account Request Tool. Please review the information below before doing so.  (Note: Firefox works best.  Chrome and other browsers sometimes cause errors.)

PLEASE NOTE: Do not select FACULTY as your job type if you do not have a Harvard faculty appointment. If you are a researcher with additional rights (fellowship, PI-like rights, funding, etc.), please select STAFF or POSTDOC. Faculty accounts are intended only for those holding an active Harvard faculty appointment. Please refer to the FAS PI Eligibility Guidelines, which include a handy table toward the end.

  • If you would like to see more detailed instructions and screenshots, please visit Account Signup
  • If you need more info on signing up for instrument access during or after your account signup, please see Instrument Sign-up
    If you are signing up for instrument access at HCBI and are from an unsupported school or external vendor, please see HCBI Registration
  • CLUSTER ACCESS: Please note that if you wish to be able to log into the cluster and/or run jobs, you will need to select Cluster Access during sign-up. If you are signing up just for instrument access, do not select Cluster Access. You can add cluster access if you need it later.

Once you’ve submitted the request, the process is:

Internal/Harvard Key

If You Selected: Internal/Using Harvard Key to verify your information and qualifications:

  1. The request is on hold while the PI is asked to approve or reject it.
  2. Once approved, the account is finalized and set up.
  3. Once finalized, you receive an automated email confirmation with your new account information and instructions for setting the password.

External/Non-Harvard Key

If You Selected: External/Not using Harvard Key to verify your information and qualifications:

  1. You must first meet affiliation requirements in order to request an account.
  2. An email is sent to your PI to approve/reject the request.
  3. The request is on hold while the PI is asked to approve or reject it.
  4. Once approved by the PI, FASRC needs to vet and finalize the account during business hours.
  5. Once finalized, you receive an automated email confirmation with your new account information and instructions for setting the password.

Once you’ve received your account creation confirmation email you can then proceed to set up your OpenAuth token and get connected to the FASRC cluster.

The turnaround time is directly related to the PI/Sponsor’s approval of the account. External accounts are sent to the PI/Sponsor for approval first, then reviewed by RC staff during business hours and generally vetted within one business day.

NOTE! If you request “Cluster Use” (the ability to log into the cluster and run jobs on the cluster), attend one of our monthly New User Trainings or watch the introduction videos on our Quick Start guide.

Account Request Tool

 

Faculty and Non-Faculty PI Signup

To sign up as a PI sponsor on the cluster, the process is the same as above except that, by selecting a PI-type job title, you will not need to select a sponsor. Please note that your eligibility must be vetted by FASRC staff after you sign up, so please allow time for that. Please refer to the FAS PI Eligibility Guidelines.

  • FACULTY PI – If you are a Harvard Faculty member with an active and verifiable faculty appointment, select FACULTY for ‘job type’. Doing so will grey out the Sponsor option. You will instead be vetted by FASRC staff.
    Please do not use this option if you are faculty from another university; use EXTERNAL instead.
  • NON-FACULTY PI – If you are not a faculty member but have been conferred PI rights by the university (Note: being named as a PI on a grant is not the same as university PI status), please use the NON-FACULTY PI job title. Doing so will grey out the Sponsor option. You will instead be vetted by FASRC staff.
    We may need to follow up with you or the university to verify your PI rights and control of budget. If it is not clear that you have PI rights, your request may be rejected. You can contact us with further details if you do have PI rights and this happens.

Once complete, you can sponsor accounts under your FASRC lab group. For more information on that process, see the For Approvers section here.

Account Request Tool

Command line access with Terminal (login nodes)

Preface

This document describes how to get access to the cluster from the command line. Once you have that access you will want to go to the Running Jobs page to learn how to interact with the cluster.

Do not run your jobs or heavy applications such as MATLAB or Mathematica on the login server. Please use an interactive session or job for all applications and scripts beyond basic terminals, editors, etc. The login servers are a shared, multi-user resource. For graphical applications please use Open OnDemand.

Please note: If you did not request cluster access when signing up, you will not be able to log into the cluster or login node as you have no home directory. You will simply be asked for your password over and over. See this doc for how to add cluster access as well as additional groups.

A Note On Shells for Advanced Users: The FASRC cluster uses BASH for the global environment. If you wish to use an alternate shell, please be aware that many things will not work as expected and we do not support or troubleshoot shell issues. We strongly encourage you to stick with BASH as your cluster shell. The module system assumes you are using bash.

Login Nodes

When you ssh to the cluster at login.rc.fas.harvard.edu you get connected to one of our login nodes. Login nodes are split between our Boston and Holyoke datacenters. If you want to target a specific datacenter you can specify either boslogin.rc.fas.harvard.edu (Boston) or holylogin.rc.fas.harvard.edu (Holyoke). You can also connect to a specific login node by connecting to a specific host name. Login nodes do not require VPN to access and are accessible worldwide.

Login nodes are your portal into the cluster and are a shared, multi-user resource. As mentioned above, they are not intended for production work but rather as a gateway. Users should submit jobs to the cluster for production work. For interactive work you should spawn an interactive job on a compute node. If you need graphical support we highly recommend using Open OnDemand.
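
As a minimal sketch, you can spawn an interactive session on a compute node with Slurm's salloc (the partition name here is an example; see the Running Jobs page for current partitions and limits):

salloc -p test -c 1 --mem=4G -t 0-01:00   # request 1 core and 4GB of memory for one hour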

We limit users to 1 core and 4GB of memory per session and a maximum of 5 sessions per user. Users abusing the login nodes may have their login sessions terminated. In order to clear out stale sessions the login nodes are rebooted as part of our monthly maintenance.

If you need more than 5 sessions, consider adapting your workflow to rely more on submitting batch jobs to the cluster rather than interactive sessions, as the cluster is best utilized when users submit work in an asynchronous fashion.  Using Open OnDemand is also a good option as it gives you a traditional desktop on the cluster with ability to open multiple terminals on a dedicated compute node.  There are also tools like screen or tmux which can allow one session to expand to multiple subscreens.
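
For example, with tmux you can start a named session, detach, and reattach later. Note that tmux sessions live on the specific login node where you started them, so reconnect to that same node (by host name) to reattach, and expect sessions to end when login nodes reboot during monthly maintenance:

tmux new -s mywork       # start a named session
# press Ctrl-b then d to detach while it keeps running
tmux attach -t mywork    # reattach from a later login to the same node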

Connecting via SSH

For command line access to the cluster, connect to login.rc.fas.harvard.edu using SSH (Secure SHell). If you are running Linux or Mac OSX, simply open a terminal and type ssh USERNAME@login.rc.fas.harvard.edu, where USERNAME is the name you were assigned when you received your account (example: jharvard – but not jharvard@fasrc, that is only necessary for VPN). If you are on Windows, see below for SSH client options.

Once connected, enter the password you set after receiving your account confirmation email. When prompted for the Verification code, use the current 6-digit OpenAuth token code.

ssh jharvard@login.rc.fas.harvard.edu

[Screenshot: a terminal window logging into login.rc.fas.harvard.edu; the user enters a password and an OpenAuth verification code, with the Java OpenAuth token generator shown overlaid on the terminal window.]

To avoid login issues, always supply your username in the ssh connection as above, since omitting this will cause your local login name at your terminal to be passed to the login nodes.
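
One optional convenience is an entry in your local ~/.ssh/config file so the correct username is always supplied (the alias name is arbitrary; replace jharvard with your own FASRC username):

Host fasrc
    HostName login.rc.fas.harvard.edu
    User jharvard

With this in place, ssh fasrc is equivalent to the full command above.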

SSH Clients

MAC/LINUX/UNIX

If you’re using a Mac, the built-in Terminal application (in Applications -> Utilities) is very good, though there are replacements available (e.g. iTerm2).

On Linux distributions, a terminal application is provided by default. For Linux users looking for the iTerm2-like experience, Tilix is a popular option.

WINDOWS Clients

If you’re using Windows, you will need to decide what tool to use to SSH to the cluster. Each app behaves differently, but includes some way to enter the server (login.rc.fas.harvard.edu) and select a protocol (SSH). Since there’s no one app and many are used by our community, some suggestions follow.

Terminal

Windows 10+ has ssh built into its standard terminal.

Windows Subsystem for Linux (WSL)

Windows 10+ has the ability to start a miniature Linux environment using your favorite flavor of Linux. From the environment you can use all the normal Linux tools, including ssh. See the Windows Subsystem for Linux documentation for more.

PuTTY

PuTTY is a commonly used terminal tool. After a very simple download and install process, just run PuTTY and enter login.rc.fas.harvard.edu in the Host Name box. Click the Open button and you will get the familiar password and verification code prompts. PuTTY also supports basic X11 forwarding.

Git BASH

For Windows 10 users Git BASH (part of Git for Windows) is available. It brings not only a Git interface, but BASH shell integration to Windows. You can find more info and download it from gitforwindows.org

MobaXterm

MobaXterm provides numerous remote connection types, including SSH and X11. You can find out more and download it from mobaxterm.mobatek.net. There are free and paid versions, and MobaXterm supports X11 forwarding.

XMing (standalone)

XMing is an X11/X Windows application and is a bit more complex. But it’s mentioned here as we do have users who use it for connecting to the cluster. You can find more info at www.straightrunning.com

 

User Quick Start Guide

This guide will provide you with the basic information needed to get up and running on the FASRC cluster with your FASRC account. If you’d like more detailed information, each section has a link to fuller documentation.

LOGIN USERNAME CHEATSHEET
CLI/Portal/VDI/OoD – For the majority of our services you will log in with your FASRC username and password. Your username was selected at signup. You will set your password using the instructions below (step 2). Your two-factor verification code may also be required.
Example: User John Harvard’s username is jharvard (More info About Usernames)

VPN – When connecting to our VPN you will still use your FASRC username, but you will also need to specify what VPN realm you wish to connect to. Realms provide access to different environments. For instance, the FASSE secure environment uses a special realm ‘@fasse’. Unless you have been told to use a different realm, you will want to use the ‘@fasrc’ realm.
Example: jharvard@fasrc

Email Address – You will not use your email address to log into any FASRC service. The only FASRC system where you will need to enter your email address is when setting or resetting your FASRC account password (see step 2 below).

Harvard Key – Similarly, you will not use your Harvard Key to log into FASRC services. The single exception to this is for account approvers to log in to approve new accounts.

Two-Factor – Also, Harvard Key is not your two-factor code provider. FASRC has its own two-factor authentication. When logging in to FASRC systems you will sometimes be asked for this 6-digit token code. While you can use Duo to store this token, again it is not tied to Harvard Key in any way. Please see OpenAuth below (step 3).

PREREQUISITES

Steps to obtaining and setting up a FASRC account.


1. Get a FASRC account using the account request tool.

Before you can access the cluster you need to request a Research Computing account.

See How Do I Get a Research Computing Account for instructions if you do not yet have an account.

Once you have your FASRC account, move on to Step 2.


2. Set your FASRC Password

You will be unable to login until you set your password via the RC password reset link: https://portal.rc.fas.harvard.edu/p3/pwreset/

You will need to enter the same email address you used to sign up for your account and then will receive an email with a link (this email and link expires after 15 minutes and is for one-time use – it is never needed again).

Once you’ve set your password, continue to Step 3 to request your OpenAuth two-factor (2FA) token.


3. Set up OpenAuth for two-factor authentication (2FA)

NOTE: This is not your Harvard Key two-factor code. FASRC has its own two-factor system.

You will need to set up our OpenAuth two-factor authentication (2FA) either on your smartphone (using Google Authenticator, Duo, or similar OTP app) or by downloading our Java applet on your computer.

See the OpenAuth Guide for instructions if you have not yet set up OpenAuth.

For troubleshooting issues you might have, please see our troubleshooting page.


4. Use the FASRC VPN when connecting to storage, VDI, or other resources

FASRC has its own VPN service which is separate from other Harvard VPNs you may use.

Harvard users: You should generally be able to access most FASRC resources from the FAS or Harvard VPN as well as from the campus wired network.  However, some services, such as Open OnDemand (OOD), XDMoD, dash.rc dashboards, and a handful of others, will require you to use our VPN. If you cannot reach a website/service that ends in rc.fas.harvard.edu, connect to our VPN and try again.

External users: If you are an external, non-Harvard user, you will almost certainly need to connect to our VPN for access to anything other than SSH to the login nodes.

See our FASRC VPN Setup Guide – You will use your FASRC username (plus a realm), your FASRC password, and your OpenAuth token when connecting.


5. Review our introductory training

Watch “Getting Started on the FASRC Cluster – Introduction”
[slides]

FASRC also provides

 

Accessing the Cluster and Cluster Resources

Terminal access

Terminal access via SSH to login.rc.fas.harvard.edu

 If you did not request cluster access when signing up, you will not be able to log into the cluster or login nodes as you have no home directory. You will simply be asked for your password over and over. See this doc for how to add cluster access as well as additional groups.

For command line access to the cluster, you will SSH to login.rc.fas.harvard.edu using your FASRC username, password and OpenAuth token.

See also our SSH/Command Line access documentation for more-detailed instructions: Command line access using a terminal

Watch “Getting Started on FASRC Cluster with CLI (Command Line Interface)”
[slides]

OpenOnDemand (OOD)

Graphical Desktop Access using OpenOnDemand (OOD)

We also provide an interactive graphical desktop environment using Open OnDemand (aka OOD) from which you can launch graphical applications as well as SSH/command line jobs.

Please remember that you must be connected to the FASRC VPN to access this service.

See the following docs for more details: Virtual Desktop through Open OnDemand (OOD)

Watch “Getting Started on FASRC Cluster with Open OnDemand”
[slides]


Storage and Data Management

Research Data Management

There are many policies and procedures related to managing data at Harvard. See our Research Data Management (RDM) page for tools, resources, and guidance to help researchers manage their data effectively and prepare it for sharing and reuse.

Watch “Research Data Management at FASRC”
[slides]

 

Transferring Data

See our Transferring Data on the Cluster page for best practices for moving data around on the cluster.

There are also graphical tools available. The Filezilla SFTP client is available cross-platform for Mac OSX, Linux, and Windows. See our SFTP file transfer using Filezilla document for more information. Windows users who prefer SCP can download it from WinSCP.net and follow the steps from Connecting with WinSCP to connect to Cannon.

If you are off-campus or behind a firewall and wish to connect to FASRC servers other than the login servers, you should first connect to the Research Computing VPN.
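
From the command line, rsync is a common alternative to the graphical tools above. A minimal sketch, where the local directory and destination path are placeholders:

rsync -avP mydata/ jharvard@login.rc.fas.harvard.edu:mydata/

This copies the contents of the local mydata/ directory into ~/mydata/ on the cluster; the -P flag shows progress and lets interrupted transfers resume.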

Determine where files will be stored

See our Data Storage Workflow page for an overview of storage at FASRC.

Each user of the cluster is granted 100GB of storage in their home directory. This volume has decent performance and is regularly backed up. For many, this is enough to get going. However, there are a number of other storage locations that are important to consider when running software on the FASRC cluster.

  • /n/netscratch – Our global scratch (environment variable $SCRATCH) is a large, high-performance temporary VAST filesystem. We recommend that people use this filesystem as their primary job working area, as it is highly optimized for cluster use. Use this for processing large files, but realize that files will be removed after 90 days and the volume is not backed up. Create your own folder inside the folder of your lab group. If that doesn’t exist, contact RCHelp.
  • Local On-Node Scratch
    /scratch – Local on-node scratch. When running batch jobs (see below), /scratch is a large, very fast temporary store for files created while a tool is running. This space is on the node’s local hard drive. It is a good place for temporary files created while a tool is executing because the disks are local to the node performing the computation, making access very fast. However, data is only accessible from that node, so you cannot directly retrieve it after calculations are finished. If you use /scratch, make moving any results off and onto another storage system part of your job (see the sketch after this list).
  • Lab Storage
    Lab storage – Each lab that is doing regular work on the cluster can request an initial 4TB of group-accessible storage at no charge. Like home directories, this is a good place for general storage, but it is not high performance and should not be used during I/O-intensive processing. See the global scratch above. For additional paid lab storage, see Storage Service Center.
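
Below is a minimal sketch of a batch job that stages files through node-local /scratch and copies results back before exiting. The partition name, paths, and tool are placeholders, not FASRC-specific values:

#!/bin/bash
#SBATCH -p shared                        # example partition name
#SBATCH -c 1
#SBATCH -t 0-02:00
#SBATCH --mem=4G
WORKDIR=/scratch/$USER/$SLURM_JOB_ID     # node-local temporary space
mkdir -p "$WORKDIR"
cp "$HOME/inputs/data.in" "$WORKDIR/"    # stage input onto the node
cd "$WORKDIR"
"$HOME/bin/my_tool" data.in > results.out   # hypothetical tool
cp results.out "$HOME/results/"          # copy results off node-local scratch before the job ends
rm -rf "$WORKDIR"                        # clean up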

Do NOT use your home directory or lab storage for significant computation.
This degrades performance for everyone on the cluster.
For details on different types of storage and how to obtain more, see the Cluster Storage page.


Running Jobs and Loading Software

For detailed information on running jobs on the FASRC cluster(s), partitions, workflows, and other job details, please see our Running Jobs page.

Information on using software on the cluster can be found on our Software Overview page.
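
As a minimal sketch of the day-to-day flow (the module name and script are placeholders; use module avail or module spider to see what is actually installed):

module load python       # load software via the module system
sbatch myjob.sbatch      # submit a batch script to the scheduler
squeue -u $USER          # check the status of your jobs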


Familiarize yourself with proper decorum on the cluster

The FASRC cluster is a massive system of shared resources. While much effort is made to ensure that you can do your work in relative isolation, some rules must be followed to avoid interfering with other users’ work.

The most important rule on the cluster is to avoid performing computations on the login nodes. Once you’ve logged in, you must either submit a batch processing script or start an interactive session (see Running Jobs). Any significant processing (high memory requirements, long running time, etc.) that is attempted on the login nodes will be killed.

See the full list of Cluster Customs and Responsibilities.


Getting further help

If you have any trouble with running jobs on the cluster, first check the comprehensive Running Jobs page and/or search our documentation. If your questions are not answered there or in any of our other many documentation pages, feel free to submit a help ticket to FASRC. Please include the job ID of the job in question. Also provide us with your username, what script you ran, the error and output files, and where they’re located. The output of module list is often helpful, too.
If you need more hands-on help, FASRC holds weekly Office Hours on Wednesdays from 12pm – 3pm.
