Applications – FASRC DOCS (https://docs.rc.fas.harvard.edu)

CryoSPARC (https://docs.rc.fas.harvard.edu/kb/cryosparc/)

Description

CryoSPARC is a closed-source, commercially developed software package for analyzing single-particle cryo-electron microscopy data. It supports CUDA-based GPU-accelerated analysis through the PyCUDA library. It consists of several applications bundled into two separate binary packages:

  • CryoSPARC Master (cryosparcm)
  • CryoSPARC Worker

The Master package is meant to use relatively little compute resources, and some sites allow users to run it directly on their login nodes. The CryoSPARC Worker can run on a separate node or on the same node, but it typically needs access to GPU compute resources. Worker nodes must have password-less SSH access to the master node as well as unfettered TCP access on a number of ports. The authoritative list of installation requirements can be found in the CryoSPARC guide.
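
If you do set up standalone workers, establishing password-less SSH is the usual key-pair exchange; a minimal sketch, run from a worker node, with placeholder user and host names:

ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_cryosparc          # generate a dedicated key pair
ssh-copy-id -i ~/.ssh/id_ed25519_cryosparc.pub jharvard@master-node.example.edu
ssh jharvard@master-node.example.edu hostname                 # should return without a password prompt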

In addition to instantiating worker nodes and connecting them to the Master node, CryoSPARC can also be configured with a “Cluster Lane” which submits jobs via the SLURM job scheduler. This is the install strategy described in this document.
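
For illustration, a SLURM lane is typically defined by two files, cluster_info.json and cluster_script.sh, and registered with cryosparcm cluster connect from the directory containing them. The abridged sketch below uses placeholder paths and a placeholder partition; consult the CryoSPARC guide and the FASRC User_Codes examples for the authoritative templates.

# cluster_info.json (abridged; paths are placeholders)
{
    "name": "slurm-gpu",
    "worker_bin_path": "/path/to/cryosparc_worker/bin/cryosparcw",
    "cache_path": "/scratch/cryosparc_cache",
    "send_cmd_tpl": "{{ command }}",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "qinfo_cmd_tpl": "sinfo"
}

# cluster_script.sh (abridged SBATCH template; partition is a placeholder)
#!/usr/bin/env bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition=gpu
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --mem={{ (ram_gb*1024)|int }}M
{{ run_cmd }}

# Register the lane from the directory holding both files:
cryosparcm cluster connect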

CryoSPARC Master Program

The master program is invoked with the cryosparcm command documented here. The main mechanism for customizing the behavior of cryosparcm is the config file located at cryosparc_master/config.sh. It contains the license ID, the path to the MongoDB database, the master hostname, and the base TCP port. Ensure these settings in your config.sh file are correct or you will experience errors.
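
An illustrative config.sh might look like the following; all values are placeholders, and the variable names follow the CryoSPARC installation guide:

# cryosparc_master/config.sh -- illustrative values only
export CRYOSPARC_LICENSE_ID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
export CRYOSPARC_MASTER_HOSTNAME="holygpu-example.rc.fas.harvard.edu"   # placeholder hostname
export CRYOSPARC_DB_PATH="/path/to/cryosparc_database"                  # MongoDB data directory
export CRYOSPARC_BASE_PORT=39000                                        # base TCP port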

See the FASRC User_Codes repo for examples.

CryoSPARC Operation

At the top level, cryosparcm is really a Supervisor-based shell script that manages at least six different applications. For instance, running cryosparcm start will bring up the following applications:

  • app (cli)
  • command_core
  • command_rtp
  • command_vis
  • database (MongoDB)
  • webapp

These mostly communicate with one another over TCP. The TCP ports used by the individual component programs are not separately configurable, but the base port to which the user connects is set in cryosparc_master/config.sh. Of note, the hostname of the node running the master application is also typically hardcoded in this config.sh file; if it is left unset, it defaults to the hostname of the machine on which cryosparcm start is called.
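
In day-to-day use the same wrapper script starts, inspects, and stops these services; the common subcommands are documented in the cryosparcm reference:

cryosparcm start              # bring up the database, command_* services, and webapp
cryosparcm status             # show the configuration in effect and the state of each process
cryosparcm log command_core   # follow the log of an individual service
cryosparcm stop               # shut everything down cleanly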

Obtaining a Free Academic CryoSPARC License

CryoSPARC is free for academic use; however, it does require a license. You can request a license on the CryoSPARC webpage. The process of requesting a license is described in detail here.

Installing CryoSPARC on Cannon

We provide a configure script to get up and running with CryoSPARC on Cannon, found in our FASRC/User_Codes git repo. When running CryoSPARC, use a GPU compute node so that the correct CUDA modules are loaded and functioning.
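
For example, an interactive GPU session for installing and testing might be requested as follows; the partition name, resource requests, and CUDA module name are placeholders for whatever your allocation and module tree provide:

salloc -p gpu --gres=gpu:1 -c 4 --mem=16G -t 0-04:00   # request an interactive GPU node
module load cuda                                       # module name/version depends on the cluster's module tree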

Examples

Example config files for your environment, along with some additions for your .bashrc file, are available in FASRC/User_Codes on GitHub to help you get up and running with CryoSPARC.
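
As a minimal sketch of the kind of .bashrc addition involved (the install path is a placeholder; the User_Codes examples give the exact lines):

# Put the CryoSPARC master commands on your PATH
export PATH=/path/to/cryosparc_master/bin:$PATH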

Resources

AlphaFold (https://docs.rc.fas.harvard.edu/kb/alphafold/)

Description

See Alphafold2 and Alphafold3.

AlphaFold in the FASRC Cannon cluster

AlphaFold typically runs within a Docker container. However, Docker containers are not allowed on high-performance computing (HPC) systems such as Cannon because Docker requires root/sudo privileges, which poses a security concern.

Instead, we use Singularity containers, which were specifically designed for HPC systems.
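
In practice this means the docker invocation is simply replaced by singularity exec; a generic sketch, assuming the image keeps AlphaFold's usual in-container layout (/app/alphafold/run_alphafold.py) and using a placeholder image path:

# --nv exposes the host GPUs inside the container
singularity exec --nv /path/to/alphafold_<version>.sif \
    python /app/alphafold/run_alphafold.py --helpfull   # print the full flag list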

Singularity images

The AlphaFold Singularity images are stored in a cluster-wide location, meaning that individual users do not have to copy them in order to use them. The images are located in:

/n/singularity_images/FAS/alphafold/

Each Singularity image is tagged with the AlphaFold version:

[jharvard@holylogin03 ~]$ ls -l /n/singularity_images/FAS/alphafold/
total 17G
-rwxr-xr-x. 1 root root 4.8G May 25 2023 alphafold_2.3.1.sif
-rwxr-xr-x. 1 root root 2.9G May 25 2023 alphafold_2.3.2.sif
-rwxr-xr-x. 1 root root 4.9G Dec 16 11:47 alphafold_3.0.0.sif
-rwxr-xr-x. 1 root root 4.5G Nov 2 2022 alphafold_v2.2.4.sif
-rw-r--r--. 1 root root 817 Dec 5 13:35 readme.txt

Databases

The AlphaFold databases are stored in a cluster-wide location, meaning that individual users do not have to download them to run their jobs. The databases are stored on SSD storage, as recommended by the developers. Database locations:

Alphafold2

/n/holylfs04-ssd2/LABS/FAS/alphafold_database

Alphafold3

/n/holylfs04-ssd2/LABS/FAS/alphafold_databases/v3.0/

Model parameters

Alphafold3

To run Alphafold3, you must request the model parameters from Google. See Obtaining model parameters.

Google will provide a file, file_name.bin.zst. Extract it with unzstd file_name.bin.zst and place the resulting file_name.bin in a lab share (do not put it in netscratch); this directory will be the location of your --model_dir.
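
A minimal sketch of those steps (the file name and lab-share path are placeholders):

unzstd af3_params.bin.zst                           # yields af3_params.bin (hypothetical file name)
mkdir -p /n/your_lab/Lab/alphafold3_model_params    # placeholder lab-share directory
mv af3_params.bin /n/your_lab/Lab/alphafold3_model_params/
# pass this directory to AlphaFold3 via --model_dir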

Running Alphafold

You will find example scripts in the FASRC User_Codes repo.
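
As a rough sketch of what such a batch script looks like for AlphaFold2, using the cluster-wide image and database paths from above; the partition, resource requests, and in-container path to run_alphafold.py are assumptions, and the flag list is abridged, since run_alphafold.py also needs the individual *_database_path flags that the User_Codes examples spell out for this database layout:

#!/bin/bash
#SBATCH -p gpu                # placeholder partition
#SBATCH --gres=gpu:1
#SBATCH -c 8
#SBATCH --mem=64G
#SBATCH -t 1-00:00

# --nv exposes the node's GPUs inside the container
singularity exec --nv /n/singularity_images/FAS/alphafold/alphafold_2.3.2.sif \
    python /app/alphafold/run_alphafold.py \
    --fasta_paths=my_protein.fasta \
    --data_dir=/n/holylfs04-ssd2/LABS/FAS/alphafold_database \
    --output_dir=$PWD/output \
    --max_template_date=2024-01-01 \
    --model_preset=monomer \
    --use_gpu_relax=true
    # ...plus the individual *_database_path flags, omitted here for brevity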

Resources
