FASRC Cluster Storage Policy https://docs.rc.fas.harvard.edu/kb/fasrc-cluster-storage-policy/

Cluster storage offered and maintained by FASRC should only be used for research taking place on FASRC clusters.

Examples of data that can be stored on FASRC storage are:

  • Datasets
  • Code
  • Scientific software
  • Research results

Examples of data that should not be stored on FASRC storage include:

  • Clerical or lab administrative data
  • Data related to personnel, grant proposals, business operations, or general lab management
  • Data with personally identifiable or financial information 

FASRC storage filesystems on the Cannon cluster are only approved for Data Security Level 1 (DSL1) and DSL2 research data. DSL3 data must be stored in an approved FASSE cluster project. Research data classified as DSL4 must be stored on an appropriate storage solution that is approved for DSL4 sensitive data.*

*A limited number of DSL4 projects exist in their own isolated environments

If it comes to the attention of the FASRC staff that non-research-related data is being stored on FASRC systems, we will alert the lab’s PI.

To view alternative storage options for administrative data, please refer to the FASRC website.  Additional information is also provided on the Harvard Security website regarding Data Security levels.

Administrative Data Storage Options https://docs.rc.fas.harvard.edu/kb/administrative-data/

FAS Research Computing offers a wide variety of storage offerings designed to help meet the needs of the Harvard research community. However, FASRC storage is intended to house only research data. For other data types, such as administrative data, general lab documentation, or finance information, we recommend one or more of the storage options below. While FASRC does not directly support or manage these tools, they are offered with support from Harvard IT.

Google Drive (Administrative file storage)

  • Store, access, and share administrative files from any device, as data is stored in the cloud
  • My Drive is designed for personal documents to be shared individually. 
  • Shared drive is for collaborative files and folders owned at the team level.
  • Approved for Medium Risk Confidential (L3) data. 
  • Central Administration, GSE, HBS, HDS, HKS, HMS (Quad), and HSPH require local approval for account requests.
  • Google Drive storage request form: This form can be used to request a new personal or shared Google Drive, or request additional storage space for an existing Google Drive. The default Google Shared Drive storage limit is 5 GB. Eligible users may request additional storage using the form. The requests will need to be approved by FAS, as they are currently responsible for the costs.

Dropbox (Administrative file storage)

  • Secure data storage for faculty and research staff
  • Store, sync, and share data files in the cloud
  • Collaborate on documents with added version history

Microsoft 365

  • OneDrive (Administrative file storage)
    • Store work-related administrative files
    • Share with colleagues within and outside of Harvard
    • Users have 2TB of file storage
    • Approved for Medium Risk Confidential (L3) data.
  • Sharepoint (Document management tool)
    • Secure location to store, organize, share, and access information 
    • Default of 25 TBs of expandable storage
    • Approved for Medium Risk Confidential (L3) data.
    • Can be shared externally with non-Harvard faculty and staff. 
    • Storage for Level 4 data is available upon request with restrictions. 

Atlassian Confluence (Wiki web environment)

  • Website allowing users to create, edit, and publish content collaboratively through a web browser. 
  • Access can be given to individuals, groups, to the Harvard community, or to the public. 
  • Each page has its own URL, page history, access restrictions, file attachments, and comments.
Managing file access with ACLs https://docs.rc.fas.harvard.edu/kb/acls-facls/

What are ACLs?

Access Control Lists (ACLs or FACLs) are used to manage granular permissions on individual files or directories (folders). Primarily they are used to give one or more people access to a file or directory independent of the owner or group attached to it. For instance, a user who owns a file can grant another user write access to it without giving the group write access.

While FASRC manages a large number of groups for access to the many storage shares we host, we generally do not micromanage access at the individual level (e.g., ‘this person should have access but not that person’). In some cases involving multiple users, that might be best handled with an additional group, but that adds support overhead. For these individual-access scenarios, an ACL may be the best option when you need to grant someone else more granular permissions on files you own.

Usage

From a login node or other node on the cluster, see man getfacl and man setfacl for full documentation.

Please Note: Setting ACLs on Tier 1 Isilon shares is not currently supported.
Example 1: getfacl

This example shows how to see what FACLs are set.

[harvard_lab]# ls -l
total 12         (the '+' sign indicates that ACLs have been applied)
drwxrwsr-x+ 28 jharvard harvard_lab 4096 Feb 19 20:06 Everyone
drwxrwsr-x+ 7 jharvard harvard_lab 4096 May 9 20:03 Lab
drwxrwsr-x+ 74 jharvard harvard_lab 4096 Oct 10 2023 Users
[harvard_lab]# getfacl .
# file: .      (shows the FACL settings, in this case a group, harvard_lab_admins, has special permissions)
# owner: root
# group: harvard_lab
# flags: -s-
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:harvard_lab_admins:rwx
default:mask::rwx
default:other::r-x
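The default: entries shown above are directory defaults that new files and subdirectories inherit. As a rough sketch of how such an entry could be granted on a directory you own (the group name and the Lab directory are taken from the listing above; adjust to your own group and path):

[harvard_lab]# setfacl -m g:harvard_lab_admins:rwx Lab       (give the group access to the directory itself)
[harvard_lab]# setfacl -d -m g:harvard_lab_admins:rwx Lab    (set a default entry so new files and subdirectories inherit it)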

Example 2: setfacl

This example shows how to allow another user read/write/execute access to a file you own.

[jharvard]$ ls -l test
-rw-r--r--. 1 jharvard harvard_lab 30 May 17 16:41 test
[jharvard]$ setfacl -m u:testuser:rwx test
[jharvard]$ ls -l test
-rw-rwxr--+ 1 jharvard harvard_lab 30 May 17 16:41 test
[jharvard]$ getfacl test
# file: test
# owner: jharvard
# group: harvard_lab
user::rw-
user:testuser:rwx
group::r--
mask::rwx
other::r--
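If you later need to revoke access granted this way, setfacl can also remove entries. A brief sketch using the same file and user as above:

[jharvard]$ setfacl -x u:testuser test     (remove only the named-user entry for testuser)
[jharvard]$ setfacl -b test                (or remove all ACL entries, leaving just the base permissions)
[jharvard]$ getfacl test                   (confirm the change)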

Additional documentation about the use of ACLs can be found in the getfacl and setfacl man pages.

How to read your Storage Service Center bill https://docs.rc.fas.harvard.edu/kb/storage-service-center-bill/

PLEASE NOTE: Billing is NOT based on usage, but on the total allocation. A breakdown by user is provided for those who need to charge back to grants, but the total amount is the same whether the storage is empty or full.

If you have received a monthly storage bill from the FASRC Storage Service Center, this document will help you understand its contents. Please bear in mind that the bill may show more information than you expected; however, some labs need these details to route charges to the correct projects or billing codes.

Storage allocations are billed monthly and are based on the total size of the allocation(s). A lab with a single 16TB allocation on Tier 0, for example, has a total allocation cost of $800 per year [at the time of this writing]. That lab will pay 1/12 of that total amount each month, so approximately $66.66 each month*.

Additionally, the billing system provides a breakdown of usage by user. This includes any user with data in your storage and may include disabled users (see note below). Please note that while this breakdown shows a dollar amount per user and an attached 33-digit billing code, those values are solely for your information. They are useful for labs who need to charge back or split costs for projects/grants and/or determine who is using what percentage of their allocation.

Please Note: FASRC does not police lab storage and, as such, we do not remove data from your storage unless asked to by a PI or lab manager. Each lab is responsible for the management and cleanup of its stored data. If you have difficulty removing items from your storage, please contact FASRC for assistance.

* – We recognize that there will be rounding issues, and FASRC will endeavor to round down so that the cost of an allocation does not exceed the quoted yearly cost.


EXAMPLE BILL

The example below shows an allocation of 16TB of Tier 0 storage at $50 per TB per year
Total cost per year = $800

$4.16 per TB/month unit cost
Total billed per month = ~$66.56
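The monthly figure follows from rounding the per-TB rate down before multiplying:

$50 per TB/year ÷ 12 months = $4.1666… → rounded down to $4.16 per TB/month
16 TB × $4.16 per TB/month = $66.56 per month (slightly less than $800 ÷ 12 ≈ $66.67)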

PLEASE NOTE: Billing is NOT based on usage, but on the total allocation. A breakdown by user is provided for those who need to charge back to grants, but the total amount is the same whether the storage is empty or full.


RC Storage billing is ready for John Harvard Lab from Research Computing Storage for 7/2022.

This is your monthly bill for FASRC storage service center allocation(s). This bill represents 1/12 of your yearly allocation cost as it currently stands.

  • The cutoff day for billing changes is the 15th of each month. Any changes made to allocation after that will be reflected on the following month’s bill.
  • The monthly charge for RC Storage is based on the storage tier and the total size of the allocation, not the amount of used space. As a result, deleting files will not reduce your monthly cost unless the allocation size is also reduced.
  • In order to comply with grant management rules, the total charge for an allocation is applied to users in proportion to how much they have used. That distribution is described in the table below. If there are files owned by users who are no longer affiliated with the lab, it is the lab’s responsibility to remove those files, change their ownership, or move them to other storage.
  • For billing questions, to change billing codes, or any other queries, email RC Storage Billing

The following link to the FIINE system billing records listing can be used to approve records or make adjustments. If you do not approve or respond within 3 business days, the records will be considered approved.
[A LINK TO THIS BILL IN FIINE]

Please respond to RC Storage Billing if you have issues or questions.

For a detailed description and how to read your FASRC storage bill, please see: https://docs.rc.fas.harvard.edu/kb/storage-service-center-bill/

Total Monthly Charge  $66.56

The total charge is distributed across users who own files, in proportion to their usage, as shown below.

User Storage Product Account Description Charge

1. John Harvard lustre/tier0 370-XXXXX-XXXX-XXXXX-XXXXXX-XXXX-XXXXX $49.92 for 75% of 16TB of holylfs05/tier0 at $4.16 per TB

2. Jill Harvard lustre/tier0 370-XXXXX-XXXX-XXXXX-XXXXXX-XXXX-XXXXX $16.64 for 25% of 16TB of holylfs05/tier0 at $4.16 per TB
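For reference, each user’s line is simply their share of the allocation multiplied by the unit cost:

John: 75% of 16TB = 12 TB × $4.16 per TB/month = $49.92
Jill: 25% of 16TB = 4 TB × $4.16 per TB/month = $16.64
Total: $49.92 + $16.64 = $66.56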

Home directory full https://docs.rc.fas.harvard.edu/kb/home-directory-full/

If you receive an error that your home directory is full (“no space left on device [your home directory path]”) or an email saying you are over your 100GB home directory quota (there is also a 95GB soft quota that triggers notifications), you will need to remove files to get back under quota. Ordinarily you would just use rm to remove some files and reduce your usage.

However, a situation may arise where you are not just at quota but over it, and when trying to remove files using rm you receive the error:

rm: cannot remove ‘{somefilename}’: No space left on device

NOTE: Any time you are deleting files, it is important that you check to ensure you enter the correct filename. A good rule of thumb is to use the full path to a file (instead of relative path) or cd to the directory containing the file first. Also, be extra cautious when using wildcards like * .


WORKAROUND

A workaround is to identify one or more larger files to remove and reduce their size to zero bytes. Once enough space is recovered to get you under quota, you should be able to use rm again. To do this on files you’ve identified for removal, use the truncate command:

truncate -s 0 FILENAME
To truncate a single file down to zero bytes.

or

truncate -s 0 FILE1 FILE2 FILE3
To truncate multiple files down to zero bytes

Example:

truncate -s 0 ~/Jobfiles/August/job12653287.out
It’s always safer to use the full path to a file.
~ as used here is a Unix shortcut for the path to your home directory.

Alternatively, you can empty a large file by redirecting /dev/null into it with cat:

cat /dev/null > ~/mybig.file

/dev/null is a special Unix device that is always zero bytes in size
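To identify large files that are good candidates for truncation, a command along these lines can help (the 1G size threshold is just an example; adjust as needed):

find ~ -type f -size +1G -exec ls -lh {} + 2>/dev/null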

CHECKING USED SPACE

You can check to see your current total home directory (shortcut “~”) usage using the du command (plus the summary, total, and human-readable options) like so:

[jharvard@holylogin01 ~]$ du -sch ~
80G .
80G total

If you are on a login node, you can also view your computed quota directly like so:

[jharvard@holylogin01 ~]$ df -h .
Filesystem                      Size Used Avail Use% Mounted on
rcstorenfs:/ifs/rc_homes/home13 95G  80G  15G   85%  /n/home13

This shows that the user jharvard has used 80GB out of 100GB. The 95GB shown as the Size is called a soft quota. That is the threshold at which the system will notify you that you are going over.

Please bear in mind that, due to the size of our home directory filesystems, neither notification nor quota re-calculation is instantaneous. Both happen at some point during a 24-hour period. So if you manage to go over quota before the next calculation is done, you won’t receive a soft-quota notice. Similarly, it may take some time for your actual usage and computed quota to match again.

To find which files or directories are using the most space:

[jharvard@holylogin01 ~]$ cd ~
[jharvard@holylogin01 ~]$ du -h --max-depth=1 .
384K ./.config
232K ./Test
2.0G ./spack
...

This shows a listing of directories and their sizes; you can repeat the command further down the directory tree to find files to delete.

What To Do If du and df Are Different

If you find that df says you are at quota while du shows a lower number, you may have sparse files which are not accounted for properly by du (but are accounted for by the filesystem’s quota check).

To check for this, use the --apparent-size flag with du to show the logical size and find the culprit(s):

cd ~

du -ch --apparent-size --max-depth=1 .

This will show the logical size of directories and should point you to the cause.

Clearing Disk Space
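Before cleaning anything, it can help to see which of the hidden directories discussed below is actually taking up space (the paths listed are the usual suspects; some may not exist in your account):

du -sh ~/.local ~/.conda ~/.cache ~/.singularity 2>/dev/null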

.local

This hidden folder is located in your $HOME. It typically grows in size when pip install is executed outside of a conda/mamba or Python virtual environment, for example while in a Jupyter or interactive session. See the warning on pip installs. Such installations get placed in your ~/.local, which can fill up $HOME.

In order to manage ~/.local, do the following:

  1. Make sure that there are no jobs currently running under your profile by executing: squeue -u <username>
  2. Rename/turn-off .local folder: mv ~/.local ~/.local.off

.conda

Conda/Mamba environments can be quite bulky depending on the number and type of packages installed in them, and should be stored in your PI’s $LAB directory. See Mamba environments in a desired location. However, if such environments are created in the default location, $HOME/.conda, then the storage size of ~/.conda can be managed as follows:

  1. Remove unused packages and clear caches of Conda/Mamba:
    module load python
    source activate <your-environment>
    conda clean --all 
    This deletes only the unused packages in your ~/.conda/pkgs directory.
  2. Remove unused conda/mamba environments:
    module load python
    conda info -e
    conda env remove --name <your-environment>

.cache

This directory, located in ~/.cache, can grow in size with the general use of the cluster, Open OnDemand, or VSCode. In order to manage this space, do the following:

  1. Make sure that there are no jobs currently running under your profile by executing: squeue -u <username>
  2. Remove the folder: rm -r ~/.cache

.singularity

This folder, located in ~/.singularity, typically grows in size when a container is pulled to the cluster using Singularity. You can manage the size of this folder by cleaning its corresponding cache: singularity cache clean all

To keep the ~/.singularity folder from filling up, you can set a temporary directory and redirect the cache location while pulling a container. For example: export SINGULARITY_TMPDIR=/tmp/

Then, pull the container using Singularity as usual.
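As a hedged example, both SINGULARITY_TMPDIR and the standard SINGULARITY_CACHEDIR environment variable can be pointed away from $HOME before pulling; the lab scratch path below is illustrative, not a required location:

export SINGULARITY_TMPDIR=/tmp
export SINGULARITY_CACHEDIR=/n/netscratch/jharvard_lab/Lab/jharvard/singularity_cache
mkdir -p $SINGULARITY_CACHEDIR
singularity pull docker://ubuntu:22.04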

Note: It is best to run the above commands from a compute node in an interactive session, as login nodes are not performant and are meant for lightweight activities only. When checking squeue for running jobs, you can ignore the interactive session you are using to manage your $HOME size and proceed with turning off ~/.local or deleting ~/.cache.

New England Research Cloud (NERC) https://docs.rc.fas.harvard.edu/kb/nerc/

NERC, the New England Research Cloud, is operated by the Harvard University Research Computing (URC) and Boston University Research Computing Services groups and is part of the MOC-Alliance. NERC is a self-service cloud service available to many institutions in New England. Research groups can build out their own virtual machines (OpenStack) and data storage (NESE Ceph). The number and diversity of researchers in need of extensive computational capabilities is expanding, and the type of computation needed is shifting to require tools and elasticity that are best provided by cloud-native technologies, rather than (or in addition to) traditional high-performance computing.

Users and labs seeking cloud computing resources are encouraged to make use of NERC. While FASRC does not directly provide NERC services, FASRC users and labs are free to use NERC for their cloud computing needs.

NERC Links and Documentation

Need help or have questions?

Current NERC users or those with questions about the service should contact NERC via their online help system or by emailing help@nerc.mghpcc.org

Storage Service Center https://docs.rc.fas.harvard.edu/kb/storage-service-center/

This page provides the information needed for requesting and managing data storage allocations, and for billing. It is essential that you review the Storage Billing FAQ and Data Storage workflow pages. Below we provide information on the three software applications we use to help PIs (Coldfront), finance managers (FIINE), and lab/data managers (Starfish) perform their roles.

For more information about the storage options and their features, please visit our Data Storage page. Please feel free to reach out to us at rchelp@rc.fas.harvard.edu with any questions.

Storage Options and Cost

  • Active Lab Storage (Tier 0)
    • Description: High-performance Lustre
    • Cost per TB/month (rounded down): $4.16 ($50/yr)
    • Snapshot:** No
    • Disaster Recovery:* No
    • Available for new allocations: Yes
    • Maximum size per share: N/A
  • Active Lab Storage (Tier 1)
    • Description: Enterprise Isilon
    • Cost per TB/month (rounded down): $20.83 ($250/yr)
    • Snapshot:** Yes
    • Disaster Recovery:* Yes
    • Available for new allocations: Yes
    • Maximum size per share: N/A
  • Active Lab Storage (Tier 2)
    • Description: NFS Storage
    • Cost per TB/month (rounded down): $8.33 ($100/yr)
    • Snapshot:** No
    • Disaster Recovery:* Yes
    • Available for new allocations: Yes
    • Maximum size per share: 306 TB
  • Long-term Storage (Tape)
    • Description: Tape
    • Cost per TB/month (rounded down): $0.416 ($5/yr)
    • Snapshot:** No
    • Disaster Recovery:* No
    • Available for new allocations: Yes
    • Maximum size per share: 20TB per tape (19.48TB usable)

* Disaster Recovery (aka DR) means that the entire share can be restored in the event of hardware failure or other ‘disaster’. This DR copy is not accessible to the end user and is not suitable for recovering individual files that have been accidentally deleted.

** Snapshot means that a snapshot of the filesystem is taken periodically and retained for up to 7 days, and can be accessed by the end user to recover individual files that are accidentally deleted.

Important notes:

  • If you require more than approximately 100TB in a single allocation, please contact us first to discuss, or drop by Office Hours
  • Billing is through the FIINE system. See below for how to get access to FIINE
  • Billing is done monthly
  • The cutoff day for billing is the 15th of each month. Any changes made to an allocation after that will be reflected in the next month’s bill
  • The service center needs a 33-digit billing code to provide the service. It is an internal service, so we cannot create POs for billing
  • For help reading your current bill, see How to read your Storage Service Center bill
  • For billing questions and queries, email billing@rc.fas.harvard.edu

Request or Manage Storage Allocations

  • To request an allocation or manage an existing one, the PI (or a previously designated storage manager) should log into Coldfront
  • If you cannot access Coldfront, or you are a PI who would like to designate a storage manager for your lab, please contact FASRC

Lead Time for New Tape (Tier 3) Allocations

There is a minimum setup time of roughly 2 weeks for new tape allocations. This timeframe assumes we receive the completed tape setup from our service partner NESE without delay. Delays there are beyond our control and could increase lead time. Please note that any storage changes made after the 15th of the month will be reflected in the following month’s billing.

Billing for Allocations

Charges for storage allocations are billed monthly. Expense code(s) can be applied to each allocation and can be sub-divided among multiple billing codes.
See our Service Center FAQ for answers to common questions.

See also How to read your Storage Service Center bill

To manage billing for an existing allocation, see the instructions for expense code management and billing record review in FIINE:
https://ifx.rc.fas.harvard.edu/docs/user/fiine.html


Starfish – Data Management

Starfish – Scans the different storage servers to provide a view of usage details, metadata, and project-based tagging. View our Starfish documentation for more details about Starfish and examples of querying the data.

Coldfront – Lab and Allocation Management

Coldfront – Provides a view of PI projects and allocations. New allocations and updates to existing allocations can be requested using Coldfront. View the Coldfront article for more details about Coldfront and its use.

FIINE – FAS Instrument Invoicing Environment

FIINE – For lab/finance administrators to manage expense codes per project/user and view invoices.


FAQ – Storage Service Center

Since storage has grown tenfold in the past 5 years, hosting individual small-capacity storage server deployments has become unsustainable to manage. These individual server systems do not easily allow data shares to grow. Due to their small volume, many systems run above 85% utilization, which degrades performance.

Many systems also run beyond their original maintenance contract, which causes issues in sourcing parts for repairs; older systems (>5yr) increase the risk of catastrophic data loss. Some systems were purchased by PIs without a provision for backup systems, which has led to confusion about which data shares should have backups. Our prior backup methodology does not scale to these larger systems with hundreds of millions of files. Given this history, revamping our storage service offerings allows FASRC to maintain the equipment lifecycle and to project the overall growth in data capacity, datacenter space, and the professional staffing needed to maintain your research data assets safely.

Prior to the establishment of a Storage Service Center, we only offered a single NFS filesystem for your lab share; you now have the choice of four storage offerings to meet your technology needs. The tiers of service clearly define what type of backup your data will have. You only pay for the allocation capacity you need, rather than having to guess at the time of a server purchase and have the excess go unused.

Over time, you can request an increase to your allocation size. You will receive monthly reports on utilization from each tier to help you plan for future data needs. Some of our tiers will also have web-based data management tools that allow you to query different aspects of your data, tag your data, and see visual representations of your data.

Unlike the compute cluster, where resources are reserved and released, data is allocated to storage long-term. In addition, storage needs across various research domains are drastically different. Therefore, in the FY19 federal rate setting, FAS decided to remove the portion of FASRC dedicated to maintaining storage from the facilities part of the F&A. This allows FAS to run a Storage Service Center with costs that are allowable on federal awards.
Information about the storage offerings can be found on our Storage Services page and Storage Service Center document. Requests for storage allocations can be made through Coldfront (FASRC VPN required). Please keep in mind that large requests (>100 TB) might not all be available at the time of request; a smaller increase will be applied as we add more capacity in the coming months.
Yes, you can have allocations in different storage options to meet your needs and budget.

We have worked with RAS on two allocation methods to charge data storage to your grants: (1) the per-user allocation method, and (2) the per-project allocation method.

Per-user allocation method: You will be supplied a usage report by user for each tier. You can use the % of data associated with an individual as their share of the cost and apply the same cost distribution as their % effort on grants.

Example 1: PI has a 10 TB allocation on Tier 1 which researchers John and Jill use. The monthly bill for 10 TB of Tier 1 is $208.30 (at $20.83/TB/mo). The usage report shows 8 TB of total usage, where John’s usage is 60% and Jill’s is 40%. So the data charges associated with John are $124.98 and with Jill are $83.32. John is funded 50% on the NSF project and 50% on the NIH project, thus $62.49 should be allocated to each grant. Jill is funded 100% on the NSF project, thus $83.32 should be allocated to her NSF grant.

This method allows faculty to manage their data structures independently of specific projects, as multiple projects will be using some of the same data. Keep in mind that as researchers leave, there needs to be a plan for their data, as it will continue to appear in the usage reports.

Per-project allocation method: If you request a project-specific report, you will have a direct mapping of the data used by that project and can allocate the full cost to the cost distribution from its grants.

Example 2: PI requests a new 5 TB allocation on Tier 1 for an NSF-funded project. 10 users share this data. The monthly bill would include a Tier 1 charge of $104.15 (5 TB at $20.83/TB/mo). The entire $104.15 would be charged to the NSF grant.

This allows a very straightforward assignment between data and funding source. Reuse of the active parts of this data will need to be assigned to future projects.

Example 3: The above PI also has a 100 TB allocation on Tier 0 used for multiple projects with multiple funding sources. The usage report for Tier 0 would be provided per user as in Example 1 above, and the % effort allocation method would be used for Tier 0, while the Example 2 approach would be used for the new project on Tier 1.

As is common with other Science Operations Core Facilities, once funding sources have been established for bills, we will continue to direct bill those funds until the PI updates these distributions. For the first few months billing will be manual via email until the new Science Operations LIMS billing system is complete.

We suggest that a data management plan be established at the beginning of a project, so that a full data lifecycle can be mapped to the phases of your data. This helps identify data that will need to be kept long-term from the start, and helps mitigate data being orphaned when students and postdocs move on. If research data is being used again in a subsequent project, you should allocate funds to carry this data forward to the new project. As per federal regulations, you cannot pay for storage in advance. The Tier 3 tape service provides a location to deposit data longer term (7 years), which can meet many funding requirements.

Billing will be handled by Science Operations Core Facilities. You will be billed monthly for the TB allocation of space for each tier. Groups will have 2-3 business days to review the invoices before the charges are assessed via internal billing journals. By default, we will also provide you a usage report by user. A usage report per project is available by request and is best set up for new projects with new allocations.
It is your and your finance admin's responsibility to update or verify your 33-digit billing code for monthly billing in the FIINE system. If no other billing codes are designated, your start-up fund will be used. We are here to help you navigate these decisions: Contact FASRC

For billing inquiries or issues, please email billing@rc.fas.harvard.edu

For general storage issues, questions, or tier changes, please contact rchelp@rc.fas.harvard.edu

We have moved away from owned servers. Very few exceptions will be made. If circumstances warrant one, the request will be reviewed by the University Research Computing Officer, Sr. Director of Science Operations and Administrative Dean of Science. One possible exception is when storage must be adjacent to an instrument where data collection rates are beyond the capacity of 1 Gbps Ethernet (100 MB/s) for extended periods (days).

We will maintain existing physical servers while under warranty, which is typically 5-6 years from their purchase date. We will need a data migration plan to the appropriate tiers a few months prior to decommissioning the server.

Over FY22 we will be migrating whole filesystems at a time into the storage service center. All new space requests will be allocated on newly deployed storage in one of the Tiers.

Most owned storage servers have already been phased out.

Information about the storage offerings can be found on our Storage Services page and Storage Service Center document.

Requests for storage allocations changes can be made through Coldfront (FASRC VPN required). Select your project after logging in and you will find a "Request Allocation Change" button beside each allocation listing.

We ask that you plan ahead for future needs rather than repeatedly adding small increments. Please limit your allocation change requests to no more than once every 60 days for a particular allocation. We have a billing cutoff date of the 15th of the month, so changes requested after that date will not be reflected on your bill until the next billing cycle.

Mounting Storage on Desktop or Laptop https://docs.rc.fas.harvard.edu/kb/mounting-storage/

Some resources, mainly home directories, can be mounted on your local computer via Samba (aka SMB or CIFS). A few other shares have also been made mountable where deemed necessary, on a share-by-share basis.

Please note that most file systems, including lab directories on holylabs, cannot be mounted on your desktop.

  • Scratch – Scratch space (/n/netscratch) cannot be mounted in this manner. It is only available on the cluster. If you need to transfer data to/from scratch, you can use an SFTP or SCP client to connect to the cluster and then change to /n/netscratch/[your lab’s space] . You can also use Globus for large external transfers.
  • Active Lab Storage (Tier 0) and shares whose name begin with holy generally cannot be mounted. If you need to transfer data to/from such shares, you can use an SFTP or SCP client to connect to the cluster and then change to the path of your lab share. You can also use Globus for large external transfers.
  • Active Lab Storage (Tier 1) and (Tier 2) shares can be exported via Samba if a valid need exists. Otherwise, if you need to transfer data to/from such shares, you can use an SFTP or SCP client to connect to the cluster and then change to the path of your lab share . You can also use Globus for large external transfers.
  • Long-term storage (tape) cannot be mounted and is only accessible via Globus.


Connect to the VPN

If using a wireless connection, cluster storage must be routed through a VPN connection. If on a wired connection inside Harvard, the VPN client is not required. If you don’t already have it set up, follow the VPN setup instructions.
NOTE: If you have set up custom DNS on your computer, this may cause issues connecting to shares.

Find the filesystem path (if not known)

If you already know the path, skip to instructions for your operating system below.

Mounting your HOME DIRECTORY

If you have cluster access, you can mount your home directory as a drive. You can find the path to your home directory by using ssh to log in to the cluster. Use cd ~ to go to your home directory (on a Unix-like system, the ~ character is a shortcut for ‘my home directory’). Then type pwd to show where your home directory resides.

[jharvard@boslogin02 ~]$ cd ~
[jharvard@boslogin02 ~]$ pwd
/n/home08/jharvard

The home08 part is what we need in this example in order to construct the full path to your home directory. Since all home directories are mounted from the same server, we don’t need to figure that part out. The path you will need for connecting is therefore the combination of the server name, rcstore.rc.fas.harvard.edu, followed by the word homes to signify that it is a home directory, the sub-folder your home directory resides in (home08 in this example), and your RC username.

For this example, this would result in:
For Windows \\rcstore.rc.fas.harvard.edu\homes\home08\jharvard
For Mac OSX smb://rcstore.rc.fas.harvard.edu/homes/home08/jharvard

Mounting a LAB SHARE

First, it’s important to note that most lab shares are not mountable. Also, cluster-only filesystems such as scratch (netscratch, holyXXXX, or local scratch) are never mountable. If you need to transfer data to/from such shares, you can use an SFTP or SCP/Rsync, or use Globus for large external transfers.

If you don’t already know the path to your lab’s share and believe it should be mountable, asking a lab-mate or your PI for the path is the quickest option. If your lab-mates do not know and you believe your share is mountable, please contact FASRC.

You can also try and see if your lab’s share is mounted on our Samba cluster using the instructions further down the page, but with one of the following paths:
For Windows \\smbip.rc.fas.harvard.edu\ (browse for your lab’s name, see below)
For Mac OSX smb://rcstore.rc.fas.harvard.edu/ (browse for your lab’s name, see below)

If found, you can use the path shown there to mount your lab’s share.

 

Operating System-Specific Instructions

Macs use Connect to Server

If you’re using a Mac, go to a Finder window (or click on the desktop) and choose Go > Connect to Server from the menu. In the server address box, enter the server and path combination as described above, prepended with the smb:// protocol specifier (please note that Macs use “/” where Windows uses “\”). Using the example information above, the value might be smb://rcstore.rc.fas.harvard.edu/homes/home08/jharvard to mount the home directory of user jharvard. If you are mounting a lab share path, enter that instead (example: smb://smbip.rc.fas.harvard.edu/jharvard_lab). If you’ve selected the proper volume, you should get a login prompt. Use your FASRC credentials here, and note that you must include the rc\ domain specifier at the beginning of your user name (for example, rc\jharvard).

PCs use Map Network Drive

You can connect to shared storage on a Windows PC by using the Map Network Drive button in a file explorer window (click the yellow folder icon in the taskbar).

Select This PC in the left-hand pane.

Click Computer, which will present a drop-down menu, and then from that menu click Map network drive.

In the Map Network Drive utility, select a free Drive letter.
Then enter the combination of fileserver address and path in the Folder field.
For the example, in the home directory described above the path would be \\rcstore.rc.fas.harvard.edu\homes\home08\jharvard.
If you are mounting a lab share path, enter that instead (example: \\smbip.rc.fas.harvard.edu\jharvard_lab).
Make sure Connect using different credentials is checked.
Click Finish to continue.
Optional: If you want this drive to reconnect every time you log on to your computer, check Reconnect at sign-in. Just bear in mind that it will not reconnect if you are not on the VPN or your normal campus wired jack.


The reason you must check Connect using different credentials in the Map Network Drive box is that your PC has a local account (and a local ‘domain’) and it will default to that if you do not specify another username and domain. If you don’t select this checkbox and attempt to connect, it will try to authenticate with your local PC information and after three failed attempts will result in a lockout (FYI: Don’t worry, lockouts expire automatically in about 5 minutes).

When you are prompted to Enter Network Credentials, prepend your FASRC username with RC\ to specify you are connecting to the RC domain with your username.
Example: RC\jharvard means ‘Connect to the server and path I entered above as RC domain user jharvard’.

This will prompt you for your password. On Linux, shares can instead be mounted from the command line; if you get an error message about a read-only filesystem, it may be because mount.cifs is not installed on your system, and mounts made this way must be reissued every time you boot your computer. Some users prefer using smbclient to connect to Samba/SMB/CIFS shares; this is an optional package you will need to install on your own.
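As a rough sketch, connecting to the home-directory example above with smbclient might look like this (the share layout follows the rcstore example earlier on this page; your home path will differ):

smbclient //rcstore.rc.fas.harvard.edu/homes -U 'RC\jharvard' -D home08/jharvard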

SFTP/Filezilla 

If you are unable to mount your lab storage using one of the above methods, or your lab’s share is simply not available via Samba, you always have the option of using SFTP. This is especially useful if you need to maintain a different VPN connection and cannot connect to our VPN. SFTP to a login node does not require a FASRC/FAS VPN connection as login nodes use two-factor authentication.

We recommend FileZilla as a reliable, cross-platform SFTP client. Note that SFTP uses SSH and our two-factor authentication, so you will need to ensure you have OpenAuth set up, and that you have cluster access and a home directory. If you are unsure, SSH to a login node first. If you need to request cluster access, see our doc on adding groups/access.
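If you prefer the command line over FileZilla, scp or rsync over SSH works the same way. A brief sketch (the hostname is a FASRC login address and the destination path is illustrative; substitute your own lab path):

scp -r ./results jharvard@login.rc.fas.harvard.edu:/n/netscratch/jharvard_lab/Lab/jharvard/
rsync -avP ./results jharvard@login.rc.fas.harvard.edu:/n/netscratch/jharvard_lab/Lab/jharvard/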

Scratch https://docs.rc.fas.harvard.edu/kb/policy-scratch/

RC maintains a large, shared temporary scratch filesystem for general use by high input/output jobs at /n/netscratch.

Scratch Policy

Each lab is allotted 50TB of scratch space for use in its jobs. This is temporary high-performance space, and files older than 90 days will be deleted through a periodic purge process. This purge can run at any time, especially if scratch is getting full, and is also often run at the start of the month during our monthly maintenance period.

There is no charge to labs for netscratch, but please note that it is intended as volatile, temporary scratch space for transient data and is not backed up. If your lab has concerns or needs regarding scratch space or usage, please contact FASRC to discuss.

Modifying file times (via touch or another process) when initially placing data in scratch is allowed; however, doing so subsequently to avoid deletion is an abuse of the filesystem and will result in administrative action from FASRC. To reiterate, you may initially modify the file date(s) on new data so that they are not in the past, but you should not modify them further. If you have longer-term needs, please contact us to discuss options.


Networked, shared netscratch

The cluster has storage built specifically for high-performance temporary use. You can create your own folder inside the folder of your lab group. If that doesn’t exist or you do not have write access, contact us.
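For example, creating a personal working folder inside your lab group’s netscratch folder might look like this (the lab and user names are illustrative):

mkdir -p /n/netscratch/jharvard_lab/Lab/jharvard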

IMPORTANT: netscratch is temporary scratch space and has a strict retention policy.

Size limit: 4 PB total; 50TB max per group; 100M inodes
Availability: All cluster nodes. Cannot be mounted on desktops/laptops.
Backup: NOT backed up
Retention policy: 90-day retention policy. Deletions are run during the cluster maintenance window.
Performance: High. Appropriate for I/O-intensive jobs

/n/netscratch is short-term, volatile, shared scratch space for large data analysis projects.

The /n/netscratch filesystem is managed by the VAST parallel file system and provides excellent performance for HPC environments. This file system can be used for data intensive computation, but must be considered a temporary store. Files are not backed up and will be removed after 90 days. There is a 50TB total usage limit per group.

Large data analysis jobs that would fill your 100 GB of home space can be run from this volume. Once analysis has been completed, however, data you wish to retain must be moved elsewhere (lab storage, etc.). The retention policy will remove data from scratch storage after 90 days.


Local (per node), shared scratch storage

Each node contains a disk partition, /scratch, also known as local scratch, which is useful for storing large temporary files created while an application is running.

IMPORTANT: Local scratch is highly volatile and should not be expected to persist beyond job duration.

Size limit: Variable (200-300GB total typical). See actual limits per partition.
Availability: Node only. Cannot be mounted on desktops/laptops.
Backup: Not backed up
Retention policy: Not retained – highly volatile
Performance: High. Suited for limited I/O-intensive jobs

The /scratch volumes are directly connected (and therefore fast) temporary storage local to each compute node. Many high-performance computing applications use temporary files that go to /tmp by default. On the cluster we have pointed /tmp to /scratch. Network-attached storage, like home directories, is slow compared to disks directly connected to the compute node. If you can direct your application to use /scratch for temporary files, you can gain significant performance improvements and ensure that large files can be supported.

Though there are /scratch directories available to each compute node, they are not the same volume. The storage is specific to the host and is not shared. For details on the /scratch size available on the host belonging to a given partition, see the last column of the table on Slurm Partitions. Files written to /scratch from holy2a18206, for example, are only visible on that host. /scratch should only be used for temporary files written and removed during the running of a process. Although a ‘scratch cleaner’ does run hourly, we ask that at the end of your job you delete the files that you’ve created.

$SCRATCH VARIABLE

A global variable called $SCRATCH exists on the FASRC Cannon and FASSE clusters which allows scripts and jobs to point to a specific directory in scratch regardless of any changes to the name or path of the top-level scratch filesystem. This variable currently points to /n/netscratch so, for example, one could use the path $SCRATCH/jharvard_lab/Lab/jsmith in a job script. This will have the added benefit of allowing us to change scratch systems at any time without your having to modify your jobs/scripts.
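A minimal sketch of how $SCRATCH might be used in a job script (the partition, time, paths, input file, and program name are all illustrative, not prescribed values):

#!/bin/bash
#SBATCH -p test
#SBATCH -t 00:30:00
#SBATCH -c 1

# work in a job-specific folder under the lab's scratch space
WORKDIR=$SCRATCH/jharvard_lab/Lab/jharvard/$SLURM_JOB_ID
mkdir -p $WORKDIR
cp input.dat $WORKDIR/
cd $WORKDIR
./my_analysis input.dat > results.out

# copy results back to lab storage before the 90-day purge removes them
cp results.out /n/holylabs/LABS/jharvard_lab/Lab/jharvard/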

Home and Lab directories https://docs.rc.fas.harvard.edu/kb/cluster-storage/

Please see the Data Storage page on our main website for information on other storage options and for clarification on any unfamiliar terms.

This page describes the resources which are available to each user account and lab, and is a guide for day-to-day usage.

See also our Introduction to FASRC Cluster Storage video


Home Directories

Every user whose account has cluster access receives a 100 GB home directory. Your initial working directory upon login is your home directory. This location is for your use in storing everyday data for analysis, scripts, documentation, etc. This is also where files such as your .bashrc reside. Home directory paths look like /n/homeNN/XXXX, where homeNN is home01 through home15 and XXXX is your login. For example, user jharvard’s home directory might be /n/home12/jharvard. You can also reach your home directory using the Unix shortcut ~, as in: cd ~

  • Size Limit: 100GB (hard limit)
  • Availability: All cluster nodes. Can be mounted on desktops and laptops
  • Backup: Daily snapshots. Retained for 2 weeks
  • Retention policy: indefinite
  • Performance: Moderate. Not appropriate for I/O intensive or large numbers of jobs
  • Cost: Provided with each user account

Your home volume has good performance for most simple tasks. However, I/O intensive or large numbers of jobs should not be processed in home directories. Widespread computation against home directories would result in poor performance for all users. For these types of tasks, the scratch filesystem is better suited.

Home directories are private to your account and will follow you even if you change labs, but they are not suitable for storing HRCI/Level 3 or above data. Storing such data there is a violation of Harvard security policies.

Your home directory is exported from the disk arrays using CIFS/SMB file protocols and so can be mounted as a ‘shared drive’ on your desktop or laptop. Please see this help document for step-by-step instructions.

Home directories are backed up into a directory called .snapshot in your home. This directory will not appear in directory listings. You can cd or ls this directory specifically to make it visible. Contained herein are copies of your home directory in date specific subdirectories. Hourly, daily, weekly snapshots can be found. To restore older files, simply copy them from the correct .snapshot subdirectory. NOTE: If you delete your entire home directory, you will also delete the snapshots. This is not recoverable.
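For example, restoring an accidentally deleted file from a snapshot might look like this (the snapshot directory name and filename are placeholders; run ls ~/.snapshot to see the names actually available):

ls ~/.snapshot                                        # list available snapshot directories
cp ~/.snapshot/<snapshot-name>/myscript.py ~/         # copy the older version back into your home directory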

The 100 GB quota is enforced with a combination of a soft quota warning at 95GB and a hard quota stop at 100 GB. Hitting quota during processing of large data sets can result in file write/read failures or segmentation faults. You can check your usage using the df command: df -h ~ (where ~ is the unix shortcut for ‘home’)

TIP: If you are trying to determine usage, you might try using du -h -d 1 ~ to see the usage by sub-directory, or du -ax . | sort -n -r | head -n 20 to get a sorted list of the top 20 largest.

When attempting to log in while your home directory is over quota, you will often see an error about the .Xauthority file:

/usr/bin/xauth: error in locking authority file .Xauthority

Logging into NX or another virtual service will also fail, as the service cannot write to your home directory.

When at or over quota, you will need to remove unneeded files. Home directory quotas are global and cannot be increased for individuals. You may be able to use lab or scratch space to assist with copying or moving files from your home directory to free up space.



Lab Directories

Each lab which uses the cluster receives a 4 TB lab directory (as of 2022 – these will reside in /n/holylabs/LABS). This location is for each lab group’s use in storing everyday data for analysis, scripts, documentation, etc. Each such lab will have a directory on our high-performance scratch filesystem (see below).

  • Size Limit: 4TB (hard limit), 1 million files
  • Availability: All cluster nodes. Cannot be mounted on desktops and laptops
  • Backup: Highly redundant, no backups
  • Retention policy: Duration of the lab group
  • Performance: Moderate. Not appropriate for I/O intensive or large numbers of jobs
  • Cost: Provided with each lab group

Lab directories have good performance for most simple tasks. However, I/O intensive or large numbers of jobs should not be processed in lab directories. Widespread computation against lab directories would result in poor performance for all users. For these types of tasks, the scratch filesystem is better suited.

This lab directory is owned by the lab’s PI and is intended only to be used for research data on the cluster. Research storage should not be used for administrative files and data.

Lab directories are not suitable for storing HRCI/Level 3 or above data. Storing such data there is a violation of Harvard security policies.

The 4 TB quota is enforced with a combination of a soft quota warning and a hard quota stop at 4 TB. Hitting quota during processing of large data sets can result in file write/read failures or segmentation faults. If your lab requires additional storage, see our Data Storage page for a list of available storage options.
