FASRC Cluster Storage Policy (https://docs.rc.fas.harvard.edu/kb/fasrc-cluster-storage-policy/)

Cluster storage offered and maintained by FASRC should only be used for research taking place on FASRC clusters.

Examples of data that can be stored on FASRC storage are:

  • Datasets
  • Code
  • Scientific software
  • Research results

Examples of data that should not be stored on FASRC storage include:

  • Clerical or lab administrative data
  • Data related to personnel, grant proposals, business operations, or general lab management
  • Data with personally identifiable or financial information 

FASRC storage filesystems are only approved for Data Security Level 1 (DSL1) and DSL2 research data on the Cannon cluster. DSL3 data must be stored in an approved project on the FASSE cluster. Research data containing information classified as DSL4 must be stored on an appropriate storage solution that is approved for DSL4 sensitive data.*

*A limited number of DSL4 projects exist in their own isolated environments

If it comes to the attention of FASRC staff that non-research data is being stored on FASRC systems, we will alert the lab’s PI.

To view alternative storage options for administrative data, please refer to the FASRC website. Additional information regarding Data Security Levels is also provided on the Harvard Security website.

Administrative Data Storage Options (https://docs.rc.fas.harvard.edu/kb/administrative-data/)

FAS Research Computing offers a wide variety of storage options designed to help meet the needs of the Harvard research community. However, FASRC storage is intended to house only research data. For other data types, such as administrative data, general lab documentation, or financial information, we recommend using one or more of the storage options below. While FASRC does not directly support or manage these tools, they are offered with support from Harvard IT.

  • Google Drive (Administrative file storage)
    • Store, access, and share administrative files from any device, as data is stored in the cloud
    • My Drive is designed for personal documents to be shared individually. 
    • Shared drives are for collaborative files and folders owned at the team level.
    • Approved for Medium Risk Confidential (L3) data. 
    • Central Administration, GSE, HBS, HDS, HKS, and HMS (Quad), and HSPH require local approval for account requests.
    • Google Drive storage request form: This form can be used to request a new personal or shared Google Drive, or request additional storage space for an existing Google Drive. The default Google Shared Drive storage limit is 5 GB. Eligible users may request additional storage using the form. The requests will need to be approved by FAS, as they are currently responsible for the costs.
  • Dropbox (Administrative file storage)
    • Secure data storage for faculty and research staff
    • Store, sync, and share data files in the cloud
    • Collaborate on documents with added version history
  • Microsoft 365
    • OneDrive (Administrative file storage)
      • Store work-related administrative files
      • Share with colleagues within and outside of Harvard
      • Users have 2TB of file storage
      • Approved for Medium Risk Confidential (L3) data.
    • SharePoint (Document management tool)
      • Secure location to store, organize, share, and access information 
      • Default of 25 TB of expandable storage
      • Approved for Medium Risk Confidential (L3) data.
      • Can be shared externally with non-Harvard faculty and staff. 
      • Storage for Level 4 data is available upon request with restrictions. 
  • Atlassian Confluence (Wiki web environment)
    • Website allowing users to create, edit, and publish content collaboratively through a web browser. 
    • Access can be given to individuals, groups, to the Harvard community, or to the public. 
    • Each page has its own URL, page history, access restrictions, file attachments, and comments.
Managing file access with ACLs (https://docs.rc.fas.harvard.edu/kb/acls-facls/)

What are ACLs?

Access Control Lists (ACLs or FACLs) are used to manage granular permissions on individual files or directories. They are primarily used to give one or more people access to a file or directory independent of the owner or group attached to that file/folder; for instance, when one user owns a file and wants to allow another user write access to it without giving the entire group write access.

While FASRC manages a large number of groups that control access to the many storage shares we host, we generally do not manage access at the individual level (for example, ‘this person should have access but that person should not’). In some cases involving multiple users, an additional group may be the best solution, but that adds support overhead. When you need to grant someone else more granular permissions on files you own, an ACL is often the best option.

Usage

From a login or other node on the cluster, run man getfacl and man setfacl for full documentation.

Please Note: Setting ACLs on Tier 1 Isilon shares is not currently supported.
Example 1: getfacl

This example shows how to see what FACLs are set.

[harvard_lab]# ls -l
total 12         (the '+' sign indicates that ACLs have been applied)
drwxrwsr-x+ 28 jharvard harvard_lab 4096 Feb 19 20:06 Everyone
drwxrwsr-x+ 7 jharvard harvard_lab 4096 May 9 20:03 Lab
drwxrwsr-x+ 74 jharvard harvard_lab 4096 Oct 10 2023 Users
[harvard_lab]# getfacl .
# file: .      (shows the FACL settings, in this case a group, harvard_lab_admins, has special permissions)
# owner: root
# group: harvard_lab
# flags: -s-
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:harvard_lab_admins:rwx
default:mask::rwx
default:other::r-x
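
For reference, a default group ACL like the default:group:harvard_lab_admins entry shown above can be created with setfacl. This is a minimal sketch using the group from this example; it assumes you own the directory (the entry above was set by root):

[harvard_lab]# setfacl -m d:g:harvard_lab_admins:rwx .
Adds a default (inherited) entry so that new files and subdirectories created here grant rwx to the group harvard_lab_admins. Run getfacl . again to confirm the new default entry appears.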

Example 2: setfacl

This example shows how to allow another user read/write/execute access to a file you own.

[jharvard]$ ls -l test
-rw-r--r--. 1 jharvard harvard_lab 30 May 17 16:41 test
[jharvard]$ setfacl -m u:testuser:rwx test
[jharvard]$ ls -l test
-rw-rwxr--+ 1 jharvard harvard_lab 30 May 17 16:41 test
[jharvard]$ getfacl test
# file: test
# owner: jharvard
# group: harvard_lab
user::rw-
user:testuser:rwx
group::r--
mask::rwx
other::r--
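
To remove an ACL entry when it is no longer needed, use setfacl's -x option (remove a specific entry) or -b option (remove all entries). A minimal sketch continuing the example above:

[jharvard]$ setfacl -x u:testuser test
Removes only the entry for testuser; any other ACL entries remain.
[jharvard]$ setfacl -b test
Removes all ACL entries and the mask, returning the file to its standard Unix permissions.
[jharvard]$ ls -l test
-rw-r--r--. 1 jharvard harvard_lab 30 May 17 16:41 test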

Additional documentation about the use of ACLs can be found in the getfacl and setfacl man pages referenced above.

How to read your Storage Service Center bill (https://docs.rc.fas.harvard.edu/kb/storage-service-center-bill/)

PLEASE NOTE: Billing is NOT based on usage, but on the total allocation. A breakdown by user is provided for those who need to charge back to grants, but the total amount is the same whether the storage is empty or full.

If you have received a monthly storage bill from the FASRC Storage Service Center, this document will help you understand its contents. Please bear in mind that the bill may show more information than you expect; these details are included because some labs need them to route charges to the correct projects or billing codes.

Storage allocations are billed monthly and are based on the total size of the allocation(s). A lab with a single 16TB allocation on Tier 0, for example, has a total allocation cost of $800 per year [at the time of this writing]. That lab will pay 1/12 of that total each month, or approximately $66.66 per month*.

Additionally, the billing system provides a breakdown of usage by user. This includes any user with data in your storage and may include disabled users (see note below). Please note that while this breakdown shows a dollar amount and a 33-digit billing string for each user, those values are solely for your information. They are useful for labs that need to charge back or split costs across projects/grants, or to determine what percentage of the allocation each user is consuming.

Please Note: FASRC does not police lab storage and, as such, we do not remove data from your storage unless asked to by a PI or lab manager. Each lab is responsible for the management and cleanup of their stored data. If you have difficulty removing items from your storage, please contact FASRC for assistance.

* – We recognize that there will be rounding issues, and FASRC will endeavor to round down so that the cost of an allocation does not exceed the quoted yearly cost.


EXAMPLE BILL

The example below shows an allocation of 16TB of Tier 0 storage at $50 per TB per year:

Total cost per year = $800
Unit cost = $4.16 per TB per month
Total billed per month = ~$66.56
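
These figures can be reproduced from the command line. A minimal sketch using bc with the values from this example (the $50 per TB yearly rate is divided by 12 and rounded down to the $4.16 monthly unit cost):

[jharvard@holylogin01 ~]$ echo "scale=2; 50 / 12" | bc
4.16
The yearly per-TB rate divided by 12 months (bc truncates to two decimal places, matching the rounded-down unit cost).
[jharvard@holylogin01 ~]$ echo "scale=2; 4.16 * 16" | bc
66.56
The monthly unit cost multiplied by the 16TB allocation gives the total monthly charge.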



RC Storage billing is ready for John Harvard Lab from Research Computing Storage for 7/2022.

This is your monthly bill for FASRC storage service center allocation(s). This bill represents 1/12 of your yearly allocation cost as it currently stands.

  • The cutoff day for billing changes is the 15th of each month. Any changes made to an allocation after that will be reflected on the following month’s bill.
  • The monthly charge for RC Storage is based on the storage tier and the total size of the allocation, not the amount of used space. As a result, deleting files will not reduce your monthly cost unless the allocation size is also reduced.
  • In order to be in compliance with grant management rules, the total charge for an allocation is applied to users proportional to how much they have used. That distribution is described in the table below. If there are files owned by users who are no longer affiliated with the lab, it is the lab’s responsibility to remove those files, change their ownership, or move them to other storage.
  • For billing questions, to change billing codes, or for any other queries, email RC Storage Billing.

This link to this bill in the FIINE billing system can be used to approve records or make adjustments:
[A LINK TO THIS BILL IN FIINE]

If you do not approve or respond within 3 business days, the records will be considered approved. Please respond to RC Storage Billing if you have issues or questions.

For a detailed description and how to read your FASRC storage bill, please see: https://docs.rc.fas.harvard.edu/kb/storage-service-center-bill/

Total Monthly Charge  $66.56

The total charge is distributed across users who own files, proportional to their usage, as shown below.

User | Storage Product | Account | Charge

1. John Harvard | lustre/tier0 | 370-XXXXX-XXXX-XXXXX-XXXXXX-XXXX-XXXXX | $49.92 for 75% of 16TB of holylfs05/tier0 at $4.16 per TB

2. Jill Harvard | lustre/tier0 | 370-XXXXX-XXXX-XXXXX-XXXXXX-XXXX-XXXXX | $16.64 for 25% of 16TB of holylfs05/tier0 at $4.16 per TB
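
The per-user amounts shown above follow directly from the usage percentages. A minimal sketch checking them with bc, using the values from this example bill:

[jharvard@holylogin01 ~]$ echo "scale=2; 66.56 * 0.75" | bc
49.92
75% of the $66.56 monthly charge, matching John Harvard's share.
[jharvard@holylogin01 ~]$ echo "scale=2; 66.56 * 0.25" | bc
16.64
25% of the monthly charge, matching Jill Harvard's share.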

Home directory full (https://docs.rc.fas.harvard.edu/kb/home-directory-full/)

If you receive an error that your home directory is full (“no space left on device [your home directory path]”) or an email saying you are over your 100GB home directory quota (there is also a 95GB soft quota that triggers notifications), you will need to remove files to get back under quota. Ordinarily you would just use rm to remove some files and reduce your usage.

However, a situation may arise where you are not just at quota but over it, and when trying to remove files using rm, you receive the error:

rm: cannot remove ‘{somefilename}’: No space left on device

 

NOTE: Any time you are deleting files, it is important to check that you have entered the correct filename. A good rule of thumb is to use the full path to a file (instead of a relative path) or to cd to the directory containing the file first. Also, be extra cautious when using wildcards like *.

 

WORKAROUND

A workaround is to identify some larger file(s) you intend to remove and reduce their size to zero bytes. Once enough space is recovered to get you back under quota, you should be able to use rm again. To do this on files you’ve identified for removal, use the truncate command:

truncate -s 0 FILENAME
To truncate a single file down to zero bytes.

or

truncate -s 0 FILE1 FILE2 FILE3
To truncate multiple files down to zero bytes.

Example:

truncate -s 0 ~/Jobfiles/August/job12653287.out
It’s always safer to use the full path to a file.
~ as used here is a Unix shortcut for the path to your home directory.

Alternately, you could use the cat command with /dev/null to empty a large file:

cat /dev/null > ~/mybig.file

/dev/null is a special Unix device that is always zero bytes in size

CHECKING USED SPACE

You can check your current total home directory (shortcut “~”) usage using the du command (plus the summary, total, and human-readable options) like so:

[jharvard@holylogin01 ~]$ du -sch ~
80G .
80G total

If you are on a login node, you can also view your computed quota directly like so:

[jharvard@holylogin01 ~]$ df -h .
Filesystem                      Size Used Avail Use% Mounted on
rcstorenfs:/ifs/rc_homes/home13 95G  80G   15G  85%  /n/home13

This shows that the user jharvard has used 80GB out of 100GB. The 95GB shown as the Size is called a soft quota. That is the threshold at which the system will notify you that you are going over.

Please bear in mind that, due to the size of our home directory filesystems, neither the notification nor the quota re-calculation is instantaneous. Both happen at some point during a 24-hour period, so if you manage to go over quota before the next calculation is done, you won’t receive a soft-quota notice. Similarly, it may take some time for your actual usage and computed quota to match again.

To find which files or directories are using the most space:

[jharvard@holylogin01 ~]$ cd ~
[jharvard@holylogin01 ~]$ du -h --max-depth=1 .
384K ./.config
232K ./Test
2.0G ./spack
...

This will show a listing of the directories here and their sizes; you can repeat the command further down the directory tree to find files to delete.
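
To jump straight to the largest entries, you can sort that output by size. A minimal sketch combining du with sort (the -h option on sort orders human-readable sizes) and tail:

[jharvard@holylogin01 ~]$ du -h --max-depth=1 . | sort -h | tail -n 5
232K ./Test
384K ./.config
2.0G ./spack
80G .
The largest entries are listed last; the final line is the total for the current directory.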

New England Research Cloud (NERC) (https://docs.rc.fas.harvard.edu/kb/nerc/)

The New England Research Cloud (NERC) is operated by the Harvard University Research Computing (URC) and Boston University Research Computing Services groups and is part of the MOC-Alliance. NERC is a self-service cloud available to many institutions in New England. Research groups can build out their own virtual machines (OpenStack) and data storage (NESE Ceph). The number and diversity of researchers in need of extensive computational capabilities is expanding, and the type of computation needed is shifting to require tools and elasticity that are best provided by cloud-native technologies, rather than, or in addition to, traditional high-performance computing.

Users and labs seeking cloud computing resources are encouraged to make use of NERC. While FASRC does not directly provide NERC services, FASRC users and labs are free to use NERC for their cloud computing needs.

NERC Links and Documentation

Need help or have questions?

Current NERC users or those with questions about the service should contact NERC via their online help system or by emailing help@nerc.mghpcc.org

Storage Types – Tier 0 (https://docs.rc.fas.harvard.edu/kb/storage-tier-0/)

Technology Description

Tier 0 is built on the Lustre parallel filesystem.  Tier 0 filesystems provide high data redundancy and moderate-to-high performance. Tier 0 also provides Starfish data management web access for most shares, and Globus transfer access for all shares.

This tier is generally performant enough to use for hundreds of clustered compute jobs simultaneously and is typically the class of storage used for general cluster storage shares.

To request a new allocation or update existing allocations, please refer to the Storage Service Center document.

Location(s)

Tier 0 servers are located in our Holyoke and Boston datacenters.

Backup/Snapshots

Tier 0 is highly redundant and does not offer snapshots.  If you accidentally delete a file from Tier 0 storage, there is no way to recover it. If you require user-accessible snapshots, please see Tier 1.

Disaster Recovery Information

Tier 0 is highly redundant and does not offer backups. If you accidentally delete files from Tier 0 storage, there is no way to recover them. If you require disaster recovery backups, please see Tier 1 or Tier 2.

]]>
24557
Storage Types – Tier 3 (https://docs.rc.fas.harvard.edu/kb/storage-tier-3/)

Technology Description

The NESE Tape System is composed of a tape library and a disk-based frontend. The tape library itself has 6658 tape slots and 28 IBM TS1160 tape drives (each tape holds ~20TB). The IBM Spectrum Scale disk-based frontend buffer storage employs three storage enclosures (with a total of 1.2PB of usable space) and three systems (to manage the storage itself).

To request a new allocation or update existing allocations, please refer to the Storage Service Center document.

Location

The NESE Tape System is located in the Massachusetts Green High Performance Computing Center (MGHPCC) in Holyoke, MA, adjacent to the Cannon compute cluster.

Snapshot or backup schedule

Since the NESE Tape System is not an active datastore used in day-to-day research, these schedules are not applicable. Users move data onto tape by copying it to the disk-based frontend, and retrieve data by requesting it back from the system.

Disaster Recovery Information

Each tape has a copy to greatly reduce the possibility of data loss.

Since the tape library itself is located completely in the MGHPCC, there is currently no formal DR strategy. Arrangements can be made to send tapes offsite to a commercial data storage vendor if desired (enter a ticket at portal.rc.fas.harvard.edu for details).

Usage/recommendations

  • Keep the number of files in a single directory to less than 10,000 if at all possible.
  • Smaller files (less than 1GB) can be problematic for any storage system. If your data includes many of these, please tar or zip them up into a larger file before transferring them (see the example below).
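
A minimal sketch of bundling a directory of many small files into a single compressed archive before transfer; the directory name my_small_files is a placeholder:

[jharvard@holylogin01 ~]$ tar -czvf my_small_files.tar.gz my_small_files/
Creates a gzip-compressed archive of the directory, so a single .tar.gz file can be transferred instead of many small files.
[jharvard@holylogin01 ~]$ tar -xzvf my_small_files.tar.gz
Extracts the archive again after it has been retrieved from tape.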

Lead Time

There is a minimum setup time of (TBD – currently 2-3 weeks). This timeframe assumes we receive the completed tape setup from our service partner NESE without delay. Delays there are beyond our control and could increase lead time. Please note that any storage changes made after the 15th of the month will be reflected in the following month’s billing.

Access

Globus

NESE Tape Globus Access and Globus Docs

MinIO/S3

https://docs.min.io/minio/baremetal/reference/minio-mc.html
https://docs.min.io/minio/baremetal/console/minio-console.html
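
For S3-style access, the MinIO client (mc) documented at the links above can be used from the cluster. A minimal sketch; the alias, endpoint URL, credentials, and bucket name below are all placeholders that would come with your NESE tape allocation:

[jharvard@holylogin01 ~]$ mc alias set nesetape https://ENDPOINT_URL ACCESS_KEY SECRET_KEY
Registers the S3 endpoint and credentials under the alias nesetape.
[jharvard@holylogin01 ~]$ mc cp my_small_files.tar.gz nesetape/my-bucket/
Copies the archive to the bucket.
[jharvard@holylogin01 ~]$ mc ls nesetape/my-bucket/
Lists the bucket contents to confirm the upload.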


This document is a work-in-progress

 

Storage Types – Tier 2 (https://docs.rc.fas.harvard.edu/kb/storage-tier-2/)

Technology Description

Tier 2 is built on a standard NFS storage platform and utilizes multiple machines to serve that storage via NFS mounts and Samba. This technology provides data redundancy, geolocation, multiple access protocols, and disaster recovery. Tier 2 also provides Globus transfer access.

This tier is not intended for high-throughput cluster compute jobs. It is typically used for general file storage and sharing via SMB and NFS.

NOTE: The maximum share size for a Tier 2 share is 306TB.

To request a new allocation or update existing allocations, please refer to the Storage Service Center document.

Location(s)

Tier 2 servers are located in our Boston and Holyoke datacenters.  Depending on your workflow, you may want your lab’s Tier 2 storage to be located at one or the other.  FAS Research Computing support staff can discuss with you which datacenter is right for your lab.  See our Data Storage Service page for options.

Snapshots

Tier 2 does not offer user-accessible snapshots.  If you accidentally delete a file from Tier 2 storage, there is no way to directly recover it. If you require user-accessible snapshots, please see Tier 1.

Disaster Recovery Information

Whole-filesystem copies of Tier 2 servers are maintained in the other datacenter (Boston shares are backed up to Holyoke and vice versa) for disaster recovery purposes. These are intended to allow FASRC to restore an entire filesystem/share in the event of catastrophic loss, rather than to allow access to individual files.

Storage Types – Tier 1 (https://docs.rc.fas.harvard.edu/kb/storage-tier-1/)

Technology Description

Tier 1 is built on Dell EMC Isilon clustered storage. This technology provides high data redundancy, snapshotting, multiple access protocols, and disaster recovery. Tier 1 also provides Starfish data management web access and Globus transfer access.

This tier is generally performant enough to use for hundreds of clustered compute jobs simultaneously and is typically the class of storage used for general file sharing, SMB and NFS.

To request a new allocation or update existing allocations, please refer to the Storage Service Center document.

Location(s)

Tier 1 servers are located in our Boston and Holyoke datacenters. Depending on your workflow, you may want your lab’s Tier 1 storage to be located at one or the other. FAS Research Computing support staff can discuss with you which datacenter is right for your lab. See our Data Storage Service page for options.

Backup/Snapshots

Tier 1 offers regular snapshots, taken daily and going back 7 days. So if you accidentally delete or modify a file, you can revert it to the state it was in on any of the previous 7 days.

Recovering Using a Snapshot

To recover a file from a snapshot, simply navigate to the .snapshot hidden directory in your Tier 1 storage root directory (e.g. /n/jharvard_lab/.snapshot) and find the file in one of the daily snapshot directories.  Once you find the version of the file you want, copy it back to its original location (or an alternate location) in your Tier 1 lab space.  For example:

[jharvard@boslogin01 ~]$ cd /n/jharvard_lab/Lab
[jharvard@boslogin01 Lab]$ ls
002_0216 002_0216_batch2 002_0316 002_0317 002_0318 002_0319 data
[jharvard@boslogin01 Lab]$ cd /n/jharvard_lab/.snapshot
[jharvard@boslogin01 .snapshot]$ ls -l
total 224
drwxr-s--- 4 root jharvard_lab 44 Nov 15 14:23 jharvard_lab_daily_2021-12-02-_00-00
drwxr-s--- 4 root jharvard_lab 44 Nov 15 14:23 jharvard_lab_daily_2021-12-03-_00-00
drwxr-s--- 4 root jharvard_lab 44 Nov 15 14:23 jharvard_lab_daily_2021-12-04-_00-00
drwxr-s--- 4 root jharvard_lab 44 Nov 15 14:23 jharvard_lab_daily_2021-12-05-_00-00
drwxr-s--- 4 root jharvard_lab 44 Nov 15 14:23 jharvard_lab_daily_2021-12-06-_00-00
drwxr-s--- 4 root jharvard_lab 44 Nov 15 14:23 jharvard_lab_daily_2021-12-07-_00-00
drwxr-s--- 4 root jharvard_lab 44 Nov 15 14:23 jharvard_lab_daily_2021-12-08-_00-00
[jharvard@boslogin01 .snapshot]$ cd jharvard_lab_daily_2021-12-04-_00-00/
[jharvard@boslogin01 jharvard_lab_daily_2021-12-04-_00-00]$ ls
Lab Users
[jharvard@boslogin01 jharvard_lab_daily_2021-12-04-_00-00]$ cd Lab
[jharvard@boslogin01 Lab]$ ls
002_0216 002_0216_batch2 002_0316 002_0317 002_0318 002_0319 a_R1.fq.gz data
[jharvard@boslogin01 Lab]$ cp a_R1.fq.gz /n/jharvard_lab/Lab
[jharvard@boslogin01 Lab]$ cd /n/jharvard_lab/Lab
[jharvard@boslogin01 Lab]$ ls
002_0216 002_0216_batch2 002_0316 002_0317 002_0318 002_0319 a_R1.fq.gz data
[jharvard@boslogin01 Lab]$

Disaster Recovery Information

Disaster recovery copies are maintained for Tier 1 storage. These are different from snapshots in that they are intended to allow FASRC to restore an entire filesystem/share in the event of catastrophic loss, rather than to allow access to individual files.
