Research Data Management – FASRC DOCS https://docs.rc.fas.harvard.edu Fri, 12 Dec 2025 18:17:43 +0000

Getting Started with FASRC Storage https://docs.rc.fas.harvard.edu/kb/getting-started-with-fasrc-storage/ Tue, 09 Dec 2025 18:29:50 +0000

FASRC offers two complimentary storage offerings for groups:

  1. Lab Directory: holylabs
    1. path /n/holylabs
    2. 4TiB (cannot be expanded)
    3. Retention allowed
    4. No snapshots or backups
  2. netscratch
    1. path /n/netscratch
    2. 50TiB (cannot be expanded)
    3. 90-day retention policy
    4. No snapshots or backups

For more storage offerings, see the Data Storage Workflow documentation.

FASRC offers paid storage options, including Compute Storage, Lab Storage, and Long-Term storage (see the Data Storage Workflow documentation for more details). Your group may have purchased additional storage, so you could have other folders besides those on holylabs and netscratch.

How do I find my group’s storage folders?

Option 1: Ask a colleague

Typically, the easiest way to find out about your group’s storage is to ask a colleague.

Option 2: Use your group’s documentation

Many groups with FASRC maintain their own documentation. If your lab or group has internal documentation, it may indicate what storage you have access to.

Option 3: Self-service

The good news is you can also find the storage on your own!

  1. Familiarize yourself with one of our data management tools, Starfish, by reading the first three sections (Overview, Login, and Navigation) of the Starfish Zones Data Visualization Tool documentation.
  2. Connect to the FASRC VPN (Harvard VPN or a wired on-campus connection also work). For instructions on how to connect to the FASRC VPN, see the VPN Setup documentation.
  3. Go to the Starfish dashboard from any browser (Chrome and Firefox are preferred).
  4. To log in, use your FASRC username and FASRC password.
    (Screenshot: Starfish login form with two text boxes. The first takes the FASRC username; the second takes the FASRC password.)
  5. Navigate to the group folder
    1. Right-click on the storage you would like to use
    2. Click on “Copy mount path to clipboard”
  6. Now, you have the storage “Path” in your clipboard. To keep it, paste it in a text editor, note-taker, or word processor.
    /n/vast-holylabs/C/jharvard_lab
  7. Very important! Note that the path above contains the letter C before jharvard_lab. The letter C indicates that the filesystem is on the Cannon cluster. If your group has a group share on FASSE, you will see F (instead of C) before your group (<PI_lab>), indicating that it is on the FASSE cluster. You must make two edits: remove vast- and /C. The final path is:
    /n/holylabs/jharvard_lab
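
The two edits in step 7 can also be scripted. The sketch below is illustrative only: the function name starfish_to_cluster_path is hypothetical, and it assumes every Starfish mount path has the layout /n/vast-<filesystem>/<C or F>/<group>, which is inferred from the single example above. It also assumes that an F (FASSE) path is cleaned up the same way as a C (Cannon) path.

```python
from pathlib import PurePosixPath

# Cluster letters described in the text: C = Cannon, F = FASSE.
CLUSTER_LETTERS = {"C", "F"}


def starfish_to_cluster_path(starfish_path: str) -> str:
    """Convert a Starfish mount path to the path used on the cluster.

    Assumes the layout /n/vast-<filesystem>/<C or F>/<group>[/...],
    inferred from the single example in this guide.
    """
    parts = PurePosixPath(starfish_path).parts
    # parts looks like ('/', 'n', 'vast-holylabs', 'C', 'jharvard_lab', ...)
    if len(parts) < 5 or parts[0] != "/" or parts[1] != "n":
        raise ValueError(f"unexpected path layout: {starfish_path}")
    filesystem, cluster = parts[2], parts[3]
    if not filesystem.startswith("vast-") or cluster not in CLUSTER_LETTERS:
        raise ValueError(f"unexpected path layout: {starfish_path}")
    # Edit 1: drop the "vast-" prefix. Edit 2: drop the cluster letter.
    cleaned = ("/", "n", filesystem[len("vast-"):], *parts[4:])
    return str(PurePosixPath(*cleaned))


print(starfish_to_cluster_path("/n/vast-holylabs/C/jharvard_lab"))
# prints /n/holylabs/jharvard_lab
```

Paths that do not match the assumed layout raise a ValueError rather than being silently rewritten, so an unexpected Starfish path is not turned into a wrong cluster path.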

How do I access my group storage?

Below, we describe various ways to access your group’s storage folders. You can use one or more methods that work well for you.

Option 1: Command-line

You can ssh to the cluster (see Terminal Access), use the cd (change directory) command to go to your group share, and use the ls (list) command to show the contents of the folder:

$ ssh jharvard@login.rc.fas.harvard.edu
(jharvard@login.rc.fas.harvard.edu) Password:
(jharvard@login.rc.fas.harvard.edu) VerificationCode:

[jharvard@boslogin08 ~]$ cd /n/holylabs/jharvard_lab/
[jharvard@boslogin08 jharvard_lab]$ ls
Everyone Lab
[jharvard@boslogin08 jharvard_lab]$ cd Lab/
[jharvard@boslogin08 Lab]$ ls
alphafold3 conda jharvard software spack

Pros

  • Quick
  • Does not require an additional tool/app

Cons

  • Learning curve for those not familiar with the command line interface
  • No graphical user interface (no point and click, all interactions are done via the keyboard)

Option 2: FileZilla

FileZilla is a free and open-source SFTP (SSH File Transfer Protocol) client. You can find step-by-step instructions on how to use FileZilla in the SFTP file transfer using Filezilla (Mac/Windows/Linux) documentation.

Pros

  • Graphical user interface (GUI)

Cons

  • When you open a file located on the cluster to edit it, FileZilla downloads the file to your local machine and, after you save it, uploads it back to the cluster. This round trip can sometimes be slow.

Option 3: Open OnDemand (OOD)

  1. Connect to the FASRC VPN (see VPN setup)
  2. Go to the Open OnDemand dashboard
    1. Cannon
    2. FASSE
  3. On the top menu, click the “Files” tab
  4. Double-click on the folder that you would like to open
  5. To open a file or edit a file, click on the 3 dots, then select “View” or “Edit”

Pros

  • Graphical user interface (GUI)
  • You may open files and edit them without having to download them to your local machine

Cons

  • If a folder contains hundreds of files or more, OOD may become unresponsive, because it is not designed to browse hundreds or thousands of files. If this happens, use a different access method, then create additional folders and spread the files across them.
  • Requires connection to the FASRC VPN
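
Spreading files across new folders, as suggested in the cons above, can be scripted. This is a minimal sketch, not an FASRC-provided tool: the function name, the batch_NNNN folder naming, and the default of 100 files per folder are all assumptions chosen for illustration. It also assumes the batch folders do not already contain files with the same names.

```python
import shutil
from pathlib import Path


def spread_into_subfolders(folder: str, per_folder: int = 100) -> int:
    """Move the files directly inside `folder` into numbered subfolders.

    Each subfolder (hypothetical naming: batch_0000, batch_0001, ...)
    receives at most `per_folder` files. Existing subdirectories are
    left untouched. Returns the number of files moved.
    """
    if per_folder < 1:
        raise ValueError("per_folder must be at least 1")
    root = Path(folder)
    # Sort by name so the grouping is reproducible between runs.
    files = sorted(p for p in root.iterdir() if p.is_file())
    for index, path in enumerate(files):
        target_dir = root / f"batch_{index // per_folder:04d}"
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
    return len(files)
```

For example, spread_into_subfolders("/n/holylabs/jharvard_lab/Lab/results") would reorganize a hypothetical results folder. Moving files changes their paths, so update any scripts that refer to the old locations.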

Option 4: Globus

Refer to the Globus File Transfer documentation.

Pros

  • Graphical user interface (GUI)
  • Fast
  • Can also be used to transfer files to your local computer and other institutions that have Globus

Cons

  • May take some time to get familiar with Globus
  • Limited support for viewing or opening files
  • Not recommended if transferring files within the same filesystem (e.g., from a folder in holylabs to another folder in holylabs; in this case, use the command mv)

How do I transfer files?

Moving data between storage offerings can be a challenge. There are several commands and tools you can use to move data back and forth.

Within the cluster

Option 1: Command-line interface

For options using the command-line interface, refer to Transferring Data on the Cluster documentation (cp, mv, rsync, fpsync commands)

Option 2: Open OnDemand Remote Desktop

This is a good option if you need to transfer files to/from your home directory because Globus does not have access to home directories.

  1. Go to the Open OnDemand dashboard
    1. Cannon
    2. FASSE
  2. Start a Remote Desktop session (for how to start, see Remote Desktop documentation)
  3. After the Remote Desktop session starts, open a File Manager by clicking on the “Home” folder on the Desktop. (Alternatively, you can use the top left menu: click “Applications” -> “File Manager”.)
  4. Navigate to the source folder
  5. Open another File Manager window
  6. Navigate to the destination folder
  7. Drag and drop the files from the source window to the destination window. Below, jharvard transfers the file “globus-example.txt”
    1. Left window (source): /n/home01/jharvard/Globus-from/
    2. Right window (destination): /n/netscratch/jharvard_lab/Lab/Globus-to

Option 3: Globus

Only recommended if you are transferring between different file systems (e.g. holylabs to/from netscratch, group storage to/from netscratch). Globus is not recommended for transfers in the same filesystem (e.g. from a folder in holylabs to a different folder in holylabs).

See example 4 in the Globus File Transfer documentation.
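
To decide whether a transfer stays within one filesystem, you can compare the device IDs of the source and destination. The sketch below is one possible check, not an official FASRC method; the function names are hypothetical. It assumes each storage offering is a separate mount, so that paths on different offerings report different device IDs.

```python
import os


def same_filesystem(source: str, destination: str) -> bool:
    """Return True if both existing paths live on the same mounted filesystem.

    Compares the device IDs reported by os.stat(); paths on different
    mounts report different st_dev values.
    """
    return os.stat(source).st_dev == os.stat(destination).st_dev


def suggest_tool(source: str, destination: str) -> str:
    """Suggest a transfer approach following the guidance in this section."""
    if same_filesystem(source, destination):
        return "mv"
    return "Globus (or cp/rsync/fpsync)"
```

Both paths must already exist. For a destination that has not been created yet, pass its parent directory instead.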

Outside of the cluster

To transfer data outside of the cluster, refer to the Transferring Data Externally documentation. There are different ways in which to transfer data to and from research computing facilities. The appropriate choice will depend on the size of your data, your need to secure it, and also who you wish to share it with.

How do I know how much storage is available?

Option 1: Command-line interface

The FASRC quota command provides storage limit and usage information for all FASRC storage options except Cold Storage (Tape). Additional information can be found on the Checking quota and usage page.

Option 2: ColdFront

You may use ColdFront to see your lab’s storage allocation and usage (note that Lustre filesystems, Tier 0, may show different amounts). Refer to the ColdFront User Guide documentation.

Help

Who do I contact if I have questions?

Please email rchelp@rc.fas.harvard.edu if you have any questions!

FAS RC Research Data Retention and Deletion Policy https://docs.rc.fas.harvard.edu/kb/fas-rc-research-data-retention-and-deletion-policy/ Tue, 02 Dec 2025 15:12:48 +0000

Purpose:

This policy defines FAS RC standards and procedures for the retention and deletion of research data, outputs, temporary files, and associated digital resources managed by FAS RC in support of research activities.

Scope: 

This policy applies to all research data stored, processed, or managed on servers, workstations, cloud resources, storage systems, or backup media provisioned by the FAS Research Computing Service Group.

Data Retention: 

Following the departure of faculty from the University, the associated primary department will assume responsibility for the maintenance, storage, and cost of housing the remaining research data.

Home Directories:

Aligning with the University Research Data Security Policy and the Retention and Maintenance of Research Records and Data Frequently Asked Questions (“FAQs”), home directories will be retained for no more than 7 years following a researcher’s departure from the University or the deactivation of their FASRC account. The researcher’s last login to their FASRC account will be used to track compliance. 

Project Data:

Principal Investigators (PIs) should notify FAS RC 60 days prior to their departure from the University. The notice should include the duration of any appointments (courtesy or associate), along with instructions and next steps for remaining datasets.

For research data associated with completed or inactive research projects and/or departed faculty, where no notice has been given to FAS RC as to where the research data should be stored:

  1. The PI’s Harvard-affiliated primary department becomes responsible for the storage and cost of the research data. Closure of the PI’s group and project in FAS RC will be used to track compliance.
  2. The research data will be retained in the source storage directory for 2 years following project completion or inactivity. Completion of a project occurs after whichever of the following is later:
    1. final reporting to the research sponsor
    2. final financial close-out of a sponsored research award segment
    3. final publication of research results
    4. cessation of academic or scientific activity on a specific research project, regardless of whether its results are published.
  3. Following 2 years of inactivity, data will be migrated to FASRC Long-Term Storage. The data will be retained for an additional 5 years to meet the University Data Retention guidelines. Following the completion of 5 years, the data can be deleted. Departments will be notified via email prior to the deletion.
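
The timeline above adds up to 7 years: 2 years in the source directory, then 5 years in Long-Term Storage. The following is a rough sketch of that arithmetic. The function names are hypothetical, and counting whole calendar years from the completion date is an assumption; the policy text does not define how the dates are calculated.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def project_data_timeline(completion: date) -> dict:
    """Key dates for project data when no instructions were given to FAS RC."""
    migrate = add_years(completion, 2)   # 2 years in the source directory
    eligible = add_years(migrate, 5)     # 5 more years in Long-Term Storage
    return {"migrate_to_long_term": migrate, "eligible_for_deletion": eligible}


timeline = project_data_timeline(date(2025, 6, 30))
print(timeline["migrate_to_long_term"])   # 2027-06-30
print(timeline["eligible_for_deletion"])  # 2032-06-30
```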

Temporary and Scratch Storage:

Data stored in scratch or temporary directories may be deleted after 90 days without notice to maximize available resources. 
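
To see which of your scratch files are past the 90-day mark, you can walk the directory tree and compare timestamps. This is a minimal sketch under a stated assumption: age is measured from the file modification time, which the policy text does not specify. The function name is hypothetical.

```python
import os
import time
from typing import List, Optional


def files_older_than(root: str, days: int = 90, now: Optional[float] = None) -> List[str]:
    """List files under `root` last modified more than `days` days ago.

    Assumption: age is measured from the modification time (st_mtime);
    the policy text does not say which timestamp is used.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    old_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    old_files.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(old_files)
```

For example, files_older_than("/n/netscratch/jharvard_lab/Lab/jharvard") would list candidates to copy elsewhere before they are removed. Walking a very large tree can take a long time.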

Deletion Procedures: 

  • Faculty and/or departments will be notified in advance of research data being deleted, per the timelines above. If PIs or Faculty are no longer associated with the University, the relevant department leadership will be notified via email. 
  • Data will be deleted using secure erasure methods in accordance with institutional IT security standards. 
  • Requests for retention extension can be made in writing and are subject to approval by FASRC and the department; individuals requesting the extension will be responsible for all associated storage costs. 

Ownership and Roles: 

  • University: Harvard University owns all research data generated through projects conducted under its authority or using its resources. While PIs and researchers manage and safeguard the data, the University is ultimately responsible for compliance with legal and sponsor requirements, ensuring confidentiality and security. 
  • Principal Investigators: Principal Investigators (PIs) are stewards of research data. If PIs choose to delegate responsibility within their research groups, the PI remains accountable to the University for stewardship of the data. Principal Investigators are responsible for ensuring proper data management, storage, and accessibility, meeting all University, legal, and sponsor requirements. This involves setting up procedures for data retention, confidentiality, and sharing while respecting data use agreements. 
  • Departments: In the case that a PI has left the University without delegating responsibility for data, the associated primary department of the departed PI takes on the role of steward. 
  • Researchers: Harvard community members who assist with management of data created, analyzed, and stored on FAS RC systems.
  • FAS RC: Responsible for executing deletions as outlined, maintaining logs of deletion actions, and responding to extension or exception requests. 

Policy Review: 

This policy will be reviewed and updated annually or as required by regulatory or operational changes. 

Last modification date: 2025-12-02

Related Policies and Information 

Data Storage (Offerings, Workflow, Costs) https://docs.rc.fas.harvard.edu/kb/data-storage-workflow-rdm/ Thu, 09 Oct 2025 19:49:38 +0000

FAS Research Computing (FAS RC) is transitioning to a new storage infrastructure, incorporating over 70 pebibytes of new data storage. This will ensure FAS RC remains at the forefront of research, with an innovative, scalable, and reliable data storage environment that will meet the evolving needs of the Harvard community.

The transition consolidates and modernizes a significant portion of existing storage filesystems by migrating research data to new and improved hardware. 

Benefits: 

  • Enhanced support for computationally heavy workflows including AI and Machine Learning  
  • Improved researcher experience with better visualization and storage-tracking capabilities, including data lifecycle management
  • Streamlined and consolidated storage environments reducing the need for migrations and complex data workflows 
  • More resilient and reliable hardware decreasing the potential for security risks and vulnerabilities
  • Built-in storage backups and encryption to prevent data loss 
  • Greater technological efficiency, reducing operational costs while allowing for long-term growth and scalability

Improvements:

  • Scalable, cost-effective storage designed to support researcher demands and lifecycle trends
  • Improved service quality with resilient infrastructure, providing reliable enterprise-grade support for a better user experience
  • Reduced manual overhead on data migration efforts, reallocating staff resources to strategic initiatives
  • Provides a predictable long-term cost recovery model with transparent pricing
  • Supports future initiatives including AI/ML workflows, secure multi-protocol access, and ever-evolving scientific workflows

Identifying an appropriate storage location for your research data is a critical step in the research data lifecycle, as it ensures research data remains usable. We recommend you review the available storage options and select the preferred storage offering for your group’s intended workflow, keeping in mind how often the data will be used and accessed. The offerings below are designed to store research data, rather than administrative data.

Each FASRC account is provided with a 100GiB Home Directory for individual use. Each PI or Lab Account also receives a 4TiB Lab Directory for use by all members of the PI’s lab group, and a 50TiB allotment of scratch (networked scratch). See the matrix below for more details.
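
The sizes above use binary units (GiB, TiB), while some tools and vendors report decimal units (GB, TB). The short sketch below shows the conversion; the helper names are illustrative only.

```python
GIB = 2**30   # bytes in one gibibyte
TIB = 2**40   # bytes in one tebibyte
GB = 10**9    # bytes in one gigabyte
TB = 10**12   # bytes in one terabyte


def tib_to_tb(tib: float) -> float:
    """Convert tebibytes (binary) to terabytes (decimal)."""
    return tib * TIB / TB


def gib_to_gb(gib: float) -> float:
    """Convert gibibytes (binary) to gigabytes (decimal)."""
    return gib * GIB / GB


print(round(gib_to_gb(100), 2))  # 107.37  (Home Directory)
print(round(tib_to_tb(4), 2))    # 4.4     (Lab Directory)
print(round(tib_to_tb(50), 2))   # 54.98   (netscratch)
```

A binary allocation is therefore about 7 to 10 percent larger than the same number in decimal units.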

*Snapshots are copies of a directory taken at a specific moment in time. They offer labs a self-service recovery option for overwritten or deleted files within the specific time period. Disaster recovery is a copy of an entire file system that can be used internally by FASRC in case of system-wide failure.

Storage Offerings (Paid)

  • Compute Storage
    • Description: Active storage for data analysis; data readily utilized and accessed. Highly performant cluster adjacent storage. Optimized for AI/ML workflows.
    • Performance: High
    • Size: Available upon request
    • Mount: /n/compute_storage/pi_lab
    • Retention: Weekly snapshots for 2 weeks. No disaster recovery.
    • Cost: $150/yr per TiB
    • Security Level: Level 2
    • Storage: Request storage allocation
  • Lab Storage
    • Description: General purpose storage for raw and project data. Not intended for heavy computational workflows. Can be used as buffer storage for lab instruments.
    • Performance: Moderate
    • Size: Available upon request
    • Mount: /n/lab_storage/pi_lab
    • Retention: Daily snapshots weekly. Weekly snapshots every 4 weeks. Includes disaster recovery.
    • Cost: $125/yr per TiB
    • Security Level: Level 2
    • Storage: Request storage allocation
  • Long-term storage
    • Description: Long-term storage of research data to meet institutional data retention and compliance requirements. On-premise long-term storage option for Harvard affiliated labs.
    • Performance: Low
    • Size: Available upon request
    • Mount: /n/long_term/pi_lab
    • Retention: No snapshots. Disaster recovery at additional cost.
    • Cost: $30/yr per TiB
    • Security Level: Level 2 (Up to Level 3)
    • Storage: Request storage allocation
  • Tape (NESE)
    • Description: Long-term storage of inactive research data after project completion or data retention purposes. Externally managed by Northeast Storage Exchange (NESE).
    • Performance: None
    • Size: 20TB increments. Ten thousand files per folder. File sizes between 1GiB to 100 GiB.
    • Mount: Transfer data to Tape using Globus
    • Retention: No snapshots. No disaster recovery.
    • Cost: $15/yr per TB
    • Security Level: Level 2
    • Storage: Request storage allocation
  • FASSE
    • Description: Secure storage environment for analysis or sensitive data, such as data generated using Data Use Agreements (DUAs) or IRB
    • Performance: Moderate
    • Size: Available upon request
    • Mount: /n/fasse/pi_lab_projectname_l3
    • Retention: Daily snapshots weekly. Weekly snapshots every 4 weeks. Includes disaster recovery. Encryption at rest included.
    • Cost: $150/yr per TiB
    • Security Level: Up to Level 3
    • Storage: Request storage allocation
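
As a rough budgeting aid, the annual rates listed above can be multiplied by the requested allocation. The sketch below is not an official calculator. The dictionary keys and function name are hypothetical, and rounding Tape requests up to the next 20TB increment is an assumption based on the "20TB increments" note. Actual billing rules may differ.

```python
import math

# Annual rates copied from the paid offerings above (USD per unit per year).
RATES = {
    "compute": (150, "TiB"),
    "lab": (125, "TiB"),
    "long_term": (30, "TiB"),
    "tape": (15, "TB"),
    "fasse": (150, "TiB"),
}
TAPE_INCREMENT_TB = 20  # Tape is listed in 20TB increments.


def annual_cost(offering: str, size: float) -> float:
    """Estimate yearly cost; `size` is in the unit listed for the offering."""
    if size < 0:
        raise ValueError("size must not be negative")
    rate, _unit = RATES[offering]
    if offering == "tape":
        # Assumption: allocations round up to the next 20 TB increment.
        size = math.ceil(size / TAPE_INCREMENT_TB) * TAPE_INCREMENT_TB
    return rate * size


print(annual_cost("lab", 10))   # 1250
print(annual_cost("tape", 25))  # 600  (25 TB rounds up to 40 TB)
```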

Storage Offerings (Complimentary*)

  • Home Directory
    • Description: Personal user storage. Not recommended for computational purposes.
    • Performance: Moderate
    • Size: 100GiB (fixed)
    • Mount: /n/homeNN/username
    • Retention: Daily snapshots weekly. Weekly snapshots every 4 weeks. Disaster recovery.
    • Cost: None
    • Security Level: Up to Level 2
    • Storage: Folder generated for each user when granted cluster access. Limited to 100GiB.
  • Lab Directory
    • Description: General lab storage. Install software to be referenced from netscratch.
    • Performance: Moderate
    • Size: 4TiB (fixed)
    • Mount: /n/holylabs
    • Retention: No snapshots. No disaster recovery.
    • Cost: None
    • Security Level: Up to Level 2
    • Storage: Folder generated for each approved PI and their group. Limited to 4TiB.
  • netscratch
    • Description: Temporary storage location for high performance data analysis.
    • Performance: High
    • Size: 50TiB (fixed)
    • Mount: /n/netscratch
    • Retention: No snapshots. No disaster recovery. 90-day retention policy.
    • Cost: None
    • Security Level: Up to Level 2
    • Storage: Accessible to group members.

*Harvard-sponsored

Data Storage Workflow

Default Directory Structure

Two subdirectories will be present by default within the parent directory to enable easier Globus transfers and provide some initial guidance for how to organize storage.

Lab: This directory is intended as the primary working directory. It is also the directory shared out via Globus. By default, folders in this subdirectory are visible to the whole lab. Individual users may update their permissions to adjust access as they like, though we highly recommend keeping access open to all lab members to allow for easier collaboration and data cleanup after you leave the university.
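
To check whether a folder is still open to the rest of the lab, you can inspect its group permission bits. This is a minimal sketch with a hypothetical function name. It assumes lab access is granted through the Unix group that owns the folder, and it ignores any access control lists that may also apply.

```python
import os
import stat


def lab_can_read(path: str) -> bool:
    """Return True if the group permission bits allow reading `path`.

    Directories also need the group execute bit to be entered.
    Only classic Unix mode bits are checked; ACLs, if any, are ignored.
    """
    mode = os.stat(path).st_mode
    readable = bool(mode & stat.S_IRGRP)
    if stat.S_ISDIR(mode):
        return readable and bool(mode & stat.S_IXGRP)
    return readable
```

For example, lab_can_read("/n/holylabs/jharvard_lab/Lab/jharvard") returns False if the hypothetical jharvard folder has been restricted to its owner.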

Everyone: This directory is visible to anyone on the HPC cluster and is intended for collaboration with other labs on the cluster. Data in this directory is by default owned by the lab that hosts the data. Note that this directory is not available on Globus and is intended only for internal sharing.

While this is the default structure, labs may request additional folders be set up. Please email rchelp@rc.fas.harvard.edu if you have questions.

Directory structures on the cluster may differ depending on when they were created. Some older storage folders may have a third subdirectory called Users. We have deprecated use of this folder due to issues related to data access by the lab and PIs, especially after users have left the university. If you are migrating data from a storage system that has a Users subdirectory, we recommend moving that data into the Lab directory and making it available to the lab to view and access.

Contact:

If you have questions regarding the data storage options at FASRC, please email the Research Data Manager at rdm@rc.fas.harvard.edu.

Onboarding Policies and Procedures https://docs.rc.fas.harvard.edu/kb/onboarding/ Thu, 02 Jan 2025 19:01:47 +0000

This document outlines FAS Research Computing’s policies and procedures related to the onboarding of researchers and PIs. The document is structured as a checklist, to be utilized by researchers and PIs as they enter the university or join a new lab. The document also notes differences between the onboarding of researchers and faculty (PIs).

Onboarding Checklist: Faculty

 

Onboarding Checklist: Researchers

Offboarding Policies and Procedures https://docs.rc.fas.harvard.edu/kb/offboarding/ Wed, 25 Sep 2024 17:04:39 +0000

This document outlines FAS Research Computing’s policies and procedures related to the offboarding of researchers and PIs. The document is structured as a checklist, to be utilized by researchers and PIs prior to their departure, to ensure a seamless transition. The document also notes differences between the offboarding of researchers and faculty (PIs).

Offboarding Checklist: Leaving Harvard University

Researchers:

  1. General: 

    1. Inform FASRC via email prior to leaving the university, and provide us with an estimated departure date. 
  2. Storage: 

    1. Please review all research data prior to your departure (FAS Storage, Google Drive, Dropbox etc.). Confirm with your PI and department what data can be deleted or moved to long-term storage. 
      1. Review with your PI what data can be removed, and obtain their approval.
        1. Delete any data approved by your PI. 
        2. Please ensure a record of what data was deleted is available to your PI, if needed.
        3. For protected data (Level 3), PIs are responsible for informing FAS RC if and when the data requires disposal. Please email FAS RC to discuss destruction options.
      2. If research data stored on FASRC storage is ready to be moved to long-term storage, work with FASRC’s Research Data Manager and your PI to migrate the data. 
        1. A FASRC account is required to access FASRC storage; please ensure you have an account prior to moving data via rclone or Globus.
      3. Ensure your research data is available to your PI and other collaborators, moving all research data to a shared storage location prior to your departure. Please ensure a record of what data was migrated is available to your PI, if needed.
      4. If you would like to take data with you following your departure from the university, you will need approval from your PI and department. Research data generated at the university is owned and maintained by the university. 
  3. Accounts 

    1. We will be closing your FASRC account when your appointment ends and your Harvard email account is closed. 
    2. If you need to maintain a FASRC account, please have your PI or authorized lab member (general manager or access manager) email us directly, prior to your departure, so we can convert the account to an external account and extend it by 90 days. 
      1. If you require an extension longer than 90 days, your PI will need to email us again prior to the end of the 90 day extension. We will also need an external email address for the account, as your Harvard email will be disabled automatically. 
    3. Disabling the account will automatically remove you from associated groups, including secure groups (FASSE), administrative groups, and project groups. 

Faculty/PIs:

  1. General: 

    1. Inform FASRC via email when you will be leaving the university. 
    2. Please inform FASRC if you will be returning or compensating FASRC for any physical resources (compute nodes and storage servers).
    3. Please ensure you review the FAS Employee Exit Checklist; the document highlights other offboarding responsibilities for faculty leaving Harvard.
  2. Software: 

    1. All purchased software will remain on the cluster. Please delegate the software license responsibility to another entity (lab or department) or inform FASRC when the license will expire. 
  3. Storage: 

    1. Please review all research data prior to your departure. Confirm what data can be deleted or moved to long-term storage.
      1. Please review Harvard’s Data Retention FAQs, to ensure you are in compliance with the university’s policy around data retention.
      2. Collaborate with FASRC’s Research Data Manager to migrate remaining data to long-term storage. 
    2. If you would like to take research data with you following your departure from the university, ownership of the original data may be transferred from Harvard to your new institution upon request. The University asserts ownership over research data for all projects conducted at the University, under the auspices of the University, or with University resources.
      1. Requirements:
        1. Prior written approval from the Vice Provost for Research;
        2. A written agreement from your new institution that guarantees its acceptance of ongoing custodial responsibilities for the data and allows Harvard access to the original data, should such access become necessary for any reason;
        3. Relevant confidentiality restrictions, where appropriate.
  4. Accounts 

    1. Inform FASRC via email when you will be leaving the university so they can disable your account. Your FASRC account will be closed when your appointment ends and your Harvard email account is closed. If you attain a different appointment at Harvard after your primary appointment ends, please notify FASRC as soon as possible.
    2. All lab members will need a new sponsor for their accounts. Please inform FASRC who the new sponsor will be for any remaining lab members. 
    3. Disabling your account will automatically remove you from associated groups, including secure groups (FASSE),  administrative groups, and project groups. 
  5. Virtual Machines 

    1. Remove any data you would like to retain from virtual machines prior to your departure; please inform FASRC once the data has been removed.
    2. Virtual machines will be decommissioned shortly after your departure, once they are no longer associated with an active account.

Offboarding Checklist: Changing Labs/Groups

Researchers:

  1. Request to be added to the new group using Portal. Your PI can also use ColdFront to add users to their group.
  2. Review your research data to determine what data will need to remain in your previous lab folder(s) and what data needs to be migrated to your new lab folder
    1. Discuss the data migration with your former PI and get approval for the move.
    2. If you plan to continue to store research data in your previous lab folder, confirm this with your former PI, as there will be associated storage costs. 
    3. Delete any research data that will not be useful to either lab. Confirm with your former PI what data can be removed.
    4. Ensure your research data is available to your former PI and other collaborators, moving your research data to a shared storage location prior to your departure. Please ensure a record of what data was migrated is available, if needed.
    5. Review data in your group’s Scratch environment, as the data will be removed.
  3. Your new PI must inform FASRC via email that they will be sponsoring your account, so that their group can be assigned as your primary group. Provide the date of transition.
  4. FASRC will then modify your FASRC account information.
    1. Add you to the new lab group/department
    2. Add your new PI as your manager
    3. Modify your Slurm group to be associated with the new lab
    4. Remove you from your previous lab and Slurm group. 
      1. If you require access to your previous lab, your former PI can re-add you to their group using the ColdFront application.
  5. Storage
    1. Home directory data will always remain with the user account. The data will not need to be transferred. 

Additional information:

  1. Harvard Human Resources Offboarding Information 
  2. Harvard IT Offboarding Information 

Contact:

If you have questions regarding the offboarding process, please email the FAS Research Data Manager at rdm@rc.fas.harvard.edu.

Data Sharing Resources https://docs.rc.fas.harvard.edu/kb/sharing-for-publications/ Mon, 13 Feb 2023 21:15:11 +0000

Following the completion of a research project, researchers are encouraged, and often required, to share research data alongside their published works. Data sharing can benefit the entire scientific community, as it incentivizes reproducibility and replication while also providing long-term accessibility and discovery of your research. What data you share will vary based on the project and research area, but good data management practices and infrastructure can help the process.

Benefits of Data Sharing

  • Compliance with research funding organizations that require data management plans and data accessibility
  • Compliance with journals that require submission of research data alongside publications
  • Recognition for your contribution to the creation of research data, with academic citations
  • Encourages additional research opportunities, as others can replicate or reuse the data for new analyses

Data Sharing Recommendations

  • Deposit research data and code in open access data repositories
  • Use persistent links between publications and data repositories
  • Thoroughly document contextual information related to data, code, workflows and the computational environment and include the information with your dataset submission
  • Encourage reproducibility of your research results by publishing in an open access journal

Policies

Journals

Many journals now require that published articles include the associated research data as part of the submission. To determine whether your journal requires your data be shared, review the journal’s data sharing policy.

Intellectual Property

Review any intellectual property and copyright restrictions. In general, raw data cannot be copyrighted, but some expressions of data like databases can be copyrighted or licensed.

  • Promote data sharing and reuse by assigning an appropriate license
  • If you assign a license that restricts the sharing of data, provide detailed information about how to generate and analyze the data, so others can reproduce your results.
  • If you have questions about how to protect your work as an inventor, consider filing a Report of Innovation (ROI) with Harvard OTD.

Data Use Agreements

Data Use Agreements (DUAs) are binding contracts between organizations that govern how research data will be transferred and utilized. The terms and conditions vary depending on the laws and regulations around the data type (e.g., personally identifiable information), as well as the policies or requirements of the Provider. The Office for Sponsored Programs (OSP) and Office of Research Administration (ORA) are the authorized signatories for Harvard and can help facilitate the DUA process for Harvard researchers. If you have any questions, please contact the Harvard DUA team.

Data Repositories

Data repositories offer the necessary infrastructure to host research data as required by institutions, funders, and journals. They can assist with the maintenance, organization, access, and curation of your research data. They also provide a persistent identifier and citation for your data, generally in the form of a DOI.

  • Harvard Dataverse
    • All file formats accepted
    • Files cannot exceed 2.5GB, but larger files can be uploaded by Harvard Dataverse upon request.
    • Dataset size set to 1TB per researcher, but larger datasets can be uploaded by Harvard Dataverse upon request.
    • Strongly encourages use of the Creative Commons Zero (CC0) license for all public datasets
    • Assigns a DOI for each dataset
    • Tiered level access (administrator, collaborator, curator)
    • Comprehensive data and metadata search capabilities
    • Data downloading via API
    • Free for all researchers worldwide (up to 1TB), with no maintenance fees
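
The size limits listed above can be checked before you start an upload. The following is a minimal sketch, not part of any Harvard Dataverse tooling: the function names are hypothetical, and it assumes decimal units (1 GB = 10**9 bytes), which the list above does not specify.

```python
from pathlib import Path

# Limits quoted in the list above. Decimal units are an assumption.
MAX_FILE_BYTES = int(2.5 * 10**9)  # 2.5 GB per file
MAX_DATASET_BYTES = 10**12         # 1 TB per researcher


def check_deposit(folder):
    """Return (total_bytes, oversized_files) for every file under `folder`."""
    total = 0
    oversized = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            total += size
            if size > MAX_FILE_BYTES:
                oversized.append(path)
    return total, oversized


def needs_assistance(total_bytes, oversized_files):
    """True if the deposit exceeds either limit and would need a request to Harvard Dataverse."""
    return total_bytes > MAX_DATASET_BYTES or bool(oversized_files)
```

Running `check_deposit` on the folder you plan to deposit shows whether any single file or the dataset as a whole is over the limits, in which case the upload would have to be requested from Harvard Dataverse.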

Manuscript Repositories

Other Data Sharing Information and Resources at Harvard

]]>
26022
Starfish Zones Data Visualization Tool https://docs.rc.fas.harvard.edu/kb/starfish-data-management/ Wed, 10 Nov 2021 14:12:36 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=24515

Overview

Starfish Zones is a self-service visual interface that allows groups to view folder storage amounts and locations. Users can navigate through the folder structures in the dashboard to explore directory and file level details, including storage amounts, last accessed and modified times, file owners, and file counts. The tool is still under development and may experience short downtimes to accommodate modifications.

Login

To access the dashboard, navigate to https://starfish.rc.fas.harvard.edu/. You will need to be on Harvard VPN, FASRC VPN, or a wired on-campus connection. See VPN Setup for how to connect to FASRC VPN.

After navigating to the website, input your FASRC account name and password. If you have issues with your FASRC password, please visit the FASRC website.

 

If you are unable to access the Starfish dashboard, please email FASRC Help. If you are a new faculty or group owner, it may take some time before your information is fully populated in FASRC systems. You will also need an active FASRC supported storage folder to receive a Starfish Zone.

 

Navigation

Once logged in, you will be able to view storage folders associated with your group. Note: If you notice a storage folder is missing from your dashboard, please email FASRC’s Research Data Manager. While a majority of storage folders supported by FASRC are now viewable in Starfish, some filesystems still need to be added.

 

By double-clicking a folder path, you can drill down to the file level. Group members can change which information is displayed on the dashboard by right-clicking the column headers, which shows all available column selections.

 

The dashboard is updated regularly. The date and time of the Zone’s last update are listed in the upper right-hand corner of the dashboard. If no modifications have been made to the folder contents, the updated time will reflect the last time changes were made to the folder.

Data Cleanup

Starfish can be used to identify files and directories that might no longer be useful to a group. There are several ways to use Starfish to identify folders and files that have not recently been modified or accessed or are owned by a group member who is no longer part of the team.

Browsing

  1. After logging in, navigate to the group’s storage folder.
  2. Navigate to the top of the dashboard and select the “Sunburst” map or the radial chart symbol to generate a visualization of directory contents.
  3. The Sunburst map helps visualize data based on the last accessed and modified times of the files. The map can also showcase files based on file type.
  4. You can mouse over the map for more information about the visualized directories.

Please note that directory access times are often updated when a user or process lists a directory (ls or dir), which means that access time may be later than the last time a file was actually accessed.
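
If you have shell access to the filesystem, you can compare access and modification times yourself. This is a minimal sketch using Python's standard `os.stat`. The function name is hypothetical, the values come from the live filesystem rather than from a Starfish scan, and whether access times are updated at all depends on how the filesystem is mounted.

```python
import os
from datetime import datetime, timezone


def access_and_modify_times(path):
    """Return (last accessed, last modified) for `path` as UTC datetimes."""
    info = os.stat(path)
    accessed = datetime.fromtimestamp(info.st_atime, tz=timezone.utc)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return accessed, modified


if __name__ == "__main__":
    # Inspect the current directory; replace "." with a folder of interest.
    accessed, modified = access_and_modify_times(".")
    print("accessed:", accessed.isoformat())
    print("modified:", modified.isoformat())
```

A directory whose access time is recent but whose modification time is old may simply have been listed recently, so the modification time is usually the more reliable signal when deciding what to clean up.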

At the bottom of the sunburst graph is an option to select “Users” – select this option in order to see a breakdown of allocation data by user.

Export

The dashboard allows users to export the information as a CSV file. At the top of the zone is a “Download CSV” option. Users can select which columns they would like included in the downloaded spreadsheet. Some of our suggested columns include:

  • Count (number of files)
  • Path (folder path)
  • Logical size (dataset size)
  • Newest accessed (tree)
  • Newest modified (tree)

Depending on the number of files and folders listed, the CSV file may be too large to download. We recommend selecting the specific subfolder that you would like to view as an export, to decrease the size of the downloaded file.
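
Once downloaded, the CSV can be filtered with a short script. The sketch below lists folders that have not been accessed since a cutoff date, largest first. It rests on several assumptions: the header names mirror the suggested columns above, sizes are exported as plain byte counts, and dates are in ISO 8601 format. Check all three against your own export and adjust the constants.

```python
import csv
import io
from datetime import datetime

# Hypothetical header names and formats; verify them against your own export.
PATH_COL = "Path"
SIZE_COL = "Logical size"
ACCESSED_COL = "Newest accessed (tree)"


def stale_folders(csv_text, cutoff):
    """Return (path, size_bytes) for rows last accessed before `cutoff`, largest first."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        accessed = datetime.fromisoformat(row[ACCESSED_COL])
        if accessed < cutoff:
            rows.append((row[PATH_COL], int(row[SIZE_COL])))
    return sorted(rows, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = (
        "Path,Logical size,Newest accessed (tree)\n"
        "lab/old_project,5000,2020-01-15T00:00:00\n"
        "lab/current,9000,2025-06-01T00:00:00\n"
        "lab/archive,7000,2019-03-02T00:00:00\n"
    )
    for path, size in stale_folders(sample, datetime(2023, 1, 1)):
        print(path, size)
```

To use it on a real export, read the file contents with `open(...).read()` and pass the text to `stale_folders` together with the cutoff date your group agrees on.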

Tarring Data for Tape with Starfish Tags

Starfish can also be used to tar the contents of a folder in preparation for migration to NESE Tape storage. The tarring script is designed to tar files “in place,” tarring files in the source folder to prevent issues with storage capacity.

To have this tagging option enabled for your lab, please email rdm@rc.fas.harvard.edu, which will initiate a ticket. We will then enable the ‘tag’ for your Starfish dashboard.

Once the tag has been enabled, you can follow the steps below to initiate the process on your own. Keep in mind that Starfish’s script creates tar files that comply with NESE Tape requirements: file sizes between 1 GB and 1 TB, and a limit on the number of files per allocation, which works out to roughly 100 MB per file.

Applying the Tag

  • Navigate to the folder in the Starfish interface that you want to tar. The tag only needs to be applied once for a given directory tree; everything under the tag will be tarred, so ensure you are selecting the correct level.
  • Right-click on the folder, select “Classification Tags”, and apply the tag “fasrc_tagset:tar-in-place”.
  • The automated script will run hourly once the tag is applied (on the half hour).
  • The process will show as completed when ‘manifest files’ with the following naming convention appear in the parent folder, where the tag was applied: “manifest_1234_tar_wrapper_YYYYMMDD_csv.gz”. The information and date in the naming convention will change depending on when the tag was applied and the folder contents.
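
To check for completion from the command line rather than the dashboard, you can look for the manifest files directly. This is a minimal sketch: the glob pattern is derived from the naming convention quoted above, the function name is hypothetical, and it assumes the manifests sit directly in the folder where the tag was applied.

```python
from pathlib import Path

# Glob built from the naming convention quoted above:
# manifest_<number>_tar_wrapper_<YYYYMMDD>_csv.gz
MANIFEST_GLOB = "manifest_*_tar_wrapper_*_csv.gz"


def find_manifests(tagged_folder):
    """Return manifest files found directly inside `tagged_folder`, sorted by name."""
    return sorted(Path(tagged_folder).glob(MANIFEST_GLOB))
```

An empty result means no manifest has been written yet. Because the script runs hourly and the final steps rely on the latest Starfish scan, it may be worth checking again later before assuming something went wrong.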

Tag Workflow

  • The script is designed to prepare files for migration to Tape storage, with those specific size constraints in mind. The files will be compressed to approximately 1GB in size.
  • The script will continue to tar any uncompressed files in the folder. If new files or folders are added to a tagged folder, the script will begin again to compress the files. The script is configured to run hourly, selecting which step to start on depending on the current state of the tagged folder.
  • Once all files are compressed and the tarring jobs have completed, the original ownership permissions will be reapplied and the manifest files (listed above) will be generated. This may take a day or two to be applied, as it relies on the latest Starfish scan. We recommend keeping the tag on the directory for a day or two to ensure the permissions are applied.
  • Once the files have been tarred, the originals removed, the manifest file created, and the permissions applied, you can remove the tag. The script will continue to run on an hourly basis until the tag is removed.

Contact

If you have any additional questions about how to log in to or use the Starfish Zones dashboard, please email Sarah Marchese, FASRC Research Data Manager.

]]>
24515
ColdFront User Guide https://docs.rc.fas.harvard.edu/kb/coldfront-allocation-management/ Wed, 10 Nov 2021 14:04:28 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=24511

ColdFront is an open-source resource allocation management system designed to provide a central portal for administration, usage reporting, and management of HPC resource allocations. FASRC adapted the open-source software to manage allocations on the FASRC cluster. The platform enables the viewing and management of both lab groups (Projects) and their storage or cluster allocations (Allocations).

Accessing ColdFront

To access ColdFront, connect to the @fasrc VPN and log in using your FASRC credentials.

https://coldfront.rc.fas.harvard.edu/

 

After logging in you will see your home page, which has sections for your projects, your allocations, and any pending requests or change requests for your allocations. Click the project link or Allocation status button to view details about a given project or allocation.

 

Project Pages:

A Project page allows all project members to:

  • view the project’s allocations
  • view the project’s users
  • adjust their project notification settings.

Additionally, from the project page, project managers and PIs can:

  • request new storage allocations
  • request changes to existing storage allocations
  • add users to the project
  • remove or request to remove users from the project
  • edit the roles of project users (i.e., assign or remove Manager status)
  • send an email to project users that have elected to receive notifications.

The project page’s allocation section allows you to view storage and cluster allocations for your lab. Managers can also click the “Request New Storage Allocation” button on this table’s header to… yes, that’s right, submit a request for a new storage allocation.

 

Project Membership and Roles

The project page contains a users table that lists all the users in the lab group. PIs, Access Managers, and General Managers will also see options to add users to the project, and to remove or request to remove users from the project. PIs and General Managers can additionally edit the roles of project users or send an email to project users who have elected to receive notifications.

Project user roles correspond to the roles described in FASRC Roles and Responsibilities:

  • PI. This role is automatically assigned to the lab’s PI and cannot be assigned to another Project user. The PI can:
    • Request new allocations
    • Request changes to the size of existing allocations
    • Add and remove project users
    • Change the roles of existing project users
  • General Manager. General Managers can be assigned by PIs. They have all the permissions of a PI, save the ability to assign General Manager status to other project members.
  • Storage Manager. Storage Managers can be assigned by PIs and General Managers. Storage Managers can:
    • Request new allocations
    • Request changes to the size of existing allocations
  • Access Manager. Access Managers can be assigned by PIs and General Managers. Access Managers can:
    • Add and remove project users
  • User. Project users can view project and allocation information. They cannot change or request changes to the project or its allocations.
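
The role list above can be summarized as a lookup table. The sketch below is an illustrative model of those descriptions only, not ColdFront's actual implementation. The permission names and the `can` helper are hypothetical, and treating "assign General Manager status" as a permission separate from changing other roles is an assumption based on the General Manager description.

```python
# Illustrative model of the role descriptions above; not ColdFront's code.
VIEW = "view project and allocation information"
REQUEST_ALLOCATION = "request new allocations"
CHANGE_ALLOCATION = "request changes to allocation size"
MANAGE_USERS = "add and remove project users"
EDIT_ROLES = "change roles of project users"
ASSIGN_GENERAL_MANAGER = "assign General Manager status"  # assumed to be separate

PI_PERMISSIONS = frozenset({
    VIEW, REQUEST_ALLOCATION, CHANGE_ALLOCATION,
    MANAGE_USERS, EDIT_ROLES, ASSIGN_GENERAL_MANAGER,
})

ROLE_PERMISSIONS = {
    "PI": PI_PERMISSIONS,
    # Everything a PI can do, save assigning General Manager status.
    "General Manager": PI_PERMISSIONS - {ASSIGN_GENERAL_MANAGER},
    "Storage Manager": frozenset({VIEW, REQUEST_ALLOCATION, CHANGE_ALLOCATION}),
    "Access Manager": frozenset({VIEW, MANAGE_USERS}),
    "User": frozenset({VIEW}),
}


def can(role, permission):
    """Return True if `role` includes `permission` in this illustrative model."""
    return permission in ROLE_PERMISSIONS[role]


if __name__ == "__main__":
    for role in ROLE_PERMISSIONS:
        print(role, "can request allocations:", can(role, REQUEST_ALLOCATION))
```

A table like this makes it quick to answer questions such as who in a lab can request a storage change, but the ColdFront project page remains the authority on what each account can actually do.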

 

Allocations:

The Allocation Page provides a comprehensive view of details about the allocation, presenting key information such as the total allocation size, overall usage, and the estimated monthly cost. This page also features a table that illustrates usage per user, with data sourced and updated daily from our data management system, Starfish.

 

Making an Allocation Request:

PIs and users with General Manager or Storage Manager status can make a new allocation request or request changes to project allocations.

To request a new allocation:

  1. Go to the page of the project the new allocation is for and click the Request New Storage Allocation button.
  2. Fill out and submit the allocation request form.
  3. You will be notified via email when your new allocation is ready to use.

Allocations can be requested on storage tiers 0-3. For the specifics of each tier, please refer to our storage tier documentation.

Making an Allocation Change Request:

PIs and project managers can request to change the size of an allocation associated with their project. Follow these steps to initiate the process:

  1. Navigate to the allocation page corresponding to the allocation you wish to modify.
  2. Click the “Request Change” button at the top of the “Allocation Information” table.
  3. In the resulting form, enter the desired size of the allocation and a justification for the change, then click Submit.
  4. You will be notified via email when the allocation is updated and ready to use. Space permitting, we try to fulfil change requests within 3 business days.

 

For storage features and updates, please review the Data Storage Workflow documentation on our website.

For information about ColdFront’s dataflow, check the Storage Service Center page.

If you have any questions, please feel free to reach us here: rchelp@rc.fas.harvard.edu

]]>
24511
RSpace Electronic Lab Notebook (ELN) https://docs.rc.fas.harvard.edu/kb/rspace/ Thu, 03 Jun 2021 19:45:00 +0000 https://docs.rc.fas.harvard.edu/?post_type=epkb_post_type_1&p=23996

Overview

RSpace (ResearchSpace) is an open-source electronic lab notebook (ELN) supported by Harvard Library and University Research Computing (URC) that can help researchers organize, store, and share protocols, analysis and experimental notes in a centralized and secure platform. RSpace has been developed in close consultation with researchers and research data professionals at leading global research institutions to meet expanding researcher needs.

Benefits

Tailored for academic environments and the research workflow, ELNs help manage and track research data throughout the data lifecycle. They facilitate good data management practices, support data review and oversight, enable group collaboration, and manage inventories of samples and other equipment.

Features

Researchers:

  • Capture and organize research data to maintain a record of work
  • Collaborate with lab groups and lab members
  • Simplify data inventory and sample management
  • Integrate with popular research tools
    • Harvard Dataverse, protocols.io, OneDrive, Google Drive, Dropbox, S3 Cloud, DMPTool, GitHub, Jupyter Notebook, Slack and more!
  • Link to research data stored in institutional storage
  • Store research data up to Harvard Security Level 3
  • Login with HarvardKey authentication

Principal Investigators (PIs):

  • Delegate administration of group access to a manager
  • Open or restricted data sharing capabilities available
  • Real-time visibility of researcher communication
  • Data ownership stays with the PI and the university
  • Export data in HTML, XML, DOC and PDF formats
  • Data backups retained by RSpace to prevent data loss

Additional information available on the RSpace website. An overview of the RSpace ELN tool is also provided by Harvard Library.

Eligibility

Contact

To request a consultation or training, contact Harvard Library’s Open Scholarship and Research Data Services team at hl_osrds@harvard.edu

]]>
23996
Grant Support https://docs.rc.fas.harvard.edu/kb/grant-support/ Mon, 21 Oct 2013 13:21:59 +0000 https://rcwebsite2.rc.fas.harvard.edu/?page_id=8444

Research Computing can provide Harvard faculty with a personalized letter of support for grant submissions. The letter will describe the Research Computing environment and detail the level of support and expertise that the Research Computing team will be able to provide. Because each grant submission is unique, it is best to contact FASRC to discuss grant applications and letters of support.

]]>
8444