Data Security – FASRC DOCS
https://docs.rc.fas.harvard.edu
Tue, 03 Dec 2024 16:16:44 +0000

FASRC Data Ownership and Access Policy
https://docs.rc.fas.harvard.edu/kb/data-ownership-and-access-policy/
Wed, 17 Jul 2024 13:33:15 +0000

Data stored in FASRC lab directories (/n/holylabs/LABS/<labname>_lab) and other lab shares on FASRC storage are owned and managed by PIs or group owners. If written approval is provided to FASRC by the PI or group owner, FASRC can modify the data permissions to allow ownership and access as the PI or group owner requires.

  • Additional approval by the original owner of the data will not be required. The PI or group owner of the storage folder is responsible for its use.
  • Self-service mechanisms are already in place that allow labs and groups to modify folder permissions. This policy expands upon those tools, affirming FASRC’s right to alter folder permissions when requested.

Additional notes

  • Written approval will be required from the PI or group owner for FASRC to modify access without the original owner’s consent.
    • Explicit written permission must come from the PI or group owner, or from an individual the PI has approved to assume similar responsibilities (e.g., a Data Manager).
  • Lab folders on FASRC storage will initially be created as group writable, which allows for easier collaboration, data migrations across platforms, and data cleanup.
    • A lab or group may choose to modify folder permissions, but the default setup will be group writable.
  • If a lab folder is not group writable, FASRC can make it group writable at the request of the PI or group owner, or of an individual approved by the PI or group owner to assume similar responsibilities (e.g., a Data Manager).
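"Group writable" refers to the POSIX group-write (g+w) permission bit on a folder. As a minimal sketch, assuming a standard POSIX filesystem where plain mode bits (not ACLs) govern access, the Python below checks for that bit and adds it without changing any other bits. The function names are hypothetical illustrations, not FASRC's self-service tools, and the demo uses a temporary directory rather than a real lab share. Only a file's owner (or an administrator) can change its mode, which is one reason FASRC's help may be needed when the original owner is unavailable.

```python
import os
import stat
import tempfile


def is_group_writable(path: str) -> bool:
    """Return True if the group-write bit (g+w) is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWGRP)


def make_group_writable(path: str) -> None:
    """Add g+w to path, leaving all other permission bits unchanged."""
    current = stat.S_IMODE(os.stat(path).st_mode)  # permission bits only
    os.chmod(path, current | stat.S_IWGRP)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory, not on real lab data.
    with tempfile.TemporaryDirectory() as d:
        os.chmod(d, 0o750)               # rwxr-x---: group can read, not write
        print(is_group_writable(d))      # False
        make_group_writable(d)           # now rwxrwx---
        print(is_group_writable(d))      # True
```

On a shared filesystem, running `make_group_writable` over every directory in a tree (for example with `os.walk`) would be the bulk equivalent, but check with FASRC before changing permissions on lab storage.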

Access to FASRC services from Abroad
https://docs.rc.fas.harvard.edu/kb/access-from-abroad/
Mon, 01 Apr 2024 22:44:36 +0000

Access from Outside the USA

Policies: You should be aware of and peruse university guidance and policies related to accessing Harvard resources or doing Harvard business while outside the United States. This is not an exhaustive list. Contact your school security officer for additional guidance or concerns.

Security: If you are working with protected, confidential, or other data covered by use agreements or licensing, you should consult your local security officer to ensure you are allowed to do so in the country you will be traveling to.

Review the HUIT Prepare to Travel to a Cyber High-Risk Country guidelines (or your school’s equivalent).

VPN: The FASRC login nodes (login.rc.fas.harvard.edu) do not require the FASRC VPN in countries without restrictive boundary firewalls, but many other resources available on the campus network do require a VPN connection when you are off campus. Countries with restrictive boundary firewalls may still block your access to the FASRC VPN. Please set up and test your ability to connect to the FASRC and/or Harvard VPN before you travel.

Speed: FASRC cannot guarantee any particular connection speed. Beyond the North American mainland, traffic may pass over many network links, and the bandwidth or quality of any of them can add latency or slow the connection.

Access from China & Restrictive Countries

In addition to the above, please be aware that FASRC VPN and other FASRC services may be inaccessible to users inside China and other countries with restrictive boundary firewalls.

Unfortunately, because China’s boundary firewall changes frequently, there is no way to predict which outbound traffic is currently blocked, and its rules are especially aggressive toward blocking Western VPNs.

If access to our VPN (or to any other service) is blocked, this is beyond our control and we cannot work around it. Please do not assume you will have normal access from China or other countries with restrictive boundary firewalls. If a connection does work, expect it to be slow.

Note that SSH access to our login nodes (login.rc.fas.harvard.edu) does not require you to be on our VPN, but there is still no guarantee that the login nodes will be reachable from inside China’s network.
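To see whether the login nodes' SSH port is reachable from the network you are on, a plain TCP connection test is enough. This is a minimal sketch, assuming the login nodes listen on the standard SSH port 22; the helper name `tcp_reachable` is hypothetical. A successful connection shows only that the port is reachable, not that you can log in (you still need to authenticate), and a failure does not tell you where along the path the traffic was blocked.

```python
import socket


def tcp_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections, and timeouts are all OSError subclasses.
        return False
```

For example, `tcp_reachable("login.rc.fas.harvard.edu")` returns True if a connection to port 22 is established within five seconds.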

Please review the HUIT Prepare to Travel to a Cyber High-Risk Country guidelines (or your school’s equivalent). Contact your school security officer for additional guidance or concerns.

Data Security Levels
https://docs.rc.fas.harvard.edu/kb/data-security-levels/
Tue, 22 Jun 2021 18:10:09 +0000

What is a Data Security Level (DSL)?

Harvard classifies data into five data security levels according to the sensitivity of the data. A dataset's DSL determines how that data must be managed.

DISCLAIMER: The information on this page relates only to the FASRC clusters and our current understanding of Harvard policy. Please refer to the Harvard Security Data Security Levels page for up-to-date university policies and information.

 

Cluster Data Security Level Ratings

  • Public information (Level 1/DSL1): The FASRC Cannon cluster is rated only for DSL1 and DSL2 data.
  • Low Risk (Level 2/DSL2): The FASRC Cannon cluster is rated only for DSL1 and DSL2 data.
  • Medium Risk (Level 3/DSL3): Only the FASRC FASSE (FAS Secure Environment) cluster is rated for DSL3 data.
  • High Risk (Level 4/DSL4): For DSL4 projects, please contact University RC (URC) for options.
  • Extreme Risk (Level 5/DSL5): FASRC has no systems rated for DSL5 data.

 

Web Scraping Policy
https://docs.rc.fas.harvard.edu/kb/web-scraping-policy/
Wed, 22 Apr 2020 11:59:46 +0000

Web scraping is a contentious issue within research. While fair use does provide for many uses of data gleaned from the Internet, it is generally applied to information gathering by people, not to programmatic machine scraping. That distinction makes brute-force scraping an issue separate from fair use.

You, as a representative of Harvard, are not just using the source’s data but also its servers, bandwidth, and other resources, in ways the source may not approve of. This can lead to IP blacklisting and even legal action. Please tread carefully, as your actions could negatively affect others.

If in doubt or in need of more authoritative guidance, please contact the Harvard Office of the General Counsel or the Office of the Vice Provost for Research.

Please be aware that being an academic does not exempt you from the usage policies of social media and other Internet platforms such as Facebook and Twitter.

Sensitive Data

If the data you are acquiring is considered sensitive or confidential, or contains human data, it must be reviewed for compliance before you place it on the FASRC cluster. If in doubt, always err on the side of caution and contact the Office of the Vice Provost for Research.

 

Scraping data for use on the FASRC Cluster

If your research requires you to scrape content from the web, please review the following guidelines and suggestions.

We strongly discourage using the cluster itself to scrape data. Because of its size and the ease of running processes in parallel, the cluster is easily weaponized, and your actions could have consequences for other researchers. Please seek another avenue for data acquisition first.

You should contact FASRC before commencing any scraping activity using the FASRC cluster.

It is strongly preferred that you do the scraping elsewhere and then bring the data to the FASRC cluster for processing. If the data is sensitive or confidential, contains human data, or its status is unclear, this is a requirement. See ‘Sensitive Data’ above.

Source Permission

If you are in doubt or have questions, please contact the Harvard Office of the Vice Provost for Research.

Data on the Internet should not be programmatically (or ‘brute-force’) scraped using FASRC computing resources, even for academic research purposes, unless FASRC has given permission to proceed using the cluster or some system tied to the cluster, and:

A) The source provides an API for this purpose and any requirements they impose have been met.

B) The source allows/does not prohibit scraping in their terms of service or other public notice.

C) The source is the United States government and the data in question was generated with public funds and is publicly available without encumbrance. Further, the site must not be scraped by brute-force means if an API is provided.

D) The source has given you explicit permission in writing or via a secondary document spelling out that permission.
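Terms of service have to be read by a person, but many sites also publish a machine-readable robots.txt file stating which paths automated clients may fetch and, sometimes, how long to wait between requests. Checking it is a reasonable extra precaution; note, as our own assumption, that a permissive robots.txt does not by itself satisfy condition B and never replaces FASRC's permission. The sketch below uses Python's standard `urllib.robotparser` module on an example robots.txt; the rules, user-agent string, and URLs are all hypothetical.

```python
from urllib import robotparser

# Hypothetical robots.txt content. For a real site, RobotFileParser.set_url()
# followed by read() downloads and parses https://<site>/robots.txt instead.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines()


def load_rules(lines):
    """Parse robots.txt lines into a RobotFileParser."""
    rp = robotparser.RobotFileParser()
    rp.parse(lines)
    return rp


rules = load_rules(EXAMPLE_ROBOTS_TXT)
agent = "example-research-bot"  # hypothetical user-agent string

print(rules.can_fetch(agent, "https://example.org/public/page.html"))  # True
print(rules.can_fetch(agent, "https://example.org/private/data.csv"))  # False
print(rules.crawl_delay(agent))  # 5 (seconds between requests)
```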

Data cannot be programmatically scraped using FASRC computing resources if the source has explicitly forbidden scraping in their terms of service and written permission to do so cannot be obtained. In such a case, you should investigate other options for acquiring this or similar data.

Throttling and Blacklisting

Scraping content from websites using highly parallelized processes should be avoided, even with unfettered permission from the source. Doing so risks having the IP range of the cluster, or even of the entire university, blacklisted, which could have an undesirable effect on other network and cluster users. Please ensure your processes pull data at a reasonable rate unless you have explicit written approval from the data source to download more aggressively, along with assurance that this will not lead to blacklisting by them or their upstream provider.
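One simple way to keep a scraper at a reasonable rate is to fetch pages strictly one after another with a minimum pause between requests. The Python below is a minimal sketch of that idea, not an FASRC-endorsed tool: the class and function names are hypothetical, the 5-second default interval is an arbitrary placeholder rather than an FASRC limit, and `fetch` stands for whatever download function you use (for example one built on `urllib.request.urlopen`). If the source publishes a crawl delay or rate limit, use at least that interval.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


class MinIntervalThrottle:
    """wait() blocks until at least `interval` seconds have passed since the previous wait()."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock   # injectable so the throttle can be tested without real waiting
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_politely(urls: Iterable[str], fetch: Callable[[str], T],
                   interval: float = 5.0) -> Iterator[Tuple[str, T]]:
    """Fetch URLs one at a time (no parallelism), pausing between requests."""
    throttle = MinIntervalThrottle(interval)
    for url in urls:
        throttle.wait()
        yield url, fetch(url)
```

The limit applies per process: running many copies of such a script as parallel cluster jobs would multiply the request rate and reintroduce exactly the blacklisting risk described above.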

Related:

Harvard Office of the Vice Provost for Research

US Data.gov Data Harvesting Information

Archive.org Scraping

 

Nielsen Dataset
https://docs.rc.fas.harvard.edu/kb/nielsen-dataset/
Wed, 16 Oct 2019 14:47:12 +0000

The Harvard Library funded a paid subscription to the Nielsen Dataset from the University of Chicago Booth School of Business. The Harvard Office for Sponsored Programs is the point of contact for receiving and managing registration codes and for providing a DUA to researchers. Before obtaining access, a researcher must:

  1. Determine which Harvard IT representative you will work with to obtain a certificate of destruction for the data after its expiration date. This could be your department's IT staff if you are using their servers or your Harvard-issued laptop, or FASRC if you are using our servers.
  2. Register the project on the Chicago Booth Nielsen site.
  3. Submit data safety and DUA applications in the Research Administration system.
  4. If using FASRC resources, submit a FASSE project request form that includes the DUA and DAT IDs from the Research Administration system.
  5. Data is transferred via Globus. To transfer to your own Harvard-issued laptop or workstation, you will need to set up a personal endpoint. If using FASRC resources, you will need to work with FASRC to set up a location for the transfer.

The following provides notes about the two documents that must be obtained by the researcher before using the data set.

NOTE: This summary is in no way a substitute for the knowledge the individual researcher must gain by reading the entire documents.

Data Access and Confidentiality Agreement

At the beginning of this document, the Researcher and the school/department official responsible for oversight of research/data access must be designated, and they must adhere to the following.

  1. Understand that “Data” is more than just the downloaded data set.
  2. The Researcher shall submit a copy of his or her Research Project description, along with a signed copy of this Agreement and any applicable IRB Approval(s) to the Harvard Office for Sponsored Programs (dua@harvard.edu).
  3. The Researcher may disclose Data only to Harvard PhD students working under the supervision or direction of the Researcher on the Research Project(s).
  4. Pay attention to the limited use of the Data.
  5. The Researcher agrees to use appropriate safeguards for the Data. This includes properly securing the Data against exposure to the Internet and during transmission.
  6. The Researcher agrees to submit to Chicago Booth copies of all final papers or other publications arising from use of Data at least thirty (30) days prior to their proposed publication or other public dissemination.
  7. Any personally identified information within the Data shall not be disclosed in any manner.
  8. The Researcher shall coordinate with their school IT staff to provide any required certifications of destruction to Chicago Booth. For data stored with FASRC, we are happy to provide that certification.
  9. Researchers must abide by all federal and state laws and Harvard policies on research conduct.
  10. The Agreements remain in effect for the entire duration of the Research Project.

Master Access Agreement

  1. The Data and License are granted solely to the individual researcher via the Nielsen site. The researcher can be a Ph.D. student or postdoc who has a tenure-track faculty member registered on the site. Data is not available to undergraduate or master's students.
  2. The fee has been paid by Harvard Library.
  3. Support is not provided by Nielsen.
  4. The start date is set in this document, and the end date of use is 1 year after the start date.
  5. An annual status report must be provided to Chicago Booth.
  6. Nielsen owns the data, and there are particular rules around publishing that must be met.
  7. Data cannot be distributed and must remain on Harvard-issued equipment, that is, either servers maintained by Harvard or encrypted laptops or workstations issued by Harvard.
Data and Data Use Agreements (DUA)
https://docs.rc.fas.harvard.edu/kb/data-use-agreements/
Wed, 29 May 2019 15:24:19 +0000

Preface

Before any data which is considered confidential, proprietary, or otherwise considered sensitive can be stored on the FASRC cluster, it must be properly classified and any data use agreements must be in place and available.

The project PI is responsible for ensuring that any future approved access complies with any DUA or other data use agreement, including notifying the data provider before approving access, if required.

Human or Animal Data

If you are collecting or using data from humans or animals, you should contact Harvard’s Institutional Review Board (IRB) and/or Institutional Animal Care and Use Committee (IACUC) first.

Any data of this type which does not have an IRB determination cannot be transferred to the FASRC cluster until that process is complete. 

  • LEVEL 3/DSL3: Please note that only the FASRC FASSE Secure environment is rated for Level 3/DSL3. The main cluster is rated only for Level 2 or below.
  • LEVEL 4/DSL4: If you require a Level 4 environment, you can contact FASRC to discuss your project, but please be aware that FASRC does not currently provide a Level 4 secure environment. The FASRC cluster, including FASSE, is not suitable for DSL4 projects.

Data Use Agreements (DUA)

Many datasets require a Data Use Agreement, which must be on file at Harvard; use of the data must adhere to the requirements and duration of that agreement. This should be completed before transferring any such data to the FASRC cluster.

“The transfer of data between organizations is common in the research community. When the data is confidential, proprietary, or otherwise considered sensitive, the organization providing the data (“Provider”) will often require that the organization receiving the data (“Recipient”) enter into a written contract to outline the terms and conditions of the data transfer. Such a contract is usually referred to as a Data Use Agreement (DUA), although it may also be referred to as a License Agreement, Confidentiality Agreement, Non-Disclosure Agreement, Memorandum of Understanding, Memorandum of Agreement, or other names if these agreements include data sharing or data transfer requirements.” – Source

To submit, manage, and review DUA requests, you will use Harvard’s DUA Agreements System.

Where to start:
  • Harvard Research Data Security Policy (HRDSP) site
  • HRDSP Applications Summary and Order of Reviews

Confidential Data
https://docs.rc.fas.harvard.edu/kb/data-security-information-on-storage-and-use-of-confidential-data-hrci/
Thu, 04 Feb 2010 16:20:08 +0000

We would like to draw your attention to how Harvard University classifies different types of confidential data and how they should be stored. Confidential data is defined as “Information about a person or an entity that, if disclosed, could reasonably be expected to place the person or the entity at risk of criminal or civil liability, or to be damaging to financial standing, employability, reputation or other interests.” Harvard University’s Technology Security Office pages also describe how such data should be handled and stored. In particular, they state, “All confidential information must be encrypted when transported across any network.”

Please note that *none* of the general purpose storage offered by Research Computing (RC), unless expressly custom designed and built for a particular purpose, satisfies this criterion. Under no circumstances should any confidential data be stored in RC storage unless we have made arrangements with your lab to provide appropriate space and you have an IRB determination for your data. If you have confidential data that you need to store, please email rchelp@rc.fas.harvard.edu to schedule an appointment. We will be happy to discuss your particular needs and design a storage solution that complies with Federal, State and University regulations.

Data safety should be a concern for all of us. Faculty, staff and students at Harvard are routinely responsible for data that is governed by various regulations. We ask that you regularly audit the nature of the data you use and are responsible for, and ensure that you are taking the right steps to protect yourself and others from harm. If you have any questions about interpreting University, State or Federal regulations and how they apply to your data, please contact us at rchelp@rc.fas.harvard.edu.
