Globus: Transfer Data to Tape (https://docs.rc.fas.harvard.edu/kb/tape-globus-access/)

This document is derived from Log In and Transfer Files with Globus to help FASRC users use Globus to transfer data between different storage tiers. Globus is one of the access mechanisms for transferring data to and from the NESE Tape System, so it is important to set up a Globus account so FASRC can provide you access to the Tape collection. Please review the Storage Service Center and Tier 3 docs, as these have references for tools like Coldfront and allocations.

 

Log in with an existing identity

Visit www.globus.org and click "Login" at the top of the page. On the Globus login page, choose an organization you're already registered with, such as Harvard University. (Try typing a few letters of "Harvard" to narrow the list.) When you find it, click "Continue".

[Screenshot: Globus Harvard University]

You'll be redirected to the Harvard Key login page, where you can use your university credentials to log in.
[Screenshot: Harvard Key]

Harvard Key will ask for your permission to release your account information to Globus.

Once you’ve logged in with your organization, Globus will ask if you’d like to link to an existing account. If this is your first time logging in to Globus, click “Continue.” If you’ve already used another account with Globus, you can choose “Link to an existing account.”

 

Access Tape Storage

After you've signed up and logged in to Globus, you will land on the File Manager page, which has a left-hand menu with different icons. Click COLLECTIONS, then check the Shared With You checkbox in the top menu. In the list below, you will see the Tape storage if it has already been set up for your lab. If you don't see the share for your lab, please check with your PI. If you are the PI, please go to the Storage Service Center and request an allocation using Coldfront. To update an existing allocation, go to Coldfront and update the allocation.

 

 

Share Lab Tape Collection

PIs and lab admins can share the collection with lab members. Go to Collections > Shared With You and click on the lab tape collection; you can also search for your lab collection in the search area on the top menu. Once you have the collection open, click on the Permissions tab. In the top right corner there is an Add Permissions button to add users; search for the user there. If you can't find your lab tape collection, it may not be set up yet and you may still need to request an allocation through Coldfront.

To be added, a user needs to have an existing Globus account; if you are not able to find them in the search, please share this page with them so they can set up access using an existing identity. Once you can find the user, give them read and write access. If it's for a collaborator and you want to share only part of the data, please follow the Globus documentation. Feel free to write to rchelp@rc.fas.harvard.edu or join our office hours for further assistance.

Transferring Data from FASRC storage to Tape

The File Manager in Globus is used to transfer data to and from Tape and other FASRC storage. Our Globus File Transfer doc provides more details about Globus and its features. Here is a simple example of transferring data from Tier 0 storage in Holyoke to Tape.

In the Globus File Manager, there is a Panels button at the top right; select the split-panel view as shown below. The Harvard FAS RC Holyoke collection is on the left, where most of the Holyoke storage is mounted (holylfsxx, holystore01, etc.), and on the right is the Tape storage we have for testing.

Review the Globus menu shown below to see what you can do with the share you have. Before moving data to tape, make sure the data is either compressed or that individual files are larger than 100MB. Log in to the cluster, review your files, and create compressed archives for folders that contain many small files; you can use zip, tar, gzip, or your favorite compression tool. Also, make sure you name the files appropriately so you can find them from the Globus interface. In the example below, we have created a folder named with the month and year for these files; you can use the project name, user, etc. to make discovery easier.
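As a minimal sketch, preparing a folder of many small files before a tape transfer might look like the following (the lab path, project name, and date in the archive name are illustrative):

# Log in to the cluster, then bundle the folder into one compressed,
# descriptively named archive so it lands on tape as a single large file.
cd /n/holylfs/LABS/mylab_lab
tar -czvf mylab_project1_2022-03.tar.gz project1/
# Confirm the archive is comfortably larger than 100MB before transferring.
ls -lh mylab_project1_2022-03.tar.gz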
After you select the data you want to transfer from storage to tape, the Start button at the top will start the transfer. To transfer data back from tape, use the Start button on the right panel. Data transfer from tape can take time: only data from the last few hours is kept on buffer storage, and the rest is stored on tape, so it may take a while to stage data back to buffer storage before it can be transferred to the requested storage/collection.

Note: The tape system is designed to place large amounts of data onto tape cartridges for cold storage.

  1. This is designed to be a slow I/O system; data is placed on tapes for long-term storage. User-initiated retrieval of data from tape should only be done at low volumes.
  2. Bulk retrieval of data from tape needs to be handled by the FASRC/NESE team. A retrieval request must be made well in advance of data needs.

Please follow NESE documentation and Globus documentation, or write to rchelp@rc.fas.harvard.edu or join our office hours for further assistance.

Data Transfers with rclone (https://docs.rc.fas.harvard.edu/kb/rclone/)

Introduction

rclone is a convenient and performant command-line tool for transferring files and synchronizing directories directly between FAS RC file systems and Google Drive (or other supported cloud storage). If you are eligible, and don’t already have a Google Apps for Harvard account, see the Google Apps for Harvard Getting Started page. If you require help or support for your Harvard Google account or for Google Drive itself, please contact HUIT (ithelp@harvard.edu).

Configuring rclone

rclone must be configured before first use, and each cloud service has its own configuration. Visit the rclone documentation to find the cloud service that you need, click on its "config" link, and follow the rclone config steps.
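For example, a minimal sketch of configuring and verifying a Google Drive remote from a cluster shell (the remote name gdrive is just an example):

rclone config        # choose "n" for a new remote, name it gdrive, pick Google Drive, and follow the prompts
                     # when asked "Use auto config?", answer "n" on a headless cluster node and follow the instructions shown
rclone listremotes   # should print gdrive:
rclone lsd gdrive:   # list top-level folders to confirm the remote works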

Google Shared Drives

To configure access to a Google shared drive, visit rclone google drive configuration. During the configuration, there is an option to select “Configure this as a Shared Drive (Team Drive)?”

Using rclone

rclone supports many subcommands (see the complete list of rclone subcommands). A few commonly-used subcommands (assuming a Google Drive configured as gdrive):

Listing / moving / deleting objects
rclone command                                                        analogous Unix command
rclone lsf gdrive:fasrc/subfolder                                     ls fasrc/subdir
rclone lsf --format stp --separator ' ' gdrive:fasrc/subfolder        ls -l fasrc/subdir
rclone mkdir gdrive:fasrc/subfolder                                   mkdir fasrc/subdir
rclone move gdrive:fasrc/subfolder1/file1 gdrive:fasrc/subfolder2/    mv fasrc/subdir1/file1 fasrc/subdir2/
rclone rmdir gdrive:fasrc/subfolder                                   rmdir fasrc/subdir
rclone delete gdrive:fasrc/file                                       rm fasrc/file
rclone purge gdrive:fasrc/subfolder                                   rm -r fasrc/subdir

 

Transferring data

Small data transfers may be done on FAS RC cluster login nodes, while large data transfers should be done within an interactive job so that data transfer is done from a compute node; e.g.:

salloc -p test --mem 1G -t 6:00

Operands with the gdrive: prefix (assuming a Google Drive has been configured as gdrive) access Google Drive storage, while operands without gdrive: refer to a path on the FAS RC file system.

rclone copy gdrive:sourcepath destpath
rclone copy sourcepath gdrive:destpath

If sourcepath is a file, copy it to destpath.
If sourcepath is a directory/folder, recursively copy its contents to destpath. Contents of destpath that are not in sourcepath will be retained.

rclone sync --progress gdrive:sourcefolder destdir
rclone sync --progress sourcedir gdrive:destfolder

Replace contents of destdir/destfolder with the contents of sourcedir/sourcefolder (deleting any files not in the source).
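For example, a minimal sketch of a large transfer run from an interactive job (partition, memory, time limit, and paths are all illustrative):

salloc -p test --mem 4G -t 2:00:00
# copy a results folder from FASRC storage up to Google Drive, showing progress
rclone copy --progress /n/netscratch/mylab_lab/results gdrive:fasrc/results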

 

Mounting Google Drive on a FAS RC compute node

Alternatively, rclone mount can make a Google Drive (subfolder) available on a FAS RC compute node as a regular file system (e.g., supporting common commands; such as cp, mv, and ls; that are used to interact with a POSIX file system), with limitations.

The directory on the FAS RC node at which the Google Drive will be made available as a file system (i.e., the mountpoint) must be on a node-local file system (such as /scratch) to avoid permissions issues when unmounting the file system. In particular, the mountpoint must not be within a file system in the /n/ directory, as these are all remote / network file systems.
The following example demonstrates this capability:

$ rclone lsf gdrive:fasrc/
cactus:2019.03.01--py27hdbcaa40_1.sif
ifxpong:1.4.7-ood.sif
jbrowse:1.16.5_2019-06-14.sif
subfolder/
$ mkdir /scratch/$USER
$ mkdir -m 700 /scratch/$USER/gdrive
$ rclone mount gdrive:fasrc /scratch/$USER/gdrive &
$ ls -l /scratch/$USER/gdrive/
total 543900
-rw-r--r-- 1 fasrcuser fasrcgroup 495247360 May  1 16:27 cactus:2019.03.01--py27hdbcaa40_1.sif
-rw-r--r-- 1 fasrcuser fasrcgroup 50700288 Aug 22 16:05 ifxpong:1.4.7-ood.sif
-rw-r--r-- 1 fasrcuser fasrcgroup 11005952 Jun 14 15:16 jbrowse:1.16.5_2019-06-14.sif
drwxr-xr-x 1 fasrcuser fasrcgroup 0 Oct 24 10:21 subfolder
$ fusermount -uz /scratch/$USER/gdrive/
[1]+  Done                    rclone mount gdrive:fasrc /scratch/$USER/gdrive

Comments:

  • The mountpoint (/scratch/$USER/gdrive) is created with appropriate permissions (via mkdir -m 700) to ensure only the owner has access.
  • The rclone mount command is executed asynchronously (“in the background”) using the & operator.
  • fusermount -uz explicitly unmounts the Google Drive (causing the rclone mount process to terminate).
    • This performs a “lazy unmount”, which requests that the OS perform the unmount once no process has its current working directory within the directory tree rooted at the mountpoint. To guard against accidentally leaving the directory mounted if a job or interactive session is prematurely terminated, you can set the working directory of the shell that issued the rclone mount command to the gdrive mountpoint and then immediately issue the fusermount -uz command; e.g.:
      rclone mount gdrive:fasrc /scratch/$USER/gdrive &
      cd /scratch/$USER/gdrive && fusermount -uz .

      Then /scratch/$USER/gdrive will be automatically unmounted when the shell’s process has terminated or its working directory changed to a directory outside of /scratch/$USER/gdrive:

      cd ..
      [1]+ Done rclone mount gdrive:fasrc /scratch/$USER/gdrive
      

Limitations

At most 2 file transfers to Google Drive can be initiated per second. Consider bundling many small files into a .zip or .tar(.gz) file.
Other Google drive limitations are listed in the rclone Google Drive documentation.
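A minimal sketch of bundling before copying (the folder and archive names are illustrative):

# pack many small files into one archive, then copy the single archive to Google Drive
tar -czf smallfiles.tar.gz smallfiles/
rclone copy --progress smallfiles.tar.gz gdrive:fasrc/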

Transferring Data on the Cluster (https://docs.rc.fas.harvard.edu/kb/transferring-data-on-the-cluster/)

Watch Data movement on FAS Storage video

See also our Introduction to FASRC Cluster Storage video

There are several ways to move data around the cluster. The first consideration before deciding on a technique is which filesystems you are moving data between and how they are connected to the cluster. By and large, for most filesystems, especially those connected to the cluster via Infiniband, using the compute nodes themselves to move data around is your best bet. Thus, before doing any data transfers, you should either start an interactive session on the cluster or put together a batch script that contains the commands you want to use to move the data. The advantage of the batch script is that it allows you to fire off the move without having to babysit an open session, and you can run multiple transfers at once, leveraging the power of the cluster. That said, be sure the filesystems you are transferring from and to can handle the parallel traffic. In general, Lustre filesystems can handle many parallel requests while NFS cannot.
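As a minimal sketch, a batch transfer script might look like the following (the partition, time limit, and paths are illustrative; rsync is covered below):

#!/bin/bash
#SBATCH -p test
#SBATCH -c 1
#SBATCH --mem 4G
#SBATCH -t 06:00:00

# copy a data folder from lab storage to netscratch
rsync -avx --progress /n/holylfs/LABS/mylab_lab/data/ /n/netscratch/mylab_lab/data/

Submit it with sbatch and check the job's output file to confirm the transfer completed.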

For actually moving the data the following commands, in order of complexity, can be used:

  • cp/mv
  • rsync
  • fpsync

rsync is generally the most useful of these commands.

cp/mv

Both cp and mv are standard Unix commands that will copy or move the data to a new location. They are easy and relatively straightforward to use. cp makes a second copy of the data; adding the -R option copies a folder recursively. mv, on the other hand, moves the data so that only one copy exists, at the new location. mv is also the preferred tool for renaming files and folders and for moving data within a filesystem, as all it does is change the pointer to the data. The downside of cp and mv is that neither gives any indication of how well it is performing, and neither can pick up from an incomplete transfer. Thus, for bulk transfers, cp and mv should be avoided. Examples of cp and mv are below:

cp file.txt /n/netscratch/lab/.
cp -R folder /n/netscratch/lab/.
mv file.txt /n/netscratch/lab/.
mv folder /n/netscratch/lab/.

rsync

For the vast majority of transfers rsync will get the job done. We have a lengthy page on rsync (see the rsync article later in this document). In summary, rsync can copy entire directories and can pick up where it left off if a transfer fails for some reason. In addition, rsync is very handy for matching the contents of two directories. The most common rsync command for data transfer is as follows:

rsync -avx --progress folder/ /n/netscratch/lab/folder/

This will ensure that the folder is mirrored exactly over to the other filesystem. It will also make sure that the copy does not traverse symlinks to other filesystems that you do not wish to copy. Be aware, though, that rsync preserves the time stamps of the source files, so the transferred data will look old to the scratch cleaner if you are copying to our scratch filesystems. To have the copy stamped with the time you actually did the transfer, add the --no-times option.
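For example, when copying into scratch and stamping files with the transfer time (paths illustrative):

rsync -avx --no-times --progress folder/ /n/netscratch/lab/folder/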

fpsync

Note: The following assume you are running fpsync from a job or interactive session with more than one core. You cannot utilize fpsync on the login nodes.

rsync is really great for single-stream moves, especially when you have large files. However, for very large directories, or for many files, one needs to take rsync to the next level. This is what fpsync does. fpsync is essentially a parallel rsync: it generates lists of files to transfer and then spawns multiple rsync processes to do the transfer. You can set the total number of concurrent rsyncs, which parallelizes your transfer. fpsync needs to be used with care, though, as it can overwhelm nonparallel filesystems like NFS. However, for transferring between Lustre filesystems, fpsync can move data very quickly. In general the fpsync command will be:

fpsync -n NUMRSYNC -o "RSYNC OPTIONS" -O "FPSYNC OPTIONS" /n/lablfs/folder/ /n/netscratch/lab/folder/

In most situations your fpsync line will look like:

fpsync -n NUMRSYNC -o "-ax" -O "-b" /n/lablfs/folder/ /n/netscratch/lab/folder/

Note that the fpsync logs are found in /tmp on the host you are doing the transfer on, so it's harder to get an idea of how far along fpsync is. As a general rule it is best not to set NUMRSYNC higher than the number of cores on the host. If you submit this via a job, you should also wrap fpsync in srun to make full use of the allocation, like so:

srun -c $SLURM_CPUS_PER_TASK fpsync -n $SLURM_CPUS_PER_TASK -o "-ax" -O "-b" /n/lablfs/folder/ /n/netscratch/lab/folder/

where the number of CPUs you request from Slurm is the number of parallel rsyncs you want to run.

The /tmp path can be changed if needed using the -t option.

WARNING: DO NOT USE --delete as an option for fpsync

Globus File Transfer (https://docs.rc.fas.harvard.edu/kb/globus-file-transfer/)

Globus [1] is a service enabling file sharing with external collaborators without requiring them to have accounts on FAS Research Computing systems. A collaborator uses their own Globus login and their own Globus shared endpoint, while the FAS Research Computing user follows the steps described in this document to gain access to the Globus service.

Globus is a 3rd-party service that FASRC and other universities use as a nexus to share and transfer data to and from their filesystems; it is not run by FASRC.

Using Globus Service To Transfer Data In or Share Data Out

  1. Familiarize yourself with the Globus How To [2] documentation.
    Please note up front: if your data contains directories with hundreds or thousands of files, you will need to tar those directories up into subset files (type man tar at the command line to view the manual page for tar); see the sketch after this list. Too many files in a single directory, while generally never a good idea, will cause Globus to go into an ‘endpoint is too busy’ state and your transfer will time out, restart, time out, and so on.
    It is also not recommended to create any single tar file larger than a few GB in size (and not larger than about 100GB); if your transfer restarts, it will restart at the beginning of that file.
  2. Log in to the Globus web interface [4], selecting Harvard University as your organization (which will allow you to log in using HarvardKey), and land on the File Manager page [5].
    If you are unable to complete this step, please contact HUIT to ensure your HarvardKey is current/enabled.
  3. FAS RC has two endpoints, based on data center and use case:
    1) The Holyoke endpoint is “Harvard FAS RC Holyoke”, for Holyoke lab shares.
    2) The Boston endpoint is “Harvard FAS RC Boston”, for Boston lab shares.
    Type one of these names based on the storage location. Click the endpoint and you will be asked to authenticate your FASRC account.
    Click Continue to do so.
  4. Enter your FAS RC username and your FAS RC Verification Code [6] (your OpenAuth token code, not your password) when prompted to do so.
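As a minimal sketch, one way to break a large dataset into several reasonably sized archives before a Globus transfer (the paths are illustrative):

cd /n/holylfs/LABS/mylab_lab/bigdataset
mkdir -p ../archives
# archive each subdirectory separately so no single tar file grows past a few GB
# and Globus never has to walk directories with thousands of small files
for d in */; do
    tar -czf "../archives/${d%/}.tar.gz" "$d"
done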

If you have difficulty connecting at any point in the process try again while on the RC VPN as certain steps require connectivity to our internal networks.

Globus Docs and Videos

Globus docs: http://www.globus.org | How to get started | Transfer and Sharing

Globus also provides helpful videos for common issues: https://www.globus.org/videos

Using the Harvard FAS RC Holyoke or Boston Endpoints

  • Please be aware that you can share out files or folders that you own or have explicit permission to. Please be careful about sharing too broadly.
  • To share out files in disparate directories, you should create a directory to copy those files to.
  •  The Holyoke and Boston endpoints are set up such that they will only see your lab folder’s “Lab” and “Users/$User” subdirectories. The TRANSFER directory is no longer used. If your lab is not visible in Globus, or these subdirectories don’t exist in your lab’s folder in the Holyoke or Boston endpoints, please contact us.
  • Globus will use your RC account’s permissions the same as if you were accessing your lab storage from a node on the cluster. This dictates what Globus can see and not see when inside the Globus File manager, transferring data, or sharing data.

Sharing Data Examples

Example 1
To transfer files between your lab storage and an external collaborator’s Globus shared endpoint, type the external collaborator’s shared endpoint name in the other Endpoint field on the File Manager page [5], select the files to transfer, specify the transfer settings, and initiate the transfer by clicking on the large arrow button [7].
Example 2
Share out a subfolder of your lab storage by creating a Guest Collection [8] and authorizing the external collaborator’s Globus account to access it. Let the collaborator know the shared collection’s name; the collaborator will then be able to initiate file transfers to/from it.
Example 3
To transfer files to/from a laptop or a desktop machine, set up an endpoint on it by installing the Globus Connect Personal software [9] and then connect to that endpoint the same way you would connect to an external collaborator’s shared endpoint, as described in Example 1. A premium Globus account is not required to transfer files between an institutional (FAS RC) endpoint and a Globus Connect Personal endpoint.

Installing Globus Personal Endpoint on your computer

To share data from or to your local machine with other Globus users (not just with FASRC), you can install and run a personal endpoint on your computer.

Using Globus With Tier3 Tape

Please see https://docs.rc.fas.harvard.edu/kb/tape-globus-access/

Are there any limits on using the file transfer service?

See this Globus doc [3]


Additional Documentation

  1. http://www.globus.org
  2. http://docs.globus.org/how-to/get-started/
  3. https://docs.globus.org/faq/transfer-sharing/
  4. http://globus.org/login
  5. http://docs.globus.org/how-to/get-started/#the_transfer_files_page
  6. FAS RC Verification Code is the OpenAuth token generated code associated with your FAS RC user account. The same token is also used as the second factor for FAS RC VPN and FAS RC login nodes authentication.
  7. http://docs.globus.org/how-to/get-started/#request_a_data_transfer
  8. http://docs.globus.org/how-to/share-files/
  9. http://www.globus.org/globus-connect-personal
Transferring files to DATA.RC (https://docs.rc.fas.harvard.edu/kb/data-rc/)

Users of the data.rc.fas.harvard.edu server have three options for transferring files to it. Please note that if you choose option 2, the settings for FTP/S are not the same as the regular SFTP settings you might use to transfer files to other servers.

The connection methods in order of preference:

  1. Via Web browser – This is the default means of accessing data.rc

     

  2. Filezilla (If you don’t have Filezilla installed, download here)
    • Open ‘Site Manager’ (screenshot) in Filezilla and create a New Site
      host: data.rc.fas.harvard.edu
      Protocol: SFTP
      Logon Type: Interactive
      User: [your RC username]
    • IMPORTANT
      Click the Transfer tab and check the Limit number of simultaneous connections box and set Maximum number of connections to “1”.(screenshot)
    • Click Connect to connect now, or OK if you’re setting up for later use

     

  3. Map a drive to the share (Drive mapping instructions).
    Note: This may be the easiest method, but it is also often the slowest method.
    \\rcstore02.rc.fas.harvard.edu\data (Windows)
    smb://rcstore02.rc.fas.harvard.edu/data (Mac)
SFTP file transfer using Filezilla – Filtering (https://docs.rc.fas.harvard.edu/kb/sftp-file-transfer-filtering/)

There may be times when you wish to filter the file listing in the local or remote pane. If you need to do this often, you may want to set up a filter. Unlike the search feature (binoculars icon), filters modify what is shown in the Remote Site: or Local Site: pane.
If you simply need to see files grouped together by name, date modified, filesize, etc. you do not need to use a filter, you can sort on those criteria using the attributes at the top of the file listing. Example: To sort based on date modified, click Last Modified. Click it again to reverse the sort (ascending/descending).
[Screenshot: filezilla_filter_1]
A NOTE ABOUT FILTERS: One of the pitfalls to using filters is forgetting they are enabled. Keep in mind that if you open up a session and files seem to be missing or oddly sorted, you may have left a filter engaged. Simply open Filename Filters and disable the filter to return to normal.

CREATING/EDITING A FILTER IN FILEZILLA

To create a filter, select View then Filename Filters from the main menu (or click its icon, 4th from the left of the ‘Search’ binoculars) to open the Directory Listing Filters window. Note that filter rules can be applied to either pane (local or remote).
Click Edit filter rules to create a new filter or edit an existing one.
[Screenshot: filezilla_filter_2]
Click New to add a new filter rule (or select an existing one if you wish to edit).
Give your new rule a name that will make sense to you later.
[Screenshot: filezilla_filter_3]
Set the criteria for your filter. You can add multiple conditions. In the example shown, only files and folders which begin with ‘Resource’ will be shown. I’ve also chosen to make the filter case-sensitive.
CAUTION: If you plan to change directories/folders with a filter enabled, you will likely want to leave the Directories box unchecked so that you can still see the directory structure. Otherwise, directories may also be filtered out and you’ll have to turn the filter off in order to change directories.
[Screenshot: filezilla_filter_4]
Click OK to save the filter. You can now enable this new filter rule from the Directory listing filters window. Simply tick its check box (on whichever side you wish to apply it) and click OK to engage the filter.
[Screenshot: filezilla_filter_5]
CAUTION: It’s easy to forget you have a filter engaged. If you create or use filter rules in Filezilla, then you should first check to see if any are enabled if a directory/file listing does not look right or you don’t see files you expected to see.

SFTP file transfer using Filezilla (Mac/Windows/Linux) (https://docs.rc.fas.harvard.edu/kb/sftp-file-transfer/)

Filezilla is a free and open source SFTP client which is built on modern standards. It is available cross-platform (Mac, Windows and Linux) and is actively maintained. As such Research Computing is recommending its use over previous clients, especially as it does not have some of the quirks of clients like Cyberduck or SecureFX. This document will outline setting up a bookmark in Filezilla to connect to the cluster or other RC file resources you have access to. NOTE: If your SFTP session constantly disconnects after several seconds, see this FAQ entry.

Download and Install

First you will need to download and install the Filezilla client. You can download the latest version from Filezilla-project.org. NOTE: Please download from this page and not the big green button so as to avoid bundled adware. Linux users may be able to install Filezilla using their respective package manager.


IMPORTANT: If you have never logged into the cluster before, please ensure you’ve gone through the setup process and set up your OpenAuth token before proceeding.

STEP 1

Once installed, launch Filezilla and click the Site Manager icon in the upper left to begin setting up a connection bookmark for future use.

STEP 2

Click New Site to add a new bookmark. Enter the connection details in the General tab.

  • Host:
    • If you are connecting to Cannon, enter login.rc.fas.harvard.edu
    • If you are connecting to FASSE, enter fasselogin.rc.fas.harvard.edu
  • Protocol: select SFTP – SSH File Transfer Protocol
  • Login Type: select Interactive (this is crucial, otherwise you will not be prompted for your OpenAuth token)
  • User: enter your RC account username
  • In newer versions of Filezilla, the password box will not exist, and in older versions of Filezilla it will be greyed out because we’re using Interactive login, which will instead prompt you for a password when you click Connect
  • Now click the Transfer tab

STEP 3

IMPORTANT Click the Transfer tab and check the Limit number of simultaneous connections box and set Maximum number of connections to “1”. Otherwise you will be prompted for your password and token each time the token expires and for every new simultaneous connection during file transfers.

OPTIONAL In the Advanced tab, select the local (i.e. – on your computer) directory/folder you’d like to start in when connecting. You can type this in or click the Browse button and find the directory you want. You can leave Default remote directory: blank if you simply wish to connect to your RC account’s home directory. Or, if you wish to connect to a specific directory (for instance, your lab’s shared storage or a particular folder in your home directory), you can enter this here.

Click Connect to initiate a connection. If you’re just making a bookmark for later, click OK. The first time you connect you will see a window titled “Unknown host key”. Check the “Always trust this host, add this key to the cache” box and click OK. This will store the cluster’s host key for future use.

STEP 4

A password prompt box will pop up. Enter your RC account password here.

  • Check “Remember password until FileZilla is closed”, otherwise it will prompt you periodically and interrupt transfers
  • Click OK

STEP 5

Another password box will pop up. This one is for your OpenAuth token. Enter the code shown in your OpenAuth token window (or Google Authenticator or Duo Mobile, if you are using one of the alternative token generators) and click OK.

 STEP 6

You should now be connected to the cluster and see your local files in the left-hand pane and the remote files in the right-hand pane. You can drag and drop between them or drag and drop to/from file windows on your computer. When done, click the red X icon up top to disconnect.

ADVANCED TOPIC:  Filename filtering rules in Filezilla

Transferring Data Externally (https://docs.rc.fas.harvard.edu/kb/transferring-data/)

There are different ways in which to transfer data to and from research computing facilities. The appropriate choice will depend on the size of your data, your need to secure it and also who you wish to share it with.

To copy data to or from a location for yourself (or a collaborator who has a Research Computing account), use one of the methods described elsewhere in this documentation (for example, SCP/SFTP, rsync, or Globus).

When sending data to a collaborator without an account on research computing systems:

  • For files (or folders) under 20GB in size that need to be sent to individuals please use the Accellion secure file transfer.
  • For large data sets and/or for access by external users, consider using Globus
  • For unsecured long-term publishing of data on the web contact rchelp@rc.fas.harvard.edu. We can make your data available (readable to the world) over a URL. This is not recommended for very large data sets. If you wish to use this option, please let us know the overall size up front.

Please contact rchelp@rc.fas.harvard.edu if your needs fall outside these directions.

rsync (https://docs.rc.fas.harvard.edu/kb/rsync/)

Rsync is a fast, versatile, remote (and local) file-copying tool. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. It is available on most Unix-like systems, including the FAS RC cluster and Mac OS X.
The basic syntax is: rsync SOURCE DESTINATION where SOURCE and DESTINATION are filesystem paths.
They can be local, either absolute or relative to the current working directory, or they can be remote, by prefixing something like USERNAME@HOSTNAME: to the front of them.
Unlike cp and most shell commands, a trailing / character on a directory name is significant — it means the contents of the directory as opposed to the directory itself.

Examples

  • As a replacement for cp — copying a single large file, but with a progress meter:
    rsync --progress bigfile bigfile-copy
  • Make a recursive copy of local directory foo as foo-copy:

    rsync -aAvx foo/ foo-copy/

    NOTE: Never use the capital -X option. Only the lowercase -x

    The trailing slash on foo-copy/ is optional, but if it’s not on foo/, the file foo/myfile will appear as foo-copy/foo/myfile instead of foo-copy/myfile.

  • Upload the directory foo on the local machine to your home directory on the cluster:
    rsync -avxz foo/ MYUSERNAME@login.rc.fas.harvard.edu:~/foo/

    This works for individual files, too, just don’t put the trailing slashes on them.

  • Download the directory foo in your home directory on the cluster to the local machine:
    rsync -avz MYUSERNAME@login.rc.fas.harvard.edu:~/foo .
  • Update a previously made copy of foo on the cluster after you’ve made changes to the local copy:
    rsync -avz --delete foo/ MYUSERNAME@login.rc.fas.harvard.edu:~/foo/

    The --delete option has no effect when making a new copy, and therefore can be used in the previous example, too (making the commands identical), but since it recursively deletes files, it’s best to use it sparingly.

  • Update a previously made copy of foo on the cluster after you or someone else has already updated it from a different source:
    rsync -aAvz --update foo/ MYUSERNAME@login.rc.fas.harvard.edu:~/foo/

    The --update option has no effect when making a new copy, and can freely be specified in that case, also.

  • Make a backup of your entire linux system to /mnt/MYBACKUPDRIVE:
    rsync -a --exclude /proc/ --exclude /sys/ --exclude /tmp/ --exclude /var/tmp/ --exclude /mnt/ --exclude /media/ / /mnt/MYBACKUPDRIVE

    Add additional --exclude options, if appropriate.
    See rdiff-backup for a better way of making backups.

Compression

If the SOURCE and DESTINATION are on different machines with fast CPUs, especially if they’re on different networks (e.g. your home computer and the FASRC cluster), it’s recommended to add the -z option to compress the data that’s transferred.
This will cause more CPU to be used on both ends, but it is usually faster.

File Attributes, Permissions, Ownership, etc.

By default, rsync does not copy recursively, preserve timestamps, preserve non-default permissions, etc.
There are individual options for all of these things, but the option -a, which is short for archive mode, sums up many of these (-rlptgoD) and is best for producing the most exact copy.
(-A (preserve ACLs), -X (preserve extended attributes), and -H (preserve hardlinks) may also be desired on rare occasions.)
Note that if you are copying files not owned by you, preserving file ownership only works if you are root at the destination. If you are copying between systems on different authentication infrastructures, and the user/group does not exist at the destination, the numeric id is used. If that numeric id corresponds to a different user/group, the files will appear to be owned by that other user/group. If the user/group does exist on the destination, and the numeric id is different, the numeric id changes accordingly. The option --numeric-ids changes this behavior, but introduces some issues of its own, so is not recommended by default.

Updating a Copy

Rsync’s delta-transfer algorithm allows you to efficiently update copies you’ve previously made by only sending the differences needed to update the DESTINATION instead of re-copying it from scratch.
However, there are some additional options you will probably want to use depending on the type of copy you’re trying to maintain.
If you want to maintain a mirror, i.e. the DESTINATION is to be an exact copy of the SOURCE, then you will want to add the --delete option.
This deletes stuff in the DESTINATION that is no longer in the SOURCE.
Be careful with this option!
If you incorrectly specify the DESTINATION you may accidentally delete many files.
See also the --delete-excluded option if you’re adding --exclude options that were not used when making the original copy.
If you’re updating a master copy, i.e. the DESTINATION may have files that are newer than the versions in SOURCE, you will want to add the --update option.
This will leave those files alone, not revert them to the older copy in SOURCE.

Progress, Verbosity, Statistics

  • -v
    Verbose mode — list each file transferred.
    Adding more v’s (e.g., -vv) makes it more verbose.
  • --progress
    Show a progress meter for each file transfer (not a progress meter for the whole operation).
    If you have many small files, this can significantly slow down the transfer.
  • --stats
    Print a short paragraph of statistics at the end of the session, like average transfer rate, total numbers of files transferred, etc.

Other Useful Options

  • --dry-run
    Perform a dry-run of the session instead of actually modifying the DESTINATION.
    Most useful when adding multiple -v options, especially for verifying --delete is doing what you want (see the example after this list).
  • --exclude PATTERN
    Skip some parts of the SOURCE.
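For example, a dry run to preview what a mirroring sync with --delete would do before running it for real (paths illustrative):

    rsync -avz --delete --dry-run foo/ MYUSERNAME@login.rc.fas.harvard.edu:~/foo/

Nothing is modified; rsync simply lists the actions it would take, including any deletions.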
Copying Data to the FASRC cluster using SCP or SFTP (https://docs.rc.fas.harvard.edu/kb/copying-data-to-and-from-cluster-using-scp/)

SCP (Secure Copy) From/To the cluster

We generally recommend using SCP to copy data to and from the cluster. It is available across the cluster (login nodes, interactive sessions, NoMachine, or batch jobs). Its usage is simple, but the order in which file locations are specified is crucial. SCP always expects the ‘from’ location first, then the ‘to’ destination. Depending on which is the remote system, you will prefix your username and server to one of the locations:

scp [username@server:][location of file] [destination of file]
or
scp [location of file] [username@server:][destination of file]

Below are some examples of the two most common uses of SCP to copy to and from various sources.

Note: We use “~” in the examples. The tilde “~” is a Unix short-hand that means “my home directory”. So if user johnharvard uses ~/ this is the same as typing out the full path to his home directory (easier to remember than /n/home05/johnharvard/ ). You can, of course, specify other paths (ex. – /n/netscratch/my_lab/johnharvard/output/files.zip)

Copying Files From the FASRC cluster to Another Computer

From a terminal/shell on a FASRC node you’ll issue your SCP command, then enter your cluster password and OpenAuth token code.
scp johnharvard@login.rc.fas.harvard.edu:~/files.zip /home/johnharvard/
Password:
Enter PASSCODE:
files.zip 100% 9664KB 508.6KB/s 00:19

This copies the file files.zip from your home directory (~ is a Unix shortcut for ‘my home directory’) on the cluster to the /home/johnharvard/ directory on the computer you issued the command from.

Copying Files From Another Computer to the FASRC cluster

From a terminal/shell on your computer (or another server or cluster) you’ll issue your SCP command, then enter your cluster password and OpenAuth token code.
scp /home/johnharvard/myfile.zip johnharvard@login.rc.fas.harvard.edu:~/
Password:
Enter PASSCODE:
myfile.zip 100% 9664KB 508.6KB/s 00:19

This copies the file myfile.zip from the /home/johnharvard/ directory on the computer you issued the command on to your home directory on the cluster. While it’s probably best to compress all the files you intend to transfer into one file, this is not always an option.

To copy the contents of an entire directory, you can use the -r (for recursive) flag.
scp -r johnharvard@login.rc.fas.harvard.edu:~/mydata/ /home/johnharvard/mydata/
Password:
Enter PASSCODE:
files.zip 100% 9664KB 508.6KB/s 00:19

This copies all the files from ~/mydata/ (~ is a Unix shortcut for ‘my home directory’) on the cluster to the /home/johnharvard/mydata/ directory on the computer you issued the command from.

SFTP From/To Your Computer to the FASRC cluster

See our guide for using Filezilla, a cross-platform FTP client, to transfer files using SFTP on Mac, Windows or Linux. You can transfer files to/from the cluster from your computer or any resources connected to your computer (shared drives, Dropbox, etc.) SFTP File Transfer Using Filezilla

NB: If you are using SecureFX, you cannot use the “wizard” and you must go to the SSH2 tab and check only “Keyboard Interactive” Authentication.
