Data Transfer#
You should make sure any input data required is on your scratch directory before the job starts. If you need to transfer data elsewhere after a job completes, the job should save the data in the scratch directory, and then you can transfer it as a separate task after the job finishes.
Warning
You should not transfer data in and out of Aire from running jobs. This ties up the compute nodes waiting for the network and is inefficient.
The login nodes on Aire are powerful and have fast connections to the campus network and onward to the JANET network and other universities. The standard Linux tools are available on the login nodes to transfer data to and from the HPC system; useful commands are scp and rsync. You can transfer single files or sets of files. Directories can also be copied, but it is often better to compress them into a single file and transfer that file, which can be done with the zip command. The login nodes also accept inbound connections for these utilities from other machines on campus (wired connection), such as your desktop, workstation, or departmental servers and storage.
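As a rough illustration of the workflow described above, the sketch below bundles a directory and prints the transfer commands rather than running them. The username abc123, the host aire.example.ac.uk, the destination path, the directory names and the run helper are all placeholders assumed for this example; substitute the real values from the KB article, and set DRY_RUN=0 to execute the commands.

```shell
#!/bin/sh
# Hypothetical values for illustration only: replace with your own.
REMOTE="abc123@aire.example.ac.uk"   # placeholder user and host, not the real login address
DEST="/mnt/scratch/abc123/inputs"    # placeholder scratch path

# Print each command by default; set DRY_RUN=0 to actually run it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Compress a directory into a single archive, then copy that one file.
run zip -r my_dataset.zip my_dataset
run scp my_dataset.zip "$REMOTE:$DEST/"

# Alternatively, rsync copies a whole directory tree; re-running it after an
# interruption skips files that have already arrived unchanged.
run rsync -av --partial my_dataset/ "$REMOTE:$DEST/my_dataset/"
```

Run these from your own machine or from a login node, not from inside a running job.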
For detailed instructions on data transfer, please refer to the following KB article:
Note that the above article requires you to log in with your University account to view it.
SCP#
Due to the authentication methods required to access Aire, some standard scp clients can be cumbersome because they require repeated authentication during a transfer. For a smoother experience, we recommend using MobaXterm on Windows, or Cyberduck or ForkLift on Mac, which handle authentication more efficiently and provide user-friendly interfaces for file transfers. On Linux, using the scp command in the terminal is the most straightforward option. Please see the KB article linked above for further information on using scp on Aire.
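If repeated authentication prompts are the main obstacle on Linux, OpenSSH connection sharing may help. ControlMaster, ControlPath and ControlPersist are standard OpenSSH options that let later scp calls reuse one authenticated connection. Whether Aire's authentication setup permits this is an assumption here, not something this page confirms, and the username, host and file names below are placeholders. The sketch only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Placeholder login address: replace with the real one.
REMOTE="abc123@aire.example.ac.uk"

# Standard OpenSSH options: keep one authenticated connection open for 10 minutes.
# Assumes ~/.ssh exists; %r, %h and %p expand to remote user, host and port.
SHARE_OPTS="-o ControlMaster=auto -o ControlPath=$HOME/.ssh/cm-%r@%h-%p -o ControlPersist=10m"

# Printed rather than executed.
echo "ssh $SHARE_OPTS $REMOTE true"         # authenticate once
echo "scp $SHARE_OPTS input1.dat $REMOTE:"  # intended to reuse the open connection
echo "scp $SHARE_OPTS input2.dat $REMOTE:"
```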
Globus#
Globus enables you to quickly, securely and reliably move your data (in particular, large files) to and from locations you have access to, using the GridFTP protocol, which is optimised for high-bandwidth wide-area networks. We are currently working to add Globus centrally to Aire.
Globus Personal provides an effective interim solution for file transfers between Aire and locations such as University-managed Research IT Storage (resstore: Research Data Storage Service Provision) while we work towards enabling the central Globus client infrastructure. The personal client allows users to make both their Aire home directory and $SCRATCH visible to Globus, enabling efficient data transfers between Aire and Globus-enabled endpoints such as resstore.
Warning
You cannot transfer files between two instances of Globus Personal without a subscription; you must connect an instance of Globus Personal to a Globus client endpoint.
This means that at the moment (until we have the central client enabled on Aire):
You can transfer files between Globus Personal on Aire and Globus endpoints such as resstore;
You can transfer files between Globus Personal on Aire and Globus endpoints such as OneDrive;
You cannot transfer files between Globus Personal on Aire and Globus Personal on your PC or laptop (without a subscription);
You can transfer files between Globus Personal on Aire and Globus endpoints such as OneDrive/resstore, and then between OneDrive/resstore and Globus Personal on your PC or laptop.
If you want to connect Globus to your OneDrive account, you will need to request approval; please see this KB article.
In addition to the specific installation instructions provided below for Aire, you will also find the Knowledge Base articles linked below useful for setting up Globus and accessing your storage.
Installing Globus Personal on Aire#
The following guidance has been adapted from the Globus documentation (Linux installation instructions):
After logging in to Aire, download Globus:
$ wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
Extract the tarball:
$ tar xzf globusconnectpersonal-latest.tgz
# this will produce a versioned globusconnectpersonal directory
# replace `x.y.z` in the line below with the version number you see
$ cd globusconnectpersonal-x.y.z
Run Globus Personal to complete set-up without a GUI:
$ ./globusconnectpersonal -setup --no-gui
This will launch Globus, and your terminal should provide you with a URL to visit on your local machine to complete set-up (including University of Leeds SSO); you will then receive a key to copy and paste back into the command line on Aire. Please see the Globus documentation for further details.
You can close Globus once set-up is complete.
Modify or create the file config-paths (assuming you are still in the folder globusconnectpersonal-x.y.z) with your favourite text editor (this command will create the file if it doesn’t already exist):
$ nano ~/.globusonline/lta/config-paths
This allows us to edit Globus permissions for various file paths. The config-paths file is a headerless CSV with the following content:
<path>,<sharing flag>,<R/W flag>
In your file, you’ll see:
~/,0,1
This provides access to your home directory (~/), doesn’t allow sharing (0 as the sharing flag), and allows read/write access (1 as the R/W flag). You can add $SCRATCH with the same permissions by adding the following line to the file and saving:
$SCRATCH,0,1
Read more about Managing Globus Connect Personal Directory Permissions via the Config File in the official documentation.
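The edit to config-paths can also be scripted. The sketch below uses a hypothetical helper, add_config_path, which appends a <path>,<sharing flag>,<R/W flag> line only if that exact line is not already present. It is demonstrated on a temporary file so that nothing under ~/.globusonline is touched; the helper name and the temporary file are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: append "<path>,<sharing flag>,<R/W flag>" to a
# config-paths style file unless the identical line already exists.
add_config_path() {
    file=$1
    line="$2,$3,$4"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: fixed string rather than a regex
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a temporary file that starts with the default entry.
demo=$(mktemp)
printf '%s\n' '~/,0,1' > "$demo"

# Single quotes keep $SCRATCH as literal text, matching the $SCRATCH,0,1 line above.
add_config_path "$demo" '$SCRATCH' 0 1
add_config_path "$demo" '$SCRATCH' 0 1   # second call adds nothing

cat "$demo"
```

To apply the same change to the real file, the first argument would be ~/.globusonline/lta/config-paths; Globus then needs to be stopped and restarted for the change to take effect.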
Running Globus Personal on Aire#
Please read the Globus webapp documentation and ensure your Globus endpoints are visible under “Connections” from the webapp. Your newly configured Aire collection should also be present, but will show the status “offline”.
From Aire, run Globus in the background with nohup:
$ nohup ./globusconnectpersonal -start &
If you refresh the webapp, you should now see your Aire collection as “online”. Because we used nohup and &, this will continue to run even when you log out of Aire, making disruption-free transfers easier. Note that if you edit any configuration, you will need to stop and restart Globus:
$ ./globusconnectpersonal -stop
$ nohup ./globusconnectpersonal -start &
Using the “File Manager” tab on the left of the screen, select Aire as a collection. By default, the path is to your home directory; however, if you made $SCRATCH visible as per the installation instructions, you can also enter a path to a directory in this space:
/~/mnt/scratch/<USERNAME>/some_directory
Using the UI, you can now transfer data between Aire and another endpoint.
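A small wrapper can make the stop and start cycle described in this section less error-prone. The sketch below assumes it is run from the extracted globusconnectpersonal-x.y.z directory. The function name restart_globus and the log file globus.log are hypothetical, and the -status flag is taken from the Globus Connect Personal documentation for Linux, so check it against the version you installed.

```shell
#!/bin/sh
# Assumes the current directory is the extracted globusconnectpersonal-x.y.z folder.
GCP="./globusconnectpersonal"

# Hypothetical helper: stop any running instance, then start a new one in the
# background with nohup so that it keeps running after logout.
restart_globus() {
    "$GCP" -stop
    nohup "$GCP" -start > globus.log 2>&1 &
}

if [ -x "$GCP" ]; then
    "$GCP" -status    # report whether Globus is currently connected
else
    echo "globusconnectpersonal not found in $(pwd)"
fi
```

After editing config-paths, calling restart_globus and refreshing the webapp should show the collection as “online” again.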
Relevant Globus Knowledge Base Articles#
Getting started with Globus data transfer service: this article introduces Globus and signposts Globus documentation. This KB article also links to the Globus Data Transfer Service request form, to enable Globus on pre-existing University Storage.
How to log into Globus: this article shows you how to authorise Globus Web to use your University of Leeds account.
Data transfer between Globus Collections, OneDrive, Microsoft Teams and SharePoint sites: how to connect various University storage with Globus.
Information about Research Data Storage Service Provision: available research storage (Globus enabled).