Data Transfer

You should make sure any input data required is on your scratch directory before the job starts. If you need to transfer data elsewhere after a job completes, the job should save the data in the scratch directory, and then you can transfer it as a separate task after the job finishes.
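As a rough illustration of this workflow, the shell function below checks that every input file a job needs is already in a staging directory, such as your scratch space, before you submit. It is a minimal sketch rather than an Aire-provided tool. The function name check_inputs and the example file names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: confirm that all input files exist under a staging
# directory (e.g. your scratch directory) before submitting a job.
# check_inputs is a hypothetical helper, not an Aire command.

check_inputs() {
    local dir="$1"      # staging directory, e.g. "$SCRATCH/my_project"
    shift               # remaining arguments are file names relative to dir
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -e "${dir}/${f}" ]; then
            echo "missing: ${dir}/${f}" >&2
            missing=1
        fi
    done
    return "$missing"   # 0 if everything is present, 1 otherwise
}

# Example usage on a login node (file names are illustrative):
#   check_inputs "$SCRATCH/my_project" input.dat params.cfg && echo "ready to submit"
```

Running a check like this on the login node catches missing inputs before a job is queued, rather than after it has started on a compute node.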

Warning

You should not transfer data in and out of Aire from running jobs. This ties up the compute nodes waiting for the network and is inefficient.

The login nodes on Aire are powerful and have fast connections to the campus network and onward to the JANET network and other universities. The standard Linux tools for transferring data to and from the HPC system, such as scp and rsync, are available on the login nodes. You can transfer single files or sets of files. Directories can also be copied, but it is often better to compress them into a single file and transfer that file instead. This can be achieved using the zip command. The login nodes also accept inbound connections for these utilities from other machines on campus (wired connection), such as your desktop, workstation, departmental servers and storage.
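To illustrate the compress-then-transfer approach, the sketch below packs a directory into a single archive and writes a checksum so you can confirm the file arrived intact. It uses tar with gzip rather than zip. tar is also a standard Linux tool and is used later on this page. The helper name bundle_dir, the example paths and the placeholders `<username>` and `<aire-login-host>` are assumptions. Take the real login hostname from the KB article.

```shell
#!/usr/bin/env bash
# Minimal sketch: pack one directory into a single .tar.gz plus a .sha256
# checksum file. bundle_dir is a hypothetical helper, not an Aire command.

bundle_dir() {
    local src="$1"       # directory to pack, e.g. "$SCRATCH/results"
    local archive="$2"   # archive to create, e.g. results.tar.gz
    local adir aname
    adir=$(dirname "$archive")
    aname=$(basename "$archive")
    # -C switches to the parent directory so the archive stores a relative path
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Record the checksum against the bare file name so it can be verified
    # from whichever directory the two files are copied into
    (cd "$adir" && sha256sum "$aname" > "${aname}.sha256")
}

# On Aire, from your home directory:
#   bundle_dir "$SCRATCH/results" results.tar.gz
# Then on your campus desktop (values in <...> are placeholders):
#   rsync -av "<username>@<aire-login-host>:results.tar.gz*" .
#   sha256sum -c results.tar.gz.sha256    # on macOS: shasum -a 256 -c results.tar.gz.sha256
#   tar -xzf results.tar.gz
```

A single archive means one file to copy and one checksum to verify, which is usually faster and less error-prone than copying many small files individually.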

For detailed instructions on data transfer, please refer to the following KB article:

Note that the above articles require you to log in with your University account to view.

SCP

Due to the authentication methods required to access Aire, some standard scp clients can be cumbersome because they require you to authenticate repeatedly during a transfer. For a smoother experience, we recommend MobaXterm on Windows, or Cyberduck or ForkLift on Mac. These handle authentication more efficiently and provide user-friendly interfaces for file transfers. On Linux, the scp command in a terminal is the most straightforward option. Please see the KB article linked above for further information on using scp on Aire.
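Each scp invocation opens one SSH connection. Passing several files to a single command therefore means you authenticate once rather than once per file. The sketch below wraps this in a small shell function. The names copy_to_aire and DRY_RUN are hypothetical, as are the placeholders `<username>` and `<aire-login-host>`. Follow the KB article for the correct hostname and any Aire-specific authentication steps.

```shell
#!/usr/bin/env bash
# Minimal sketch: copy several files to Aire with ONE scp invocation, so the
# SSH connection (and its authentication) happens once rather than per file.
# copy_to_aire and DRY_RUN are hypothetical names.

copy_to_aire() {
    local dest="$1"   # e.g. "<username>@<aire-login-host>:some_directory/"
    shift             # remaining arguments are the local files to send
    local cmd=(scp -p -- "$@" "$dest")   # -p preserves times and modes
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"        # print the command instead of running it
    else
        "${cmd[@]}"
    fi
}

# Usage from a Linux desktop on campus:
#   copy_to_aire "<username>@<aire-login-host>:" data1.csv data2.csv config.yaml
# A loop such as: for f in *.csv; do scp "$f" ...; done
# would instead authenticate once for every file.
```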

Globus

Globus enables you to move your data (in particular, large files) quickly, securely and reliably to and from locations you have access to. It uses the GridFTP protocol, which is optimized for high-bandwidth wide-area networks. We are currently working to add Globus centrally to Aire.

Globus Personal provides an effective interim solution for transferring files between Aire and locations such as University-managed Research IT Storage (resstore: Research Data Storage Service Provision) while we work towards enabling the central Globus client infrastructure. The personal client lets you make both your Aire home directory and $SCRATCH visible to Globus. This enables efficient data transfers between Aire and Globus-enabled endpoints such as resstore.

Warning

You cannot transfer files between two instances of Globus Personal without a subscription. You must connect an instance of Globus Personal to a Globus client endpoint.

This means that at the moment (until we have the central client enabled on Aire):

  • You can transfer files between Globus Personal on Aire and Globus endpoints such as resstore;

  • You can transfer files between Globus Personal on Aire and Globus endpoints such as OneDrive;

  • You cannot transfer files between Globus Personal on Aire and Globus Personal on your PC or laptop (without a subscription);

  • You can transfer files between Globus Personal on Aire and Globus endpoints such as OneDrive/resstore, and then between OneDrive/resstore and Globus Personal on your PC or laptop.

If you want to connect Globus to your OneDrive account, you will need to request approval. Please see this KB article.

In addition to the specific installation instructions provided below for Aire, you will also find the Knowledge Base articles linked below useful for setting up Globus and accessing your storage.

Installing Globus Personal on Aire

The following guidance has been adapted from the Globus documentation (Linux installation instructions):

  1. After logging in to Aire, download Globus:

    $ wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
    
  2. Extract the tarball:

    $ tar xzf globusconnectpersonal-latest.tgz
    # this will produce a versioned globusconnectpersonal directory
    # replace `x.y.z` in the line below with the version number you see
    $ cd globusconnectpersonal-x.y.z
    
  3. Run Globus Personal to complete set-up without a GUI:

    $ ./globusconnectpersonal -setup --no-gui
    

    This will launch Globus, and your terminal should give you a URL to visit on your local machine to complete set-up (including University of Leeds SSO). You will then receive a key to copy and paste back into the command line on Aire. Please see the Globus documentation for further details.

  4. You can close Globus once set-up is complete.

  5. Create or edit the file ~/.globusonline/lta/config-paths with your favourite text editor. With nano, as below, the file is created if it doesn't already exist:

    $ nano ~/.globusonline/lta/config-paths
    

    This file controls which paths Globus Connect Personal can access and with what permissions. It is a headerless CSV in which each line has the following format:

    <path>,<sharing flag>,<R/W flag>
    

    In your file, you’ll see:

    ~/,0,1
    

    This line provides access to your home directory (~/), doesn't allow sharing (0 as the sharing flag), and allows read/write access (1 as the R/W flag).

    You can add $SCRATCH with the same permissions by adding the following line to the file and saving:

    $SCRATCH,0,1
    

    Read more about Managing Globus Connect Personal Directory Permissions via the Config File in the official documentation.
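If you prefer to script step 5, the function below appends a line to config-paths only if that line is not already there, so running it twice does no harm. It is a sketch under some assumptions. It writes the expanded absolute value of $SCRATCH rather than the literal text $SCRATCH, because an absolute path works whether or not Globus Connect Personal expands environment variables in this file. It also uses the same 0,1 flags as above. The name add_globus_path is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: idempotently add "<path>,0,1" (no sharing, read/write)
# to Globus Connect Personal's config-paths file.
# add_globus_path is a hypothetical helper name.

add_globus_path() {
    local path="$1"
    local line="${path},0,1"    # <path>,<sharing flag>,<R/W flag>
    local config="${HOME}/.globusonline/lta/config-paths"
    mkdir -p "$(dirname "$config")"
    touch "$config"
    if ! grep -qxF -- "$line" "$config"; then
        # If the file's last line has no trailing newline, add one first
        if [ -s "$config" ] && [ -n "$(tail -c1 "$config")" ]; then
            printf '\n' >> "$config"
        fi
        printf '%s\n' "$line" >> "$config"
    fi
}

# On Aire:
#   add_globus_path "$SCRATCH"
```

As with any configuration change, stop and restart Globus Connect Personal afterwards.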

Running Globus Personal on Aire

  1. Please read the Globus webapp documentation and ensure your Globus endpoints are visible under “Connections” from the webapp. Your newly configured Aire collection should also be present, but will show the status “offline”.

  2. From the globusconnectpersonal-x.y.z directory on Aire, run Globus in the background with nohup:

    $ nohup ./globusconnectpersonal -start &
    

    If you refresh the webapp, you should now see your Aire collection as “online”. Because we used nohup and &, this will continue to run even after you log out of Aire, which makes disruption-free transfers easier. Note that if you edit any configuration, you will need to stop and restart Globus:

    $ ./globusconnectpersonal -stop
    $ nohup ./globusconnectpersonal -start &
    
  3. Using the “File Manager” tab on the left of the screen, select Aire as a collection. By default, the path is your home directory. If you made $SCRATCH visible as described in the installation instructions, you can also enter the path to a directory in this space, e.g. /mnt/scratch/<USERNAME>/some_directory (run echo $SCRATCH on Aire to check the exact path).

  4. Using the UI, you can now transfer data between Aire and another endpoint.
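Configuration changes require a stop and restart, so you may find it convenient to wrap both commands in a shell function. The sketch below assumes a hypothetical GCP_DIR variable that points at your extracted globusconnectpersonal-x.y.z directory. The names restart_globus and gcp.log are also illustrative. The function uses nohup so that the restarted process ignores the hangup signal sent when your SSH session ends, and it sends the process output to a log file.

```shell
#!/usr/bin/env bash
# Minimal sketch: stop, then restart Globus Connect Personal in the background.
# GCP_DIR, restart_globus and gcp.log are hypothetical names.

restart_globus() {
    local dir="${GCP_DIR:?set GCP_DIR to your globusconnectpersonal-x.y.z directory}"
    # Stop any running instance; ignore the error if none is running
    (cd "$dir" && ./globusconnectpersonal -stop) || true
    # nohup makes the process ignore SIGHUP so it survives logout;
    # stdout and stderr go to gcp.log instead of nohup.out
    (cd "$dir" && nohup ./globusconnectpersonal -start > gcp.log 2>&1 &)
    echo "Restarted Globus Connect Personal; output is logged to ${dir}/gcp.log"
}

# Usage on Aire:
#   export GCP_DIR="$HOME/globusconnectpersonal-x.y.z"
#   restart_globus
```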

Relevant Globus Knowledge Base Articles