Storage on HPC

HOME directory

When you log in to ARC3 or ARC4 you automatically start in your HOME directory. Your HOME directory is backed up weekly and is shared across both HPC systems. It has a quota of 10GB, which is not a lot of space. Exceeding your quota can result in errors and job failures.

Check HOME directory usage with the quota command:

$ quota -s
Disk quotas for user exuser:
     Filesystem   space   quota   limit   grace   files   quota   limit   grace
nas-ufaservn1:/export/home/home01
                  7897M  10240M  11264M           25194       0       0

The -s argument provides more human-readable output. The example output shows that exuser has used 7897MB (about 7.7GB) of space, the quota is 10240MB (10GB), the hard limit is 11264MB (11GB) and there are 25194 files. The zeros in the final quota and limit columns mean that no limit is set on the number of files.
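As a rough illustration of reading these figures, the following sketch works out how much of the quota is in use. The function name quota_percent is hypothetical, and it assumes both values carry the M suffix shown in the example output; quota -s may choose other units for larger or smaller values.

```shell
# Hypothetical helper: percentage of the quota in use.
# Assumes both arguments end in "M", as in the example output.
quota_percent() {
    used=${1%M}      # strip the trailing unit
    quota=${2%M}
    echo $(( used * 100 / quota ))   # integer percentage, rounded down
}

# Values copied from the example output for exuser.
quota_percent 7897M 10240M   # prints 77
```

At 77% of the quota, exuser has a little over 2GB left before writes to HOME start to fail.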

/nobackup

Each HPC system has its own /nobackup directory, built on the Lustre parallel filesystem. ARC3 has ~836TB with a bandwidth of 4GB/s and ARC4 has ~1.2PB with a bandwidth of 11GB/s. ARC3 has 3191616896 (3191 Million) Inodes and ARC4 has 936379680 (936 Million) Inodes. Each file and directory requires an Inode.

Only a small fraction of the storage capacity and Inodes is generally available for any job/user.

Some input data commonly used by groups of users is stored on /nobackup.

Users should only store data needed for current projects and processing on /nobackup. If you expect to use more than 1 terabyte of file space on /nobackup on ARC3 or ARC4 and/or more than 1 Million Inodes, please liaise with the RSE team and your supervisor before staging your jobs (data transfer and submitting to the scheduler). In general, please liaise with your supervisor about your HPC data management, processing and transfer.

Whilst the file system works more efficiently with fewer files and with plenty of free capacity (more than 30% available), it can also be costly to transfer a lot of data to/from /nobackup.

On the HPC systems, processing workflows are more efficient if they use a small number of large files rather than a large number of small files. Creating, storing and reading large numbers of files on /nobackup can be problematic, so you are encouraged to develop processing workflows that only create, store and read small numbers of files.
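One common way to reduce file counts is to bundle many small outputs into a single archive once a processing step has finished. The sketch below is illustrative only: it runs in a temporary directory so it is safe to try anywhere, and the directory and file names (results, out_N.txt) are hypothetical.

```shell
# Work in a throwaway directory so the sketch is safe to run anywhere.
workdir=$(mktemp -d)
mkdir "$workdir/results"

# Create 100 small files, standing in for per-task outputs.
i=1
while [ "$i" -le 100 ]; do
    echo "result $i" > "$workdir/results/out_$i.txt"
    i=$((i + 1))
done

# Bundle the directory into one compressed archive:
# a single file instead of 100 files plus their directory.
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# Check how many of the small files the archive contains.
archived=$(tar -tzf "$workdir/results.tar.gz" | grep -c 'out_[0-9]*\.txt$')
echo "$archived"   # prints 100
```

After checking the archive, the original results directory could be removed to free its Inodes, and individual files can still be extracted later with tar -xzf.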

In general, it is best to scale up gradually to process larger amounts of data and to benchmark (record how long things take and how much resource they require as you scale up). Understanding how much data storage and how many files you will be creating can be as important as understanding how much memory and processing time is required for different amounts of processing cores/nodes.
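As part of benchmarking, it can help to record how many Inodes a test run creates before scaling up. The following is a minimal sketch: count_inodes is a hypothetical helper that counts every file and directory under a path, including the path itself. It assumes no file names contain newline characters, and hard links would be counted more than once, so treat the result as an estimate.

```shell
# Hypothetical helper: approximate number of Inodes used under a directory.
count_inodes() {
    find "$1" | wc -l | tr -d ' '
}

# Demonstration on a small temporary directory tree.
demo=$(mktemp -d)
mkdir "$demo/run1"
touch "$demo/run1/a.dat" "$demo/run1/b.dat" "$demo/c.dat"

# Counts the top directory, run1 and the three files.
count_inodes "$demo"   # prints 5
```

Running this on the output of a small test job, then multiplying by the planned number of jobs, gives a first estimate to compare against the 1 Million Inode threshold mentioned above.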

You can check your quota on /nobackup using the following lfs quota command:

$ lfs quota -h /nobackup

Warning


/nobackup is **not** backed up, and files that have not been accessed for 90 days are removed.

To ensure sufficient space and bandwidth is available, files on /nobackup are purged periodically.
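To see which files may be at risk, you can list files by last access time with find. This is a sketch under assumptions: stale_files is a hypothetical helper, and whether the purge uses exactly the access time that find reports is not stated here. The demonstration uses a temporary directory and the GNU touch -d option to fake an old access time.

```shell
# Hypothetical helper: list regular files under a directory that were
# last accessed more than the given number of days ago.
stale_files() {
    find "$1" -type f -atime +"$2"
}

# Demonstration: one file with an access time 100 days in the past,
# one file that has just been created.
demo=$(mktemp -d)
touch "$demo/recent.dat"
touch -a -d '100 days ago' "$demo/old.dat"

# Only old.dat is listed.
stale_files "$demo" 90

# On ARC3 or ARC4, an earlier warning could use a lower threshold, e.g.:
#   stale_files /nobackup/exuser 60
```

Any files listed that are still needed should be copied to backed-up storage rather than kept alive on /nobackup.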

Accessing /nobackup

To access /nobackup, use the cd command:

$ cd /nobackup

You are encouraged to keep your files on /nobackup in a directory that has the same name as your USERNAME. If your USERNAME were exuser, the following would create the directory and change into it:

$ mkdir /nobackup/exuser
$ cd /nobackup/exuser
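The same step can be written without typing the USERNAME by hand. In this sketch, make_user_dir is a hypothetical helper; it takes the name from the USER environment variable (falling back to id -un) and uses mkdir -p so that it does not fail if the directory already exists.

```shell
# Hypothetical helper: create (if needed) a directory named after the
# current user under the given root, and print its path.
make_user_dir() {
    name=${USER:-$(id -un)}
    dir="$1/$name"
    mkdir -p "$dir" && echo "$dir"
}

# On ARC3 or ARC4 the root would be /nobackup:
#   cd "$(make_user_dir /nobackup)"
# Demonstrated here against a temporary directory instead.
demo_root=$(mktemp -d)
make_user_dir "$demo_root"
```

Because of mkdir -p, the helper can safely be called at the top of every job script.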

Details about data transfer to/from /nobackup

Sharing files

There are 3 permissions: read (r), write (w) and execute (x). There are 3 levels of permission: owner, group and all. The default permissions on ARC3 are different to ARC4: On ARC3 group and all have read and execute permissions on new files, whereas on ARC4 group and all have no permissions. Permissions, group and ownership can be changed with chmod, chgrp and chown commands.
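The effect of different defaults can be reproduced with umask. The sketch below assumes that the ARC3-like and ARC4-like defaults correspond to umask values of 022 and 077 respectively; those values are an assumption for illustration, not something confirmed for either system. It uses the GNU stat -c option to print permissions in octal.

```shell
demo=$(mktemp -d)

# umask 077: new files get no group or "all" permissions (ARC4-like).
( umask 077; touch "$demo/private.txt" )

# umask 022: group and "all" keep read, plus execute on directories (ARC3-like).
( umask 022; touch "$demo/open.txt"; mkdir "$demo/open_dir" )

# Prints 600, 644 and 755 followed by each path.
stat -c '%a %n' "$demo/private.txt" "$demo/open.txt" "$demo/open_dir"

# Grant the group read access to the private file after the fact.
chmod g+r "$demo/private.txt"
stat -c '%a' "$demo/private.txt"   # prints 640
```

Running umask with no arguments on each system shows the value actually in force for your account.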

Users belong to groups. The groups command can be used to check which groups a user is in. File/directory permissions can be organised for groups. Only system administrators can create new groups and add users to groups.

Users can share files/directories with other users and specify permissions using the setfacl command.

The getfacl command can be used to report the access controls for specific files/directories.

The setfacl command can recurse through subdirectories, and it can set default access controls on a directory so that files and subdirectories created in it later inherit them.

Suppose you want a /nobackup directory on ARC4 that another user can read and execute files in. The following commands will do that:

$ cd /nobackup
$ mkdir /nobackup/exdir
$ cd exdir
$ setfacl -m u:exuser:r-x .

(Replace “exdir” with an appropriate directory name and “exuser” with the USERNAME of the user that you want to share the directory with.)

You can check what access controls there are using:

$ cd /nobackup/exdir
$ getfacl .

If there is an existing directory with subdirectories and you want the access control to apply recursively and also grant write permission in the subdirectories, use the following:

$ cd /nobackup/exdir
$ setfacl --recursive -m u:exuser:rwx .

To change the default so that any new files and directories also have these permissions, use the following:

$ cd /nobackup/exdir
$ setfacl --recursive -m u:exuser:rwx,d:u:exuser:rwx .
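To make the structure of the access control specification easier to see, the following sketch builds the command as text and prints it without running it. The helper name share_cmd is hypothetical. Each entry has the form u:USERNAME:PERMISSIONS, and the d: prefix marks the default entry that new files and directories inherit.

```shell
# Hypothetical helper: print (not run) the setfacl command that gives a
# user the same permissions on existing and future contents of a directory.
share_cmd() {
    user=$1
    perms=$2
    dir=$3
    printf 'setfacl --recursive -m u:%s:%s,d:u:%s:%s %s\n' \
        "$user" "$perms" "$user" "$perms" "$dir"
}

# Read and execute only, for the example user and directory.
share_cmd exuser r-x /nobackup/exdir
# prints: setfacl --recursive -m u:exuser:r-x,d:u:exuser:r-x /nobackup/exdir
```

Printing the command first gives a chance to check the USERNAME and permissions before applying them to a large directory tree.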

/resstore

Another location where data is stored for processing on HPC systems is /resstore. If you need to use data stored in /resstore then you will need to be added to a specific group.