Job Examples

Included here is a series of template job submission scripts that may prove useful as a starting point for your own submissions.
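
All of the scripts below are submitted in the same way. Assuming you've saved one as example_job.sh (a name chosen purely for illustration), you submit it to the scheduler with qsub:

qsub example_job.sh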

Serial

A fairly minimal example that runs a serial code: it uses the default allocation of 1Gbyte of RAM, runs for up to an hour, executes a binary example.bin in the current directory (-cwd), and imports the environment from the submitting shell (-V).

#$ -cwd -V
#$ -l h_rt=01:00:00
./example.bin

Serial, but needs more memory

As above, but asking for more memory: 4.8Gbytes of RAM, the amount that balances against the available cores on ARC4's standard nodes (192Gbytes shared between 40 cores works out at 4.8Gbytes per core).

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -l h_vmem=4.8G
./example.bin

Serial, but needs lots more memory

If you find yourself needing lots of memory per core (say, above 10G), you may be better off targeting the large memory nodes. Here we're asking for 200G, so we have to use the large memory nodes, as the standard nodes on ARC4 only have 192G available.

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -l node_type=40core-768G
#$ -l h_vmem=200G
./example.bin

Threaded

Here we're asking for eight cores. Wherever we want to refer to the number of cores we've been allocated, we can use $NSLOTS, which ensures we don't accidentally use a different figure in two places in the job submission file.

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -pe smp 8
./example.bin -t $NSLOTS
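
Not all codes take a thread count on the command line. If yours instead reads the standard OpenMP environment variable (whether example.bin does is an assumption here), a variant might look like:

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -pe smp 8
export OMP_NUM_THREADS=$NSLOTS
./example.bin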

Threaded, but gets upset by OpenMP hints

The CPU affinity hints set by the scheduler to help OpenMP codes can confuse programs that are technically OpenMP aware but are being used in ways that don't sit well with those hints. Unsetting the affinity variables removes the hints, allowing the Linux kernel to assign free cores as it sees fit.

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -pe smp 8
unset GOMP_CPU_AFFINITY KMP_AFFINITY
./example.bin -t $NSLOTS

Threaded, using a whole node

Sometimes you may just want a whole node. Run this way, you get all the cores, all the memory, and all the local disk. If the code needs to be told how many threads to use, it can be passed $NSLOTS.

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -l nodes=1
./example.bin -t $NSLOTS

MPI, small, single node

If you want to run a small MPI job on a single node, you can run it as an smp job.

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -pe smp 8
mpirun example.bin
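
mpirun normally picks up the number of allocated slots from the scheduler automatically. If your particular MPI installation doesn't, you can pass the count explicitly with the standard -np flag (shown as a fallback, not a requirement):

mpirun -np $NSLOTS example.bin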

MPI, small, run anywhere

If you want to run an MPI job and you're not sensitive to having all the processes on a single machine, you can ask for them to be spread anywhere across the nodes. There's no guarantee how they'll be scattered: you may get all 8 cores on one node, one core on each of eight nodes, or anything in between.

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -pe ib 8
mpirun example.bin

MPI, using a whole node

As with the threaded example, we can ask for a whole node. mpirun itself doesn't need to be told how many cores to run on, as it detects that automatically. This delivers much more consistent compute performance than the previous two options, since yours is the only job running on the nodes you're using.

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -l nodes=1
mpirun example.bin

MPI, using more than one node

Running on more than one node is identical to running on a single node, apart from the number of nodes requested. For more advanced examples, including mixed OpenMP/MPI codes, see the Advanced examples.

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -l nodes=3
mpirun example.bin

Using a GPU

Requesting a GPU uses a simplified syntax and, as when asking for a whole node, you don't need to ask for RAM/CPU/disk to go with the GPU. Our page on General Purpose GPU provides more details.

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -l coproc_v100=1
./example.bin
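
Depending on how the binary was built, you may also need to load a CUDA module before running it; the module name below is an assumption, so check what your system provides:

#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -l coproc_v100=1
module load cuda    # assumed module name; check 'module avail'
./example.bin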