Using Slurm

Quickstart

How to connect to Hydra and submit your first batch job.

Connecting to Hydra

The simplest way to connect to Hydra is through the ssh.cs.brown.edu gateway or the fastx cluster. Either route requires that you first set up your ssh key pair.

Submit a compute job

You can submit a job using sbatch:

$ sbatch batch_scripts/hello.sh
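The contents of batch_scripts/hello.sh are not reproduced on this page. As a minimal sketch, assuming the script only needs to print a greeting, something like the following could stand in for it. The job name, the greeting, and the --output line are all assumptions; the --output path is chosen only so that the output file lands under batch_scripts/.

```shell
# Create a hypothetical batch_scripts/hello.sh; the real file on Hydra may differ.
# Note: this overwrites any existing file with the same name.
mkdir -p batch_scripts
cat > batch_scripts/hello.sh <<'EOF'
#!/bin/bash
# Illustrative job name; any name works.
#SBATCH --job-name=hello
# Assumed output location, relative to the directory you submit from.
#SBATCH --output=batch_scripts/slurm-%j.out
echo "Hello from $(hostname)"
EOF
chmod +x batch_scripts/hello.sh
```

Because the #SBATCH lines are ordinary bash comments, the script can also be run directly with bash as a quick check before submitting it.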

You can confirm that your compute job ran successfully by running:

$ cat batch_scripts/slurm-<job id>*.out

If you don't specify a partition name or a run time limit, your job is submitted to the compute partition with a run time limit of 1 hour.
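To set the partition and time limit yourself, you can pass sbatch's --partition and --time options on the command line. The sketch below is illustrative: the two-hour limit is only an example value (the limits Hydra actually allows are not stated here), and submit is a hypothetical helper that prints the command instead of running it when sbatch is not installed.

```shell
# Hypothetical helper: run sbatch if it is available, otherwise print the command.
submit() {
    if command -v sbatch >/dev/null 2>&1; then
        sbatch "$@"
    else
        echo "would run: sbatch $*"
    fi
}

# Name the compute partition explicitly and ask for 2 hours (example value).
submit --partition=compute --time=02:00:00 batch_scripts/hello.sh
```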

Submit a gpu job

To submit a gpu job, you must use the gpus partition and request at least one gpu. Pass the following options to sbatch:

$ sbatch --partition=gpus --gres=gpu:1 gputest.sh

You can confirm that your gpu job ran successfully by running:

$ cat slurm-<job id>*.out

The gpus partition contains all the gpu hardware in the CIT datacenter. A job on the gpus partition must request at least 1 gpu resource.
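The gputest.sh script is not shown on this page. A minimal sketch of what such a test script might do, assuming NVIDIA GPUs, is to report which devices the job was given. Slurm typically sets CUDA_VISIBLE_DEVICES for jobs that request a gpu through --gres; the script contents below are hypothetical.

```shell
# Create a hypothetical gputest.sh; the real script may differ.
cat > gputest.sh <<'EOF'
#!/bin/bash
# Slurm typically sets CUDA_VISIBLE_DEVICES for jobs that request --gres=gpu.
echo "Visible GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
# nvidia-smi lists the GPUs on NVIDIA nodes; skip it if it is not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
fi
EOF
```

If the job's output file reports "Visible GPUs: none", the job most likely ran without a gpu allocation.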

Showing the job queue

To see the job queue, use the squeue command.
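Run without arguments, squeue lists every job in the queue. Its --user and --partition options narrow the listing. The my_jobs helper below is a hypothetical convenience wrapper, not something provided on Hydra.

```shell
# Hypothetical helper: list only the current user's jobs.
my_jobs() {
    if command -v squeue >/dev/null 2>&1; then
        squeue --user="$(id -un)" "$@"
    else
        echo "squeue not found; run this on a Hydra login node"
    fi
}

# All of your jobs, then only your jobs on the gpus partition.
my_jobs
my_jobs --partition=gpus
```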

Cancel a job

To cancel your job, use the scancel command, e.g. scancel <job id>.
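If you want the job id in a shell variable rather than copying it by hand, sbatch's --parsable option prints the id on its own. The following is a sketch of that workflow, assuming batch_scripts/hello.sh exists; it submits a job and cancels it straight away purely for illustration.

```shell
if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints the job ID alone, or "jobid;cluster" on multi-cluster sites.
    submitted=$(sbatch --parsable batch_scripts/hello.sh)
    # Keep only the part before any ";".
    job_id=${submitted%%;*}
    echo "Submitted job $job_id"
    scancel "$job_id"
else
    echo "sbatch not found; run this on a Hydra login node"
fi
```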

Using slurm options in a script

The script you submit to slurm can contain slurm options in it.  Here is a simple template, batch.script, to use for that:

#!/bin/bash
# This is an example batch script for slurm on Hydra
#
# The commands for slurm start with #SBATCH
# All slurm commands need to come before the program
# you want to run. In this example, 'echo "Hello World!"'
# is the command we are running.
#
# This is a bash script, so any line that starts with # is
# a comment. If you need to comment out an #SBATCH line, put
# another # in front of it (##SBATCH)
#
# To submit this script to slurm do:
# sbatch batch.script
#
# Once the job starts you will see a file MySerialJob-****.out
# The **** will be the slurm JobID
# --- Start of slurm commands -----------

# Run on the gpus partition. The Hydra cluster has the following partitions:
# compute, gpus, debug, tstaff
#SBATCH --partition=gpus

# Request 1 gpu resource
#SBATCH --gres=gpu:1

# Request an hour of runtime. Default runtime on the compute partition is 1hr.
#SBATCH --time=1:00:00
# Request a certain amount of memory (4GB):
#SBATCH --mem=4G
# Specify a job name:
#SBATCH -J MySerialJob
# Specify an output file and an error file (here they are the same file)
# %j is a special variable that is replaced by the JobID when the job starts
#SBATCH -o MySerialJob-%j.out
#SBATCH -e MySerialJob-%j.out
#----- End of slurm commands ----
# Run a command
echo "Hello World!"
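Because sbatch stops reading #SBATCH lines once it reaches the first command in the script, a misplaced option is silently ignored. As a rough sketch, the hypothetical helper below (check_sbatch_order is not a slurm tool) uses awk to flag any #SBATCH line that appears after the first non-comment, non-blank line.

```shell
# Hypothetical helper: report #SBATCH lines that come after the first command.
check_sbatch_order() {
    awk '
        /^[[:space:]]*$/ { next }   # skip blank lines
        /^#SBATCH/ {
            if (seen_cmd) { printf "line %d: %s\n", NR, $0; bad = 1 }
            next
        }
        /^#/ { next }               # other comments, including the shebang
        { seen_cmd = 1 }            # first real command
        END { exit bad }
    ' "$1"
}

# Check the template if it is present in the current directory.
if [ -f batch.script ]; then
    check_sbatch_order batch.script
fi
```

The helper exits with status 0 when every #SBATCH line precedes the first command, and with status 1 after printing each misplaced line otherwise.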

Slurm Training

Slurm training is available through the CCV slurm workshop. See the CCV Help page for details.