CS Hydra Cluster
The CS Hydra cluster contains Debian Bookworm (release 12) and Ubuntu Noble (release 24.04) nodes. Each OS type has its own partitions. Besides the two different operating systems, there are two types of resources available in the cluster: compute and GPU. Here is the partition listing for the Hydra cluster:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
compute* up infinite 1 mix echidna
compute* up infinite 29 idle smblade16a[1-14],smblade16b[1-8],smblade24a[1-6],typhon
dcompute up infinite 14 idle smblade16b[1-8],smblade24a[1-6]
dgpus up infinite 2 alloc gpu[2201,2301]
dgpus up infinite 20 idle gpu[1701-1708,1801-1802,1901-1907,2001-2003]
gpus up infinite 2 alloc gpu[2201,2301]
gpus up infinite 26 idle gpu[1601-1605,1701-1708,1801-1802,1901-1907,2001-2003,2501]
ucompute up infinite 1 mix echidna
ucompute up infinite 15 idle smblade16a[1-14],typhon
ugpus up infinite 7 idle gpu[1601-1605,1801,2501]
The dcompute and dgpus partitions contain the Debian Bookworm compute and GPU nodes, respectively. The ucompute and ugpus partitions contain the Ubuntu Noble compute and GPU nodes, respectively. The compute and gpus partitions contain both the Debian and Ubuntu compute and GPU nodes, respectively.
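The listing above follows the default output format of the sinfo command, so you can print a current version yourself once you are logged in to the cluster. The -p flag narrows the listing to a single partition:

sinfo
# show only the gpus partition
sinfo -p gpus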
Connecting to Hydra
The simplest way to connect to Hydra is through the ssh.cs.brown.edu gateway or the FastX cluster. You need to set up your SSH keypair in order to use the SSH gateway or the FastX cluster.
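As a sketch, assuming your CS username is <username> and your keypair is already registered, a connection through the gateway looks like this; the hostname of the Hydra login node itself is not given on this page, so the second hop below is a placeholder:

ssh <username>@ssh.cs.brown.edu
# from the gateway, hop on to a Hydra login node (placeholder hostname)
ssh <hydra-login-host>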
Submit a compute job
You can submit a job using sbatch:
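For example, assuming your job is described by a script named batch.script (like the template shown later on this page):

sbatch batch.script

sbatch prints the ID assigned to the job; keep it handy for checking on or cancelling the job later.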
You can confirm that your compute job ran successfully by running:
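A couple of options, assuming Slurm accounting is enabled on Hydra and using a placeholder job ID:

# show the final state of the job
sacct -j <jobid>
# or inspect the job's output file (slurm-<jobid>.out by default)
cat slurm-<jobid>.out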
By default, your job is submitted to the compute partition and will run for 1 hour if you don't specify a partition name or run time limit.
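To override those defaults, pass a partition and time limit to sbatch; the dcompute partition and two-hour limit below are only illustrative values:

sbatch --partition=dcompute --time=2:00:00 batch.script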
Submit a gpu job
To submit a GPU job, you must use the gpus partition and request at least one GPU resource. Use sbatch with the following options to submit a GPU job:
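A minimal command-line sketch, reusing the batch.script name from above and requesting a single GPU:

sbatch --partition=gpus --gres=gpu:1 batch.script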
You can confirm that your GPU job ran successfully by running:
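The same checks as for a compute job apply. In addition, if your script calls nvidia-smi (the standard NVIDIA utility, assumed here to be installed on the GPU nodes), its output in the job's .out file confirms that a GPU was actually allocated:

# final job state (placeholder job ID)
sacct -j <jobid>
# output file name used by the template later on this page
cat MySerialJob-<jobid>.out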
The gpus partition contains all of the GPU hardware in the CIT datacenter. You must request at least one GPU resource (for example, --gres=gpu:1) in order to run a GPU job on the gpus partition.
Showing the job queue
To see the job queue, use the squeue command.
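With no arguments squeue lists every job in the queue; the -u flag limits the listing to a single user:

squeue
squeue -u <username>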
Cancel a job
To cancel your job, use the scancel command, i.e., scancel <job id>.
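You can look up the job ID with squeue. scancel also takes a user filter if you want to cancel all of your own jobs at once:

# cancel every job belonging to your user
scancel -u <username>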
Using slurm options in a script
The script you submit to slurm can itself contain slurm options. Here is a simple template, batch.script, to use for that:
#!/bin/bash
# This is an example batch script for slurm on Hydra
#
# The commands for slurm start with #SBATCH
# All slurm commands need to come before the program
# you want to run. In this example, 'echo "Hello World!"'
# is the command we are running.
#
# This is a bash script, so any line that starts with # is
# a comment. If you need to comment out an #SBATCH line,
# put an extra # in front of the #SBATCH.
#
# To submit this script to slurm do:
# sbatch batch.script
#
# Once the job starts you will see a file MySerialJob-****.out
# The **** will be the slurm JobID
# --- Start of slurm commands -----------
# Set the partition; this example runs on the gpus partition.
# See the partition listing at the top of this page for the
# available partitions.
#SBATCH --partition=gpus
# request 1 gpu resource
#SBATCH --gres=gpu:1
# Request an hour of runtime. The default runtime on the compute partition is 1 hour.
#SBATCH --time=1:00:00
# Request a certain amount of memory (4GB):
#SBATCH --mem=4G
# Specify a job name:
#SBATCH -J MySerialJob
# Specify an output file
# %j is a special variable that is replaced by the JobID when the job starts
#SBATCH -o MySerialJob-%j.out
#SBATCH -e MySerialJob-%j.out
#----- End of slurm commands ----
# Run a command
echo "Hello World!"
Slurm Training
Slurm training is available through the CCV slurm workshop. See the CCV Help page for details.