Our compute cluster (or "grid") can be accessed only via the Grid Engine batch queuing system. Using GridEngine is unavoidably complicated and site-specific. This guide is designed to get you started quickly.
How to use the Grid
GridEngine won't run binary executables. It will only run scripts.
You have jobs to run, the grid has resources you need. Just tell GridEngine what you want, and let it do the rest.
Submit jobs to be run using the qsub command:
% qsub runme
Your job 98 ("runme") has been submitted
Your script "runme" will be scheduled to run in the next available slot in the grid, with a 1 hour time limit.
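Since GridEngine only accepts scripts, a job script can be as simple as a shell wrapper around your program. Here is a minimal sketch of a "runme" you could submit as a first test (the echo line stands in for your real workload):

```shell
#!/bin/sh
# runme -- a minimal GridEngine job script (sketch).
# GridEngine won't run binaries directly, so wrap your program in a
# script like this. The echo below stands in for an actual workload.
echo "Job running on $(hostname) at $(date)"
```

Make the script executable (chmod +x runme) before submitting it with qsub.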
Once your job is submitted, you can check on it with qstat:
% qstat
job-ID  prior    name   user  state  submit/start at      queue          slots
-------------------------------------------------------------------------------
    98  0.56000  runme  jsb   r      12/08/2010 15:35:39  short.q@ang21      1
When the job is finished, its standard output and error output will be found in your home directory (which is where the script ran):
% (cd; ls runme*)
runme.e98  runme.o98
To find out more about your script's execution, run qacct:
% qacct -j 98
==============================================================
qname        short.q
hostname     ang21.cs.brown.edu
group        tstaff
owner        jsb
project      NONE
department   defaultdepartment
jobname      runme
jobnumber    98
[etc...]
The output of qacct is quite long, and includes how much time and memory your job used, and lots of other information.
Running Many Jobs
% qsub -t 0-99 runme
No, that won't work. The task range cannot start at zero.
You have hundreds or thousands of nearly identical jobs to run. A common case is a single program to be run on many datasets. You could call qsub over and over again, or you could use an array job:
% qsub -t 1-100 runme
In the example above, your script "runme" will be run 100 times. Each process will be passed a different value for the environment variable SGE_TASK_ID, from 1 to 100. In this way, you can vary the execution to suit your needs. For example, if you have already partitioned your data set into separate files, your script might look like this:
~/project/sim < ~/data/data.$SGE_TASK_ID > ~/results/out.$SGE_TASK_ID
Array jobs have a single job id, but each individual task produces its own standard output and standard error files.
Long Running and Large Memory Jobs
If you need more than one hour of run time, or if your job uses a lot of memory (more than 1GB), then you will need to request those resources.
Our grid puts all jobs into one of three categories: short, long and very long (vlong). Short jobs are the default and they will be killed if they run for more than one hour of wallclock time. Long jobs have up to 24 hours, and very long jobs can run forever.
Why would anyone use the default? Because long and very long jobs are limited to a fraction of the total grid slots at any one time. Only short jobs can fully populate the grid.
% qsub -l hour runme   # or -l short (or no option)
% qsub -l day runme    # or -l long
% qsub -l inf runme    # or -l vlong
Your job will never be killed for using too much memory, but if you use a lot, you can avoid swapping by first requesting what you need. It's also good grid etiquette. To ensure you get 4GB of physical RAM:
% qsub -l vf=4G runme
Note that vf stands for "virtual free," but in our grid it is set to the total physical RAM in each machine. The request above will only be run on a machine with at least 4GB of unused main memory. This doesn't prevent jobs from competing for memory as they run, since it only affects job placement, but it certainly improves your chances of getting what you need.
Parallel Jobs and Benchmarking
You need simultaneous access to a number of machines, or to all cores on one machine, or you just need to ensure that your job is the only one running on a set of machines. You need to use a parallel environment.
The smp parallel environment is designed to give you access to multiple cores on each machine. If your program is multi-threaded, and you want it to have 2 cores, you might run it this way:
% qsub -pe smp 2 runme
That will ensure that the process gets two job slots on the machine on which it runs. GridEngine will also ensure that twice the memory (if you requested memory) is available.
Note that each machine has one job slot per core. Jobs can spawn more threads, or processes, than slots requested, without penalty. But requesting multiple cores, when needed, is in everyone's interest to keep grid resources from being oversubscribed.
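One way to keep your thread count in step with your slot request is to read GridEngine's NSLOTS environment variable, which holds the number of slots granted to the job. A sketch, in which the sim program and its --threads flag are hypothetical:

```shell
#!/bin/sh
# runme -- sketch: size the thread pool from the granted slots.
# GridEngine exports NSLOTS = number of slots allocated to this job;
# fall back to 1 when running outside the grid.
THREADS=${NSLOTS:-1}
echo "starting sim with $THREADS thread(s)"
# ~/project/sim --threads "$THREADS"   # hypothetical program and flag
```

Submitted with `qsub -pe smp 2 runme`, this script would start its program with two threads; run by hand, it defaults to one.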
Applications that use Message Passing Interface (MPI) consist of multiple tasks that rely on a communication infrastructure. The orte (Open Run-Time Environment) parallel environment supports Open MPI applications.
% qsub -pe orte 4 runme
In the example above, the runme script calls mpirun to start the tasks, which GridEngine distributes to 4 separate machines.
You want your parallel tasks to run on some number of separate machines. Without a well-defined framework, such as MPI (above), you'll need to do a little more work.
[more explanation once I figure this out]
You need exclusive use of a machine for benchmarking. The easiest way to do this is to request all slots on the machine. Since we have a variety of machine types, you'll have to be explicit about which machines you want (see the grid page for hardware details).
% qsub -pe smp 2 -q '*@@ang' runme
In this example, your job will only run on machines in the '@ang' host group, which are all dual-core machines, and your job will be the only job running.
There is an obvious mapping between host groups and machines. To list all host groups:
% qconf -shgrpl
Machines with GPUs are accessible only by requesting the gpus resource. When requested, a job setup script chooses idle GPUs and assigns them to the job. The Nvidia CUDA library automatically sees only the allocated GPUs. GPU jobs may share a machine, but will always have exclusive access to requested GPU resources.
This is the command to run a job that will use two GPUs:
% qsub -l gpus=2 runme
Some GPUs have more memory than others. The GTX-class cards have between 8 and 11G of VRAM. A smaller number of GPUs have more (see the grid resources page for details). To gain access to the GPUs with more VRAM, request the gmem resource:
% qsub -l gpus=2 -l gmem=24 runme
Running an interactive session on a grid machine is strongly discouraged, but sometimes unavoidable. Interactive sessions are available only for short (1 hour) or long (1 day) jobs. Very long jobs must be batch jobs.
% qlogin   # or...
% qrsh
These commands accept the same options that qsub accepts. Only qlogin provides X11 port forwarding.
Before you unleash 1000 jobs on the grid, you want to quickly test and make sure things are working properly, but the grid is so busy! You want to run a test.
% qsub -l test runme
Tests run at a high priority, ahead of other waiting jobs. The caveat is that they can only run for up to 10 minutes, and they are limited to one slot per machine. Don't abuse this!
- Current Working Directory
- To ensure that your job runs in the directory from which you submit it (and to ensure that its standard output and error files land there), use the -cwd option:
% qsub -cwd runme
- Running Now
- If you want GridEngine to run your job now or else fail, give it the -now option:
% qsub -now y runme
- Embedding Options
- You don't have to remember all the qsub options you need for every job you run. You can embed them in your script:
% cat runme
#!/bin/sh
#
# Execute from the current working directory
#$ -cwd
#
# This is a long-running job
#$ -l inf
#
# Can use up to 6GB of memory
#$ -l vf=6G
#
~/project/sim
With all the options in the script, executing it is simple:
% qsub runme

You can, of course, still use command-line arguments to augment or override embedded options.
- Mail Notification
- To receive email notifications about your job, use the "-m" option:
% qsub -m as runme

In the example above, you will get mail if the job aborts or is suspended. The mail options are:
a - abort
b - begin
e - exit
s - suspend
Deleting Your Jobs
Deleting your submitted jobs can be done with the qdel command:
% qdel job-id
The specified job is deleted.

% qdel -u username
All jobs belonging to username are deleted.
Users can only delete their own jobs.
The man pages for gridengine commands are surprisingly helpful.
If you are using the grid, you must subscribe to the compute mailing list. All grid-related announcements are posted to this list only. You can also ask questions and coordinate grid usage there.
The grid is a supported department resource, so you can also mail "problem" if you need help.