Slurm on Talapas
Slurm is job scheduling software that apportions resources to jobs distributed across the hundreds of nodes on Talapas. To use Talapas, you must request resources and define job parameters in the form of Slurm jobs. This lesson introduces Slurm job configuration.
Lesson Setup
For this lesson, you will need to connect to a Talapas login node through a shell application of your choice.
For convenience, we recommend the Talapas OnDemand shell.
To start, make sure you are in your home directory.
cd ~
Copy the /projects/racs_training/intro-hpc-f25/slurm folder into your home directory. Don’t forget -r to recursively copy the contents!
cp -r /projects/racs_training/intro-hpc-f25/slurm .
Navigate inside the newly created slurm directory.
cd slurm
ls -F
part1/ part2/
Navigate inside the part1 folder using cd, then inspect its contents using ls.
cd part1
ls
gpu.sbatch hello.py hello.sbatch long.sbatch
Troubleshooting a Batch Slurm Job
Let’s inspect hello.sbatch.
This is an imperfect Slurm job: it’s missing an important step! We will start with what it does correctly, then fix the common mistake.
cat hello.sbatch
#!/bin/bash
#SBATCH --partition=compute ### Partition (like a queue in PBS)
#SBATCH --account=racs_training ### Account used for job submission
### NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayMain, %a=arraySub
#SBATCH --job-name=hello_world_python ### Job Name
#SBATCH --output=%x-%j.out ### File in which to store job output
#SBATCH --error=%x-%j.err ### File in which to store job error messages
#SBATCH --time=0-00:05:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Number of nodes needed for the job
#SBATCH --mem=500M ### Total Memory for job in MB -- can do K/M/G/T for KB/MB/GB/TB
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --cpus-per-task=1 ### Number of cpus/cores to be launched per Task
python hello.py
Let’s parse this file using the comments as a guide! To learn more, see the Slurm documentation for sbatch.
- This job runs for a maximum of five minutes (0-00:05:00) on one node and one CPU core of that node.
- It requests 500MB of RAM.
- The job output and error messages will be written to hello_world_python-[jobid].out and hello_world_python-[jobid].err, respectively.
- It runs a file in the part1 directory called hello.py that prints to standard output.
Given that this task is relatively trivial, these parameters are appropriate.
Inspect the hello.py script called by the batch script with cat.
cat hello.py
print("Hello Wold !!")
For those unfamiliar with Python, this is a trivial “Hello World” job that prints the phrase “Hello Wold !!” to standard out.
To submit your job file, run the sbatch command.
sbatch hello.sbatch
Submitted batch job 39456543
To check the status of your queued jobs, use the squeue command. Do not use squeue without an argument unless you want to see all the queued jobs on Talapas.
If you check your queue, you will probably see that this job has already terminated.
squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
Because squeue only tracks jobs in progress, you will need to check your recent jobs regardless of status with sacct.
sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
39456543 hello_wor+ compute racs_trai+ 1 FAILED 127:0
39456543.ba+ batch racs_trai+ 1 FAILED 127:0
39456543.ex+ extern racs_trai+ 1 COMPLETED 0:0
The parent job, 39456543, failed. Check the error log using cat.
cat hello_world_python-*.err
/var/spool/slurm/job39456543/slurm_script: line 17: python: command not found
Remember modules? Python is not on your path on Talapas by default. This means the job on the compute node must load a version of Python before it can interpret the Python script.
A Corrected Batch Script
Correct the script by inserting two Lmod commands before the python line: module purge and module load python3/3.11.4.
#!/bin/bash
#SBATCH --partition=compute ### Partition (like a queue in PBS)
#SBATCH --account=racs_training ### Account used for job submission
### NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayMain, %a=arraySub
#SBATCH --job-name=hello_world_python ### Job Name
#SBATCH --output=%x-%j.out ### File in which to store job output
#SBATCH --error=%x-%j.err ### File in which to store job error messages
#SBATCH --time=0-00:05:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Number of nodes needed for the job
#SBATCH --mem=500M ### Total Memory for job in MB -- can do K/M/G/T for KB/MB/GB/TB
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --cpus-per-task=1 ### Number of cpus/cores to be launched per Task
module purge # Best practice, unloads all modules
module load python3/3.11.4 # Loads Python
python hello.py
Remove the old error logs.
rm hello_world_python-*.err
rm hello_world_python-*.out
Resubmit the job with the corrected batch script.
sbatch hello.sbatch
Submitted batch job 39456578
Running sacct will show your job completed successfully.
sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
39456578 hello_wor+ compute racs_trai+ 1 COMPLETED 0:0
39456578.ba+ batch racs_trai+ 1 COMPLETED 0:0
39456578.ex+ extern racs_trai+ 1 COMPLETED 0:0
Check the output log to see the results of your job. Note that because each log is named for the jobid, and jobids are unique, your logs will not overwrite each other.
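As a concrete sketch of how the log filename pattern expands (using the pattern from hello.sbatch and the job ID from the successful run above):
# --output=%x-%j.out and --error=%x-%j.err, with --job-name=hello_world_python and job ID 39456578
#   %x -> hello_world_python
#   %j -> 39456578
# resulting logs:
#   hello_world_python-39456578.out
#   hello_world_python-39456578.err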
Check the contents of the output logs with cat.
cat hello_world_python*
Hello Wold !!
Managing Running Jobs
Inspect long.sbatch.
cat long.sbatch
#!/bin/bash
#SBATCH --partition=computelong ### Partition (like a queue in PBS)
#SBATCH --account=racs_training ### Account used for job submission
### NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayMain, %a=arraySub
#SBATCH --job-name=long_hello_world ### Job Name
#SBATCH --output=%x-%j.out ### File in which to store job output
#SBATCH --error=%x-%j.err ### File in which to store job error messages
#SBATCH --time=0-00:20:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Number of nodes needed for the job
#SBATCH --mem=500M ### Total Memory for job in MB -- can do K/M/G/T for KB/MB/GB/TB
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --cpus-per-task=1 ### Number of cpus/cores to be launched per Task
### Run your actual program
for i in {1..100}
do
echo "This is loop iteration $i"
sleep 10
done
This job features a Bash for-loop that prints one line of text and sleeps for ten seconds on each of its 100 iterations. This means it will run for at least 1,000 seconds, or nearly 17 minutes.
Submit the job with sbatch.
sbatch long.sbatch
Submitted batch job 39456611
Inspect the job in the queue with squeue.
squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
39456611 computelo long_hel emwin R 0:32 1 n0185
Let’s unpack this status:
- The job has the status R for running.
- It’s running on node n0185 on the computelong partition.
- The job is running as emwin.
- It has been running for 32 seconds.
To learn more about squeue results, see the Slurm documentation.
Canceling Running Batch Jobs
Let’s practice canceling a slow job with scancel, which takes the job ID of the job to cancel.
scancel 39456611
Run squeue --me again: the queue is now empty, even though the job would otherwise have kept running.
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
When you cancel a job, Slurm writes an error message to that job’s error file.
cat long_hello_world*.err
slurmstepd: error: *** JOB 39456611 ON n0185 CANCELLED AT 2025-10-23T13:32:09 ***
Optimization with seff
Want to know how much of its requested resources a finished job actually used? Try the seff command.
seff 39456611
Job ID: 39456611
Cluster: talapas
User/Group: emwin/uoregon
State: CANCELLED (exit code 0)
Cores: 1
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:01:47 core-walltime
Job Wall-clock time: 00:01:47
Memory Utilized: 460.00 KB
Memory Efficiency: 0.09% of 500.00 MB
Adjust your resource requests based on what your jobs actually use. Requesting only what you need is courteous to other users and keeps your jobs from being deprioritized. Remember that your job’s turnaround time is its queue time plus its execution time, so asking for too many resources can make your job finish later because it spends longer waiting in the queue.
If your job runs out of memory, try doubling the amount you request. If seff shows that your job used far less memory than requested, lower the request so your job schedules faster next time.
This job, which spends its CPU time “sleeping” or waiting, uses very little of its requested resources.
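As a rough sketch of that workflow (the job ID and numbers here are hypothetical):
seff 12345678
#   ...suppose it reports: Memory Efficiency: 4.00% of 4.00 GB
# then, in the next submission, request an amount closer to actual usage:
#SBATCH --mem=500M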
Another way to get resource usage is to pass formatting arguments to the sacct command. We recommend inspecting MaxRSS (maximum memory usage) and ReqMem (memory requested) to determine the memory efficiency of finished jobs.
Let’s use sacct with the following parameters.
sacct --units=G --format=JobId,jobname,MaxRSS,ReqMem,Start,Elapsed,State
JobID JobName MaxRSS ReqMem Start Elapsed State
------------ ---------- ---------- ---------- ------------------- ---------- ----------
39456543 hello_wor+ 0.49G 2025-10-23T13:12:44 00:00:00 FAILED
39456543.ba+ batch 0 2025-10-23T13:12:44 00:00:00 FAILED
39456543.ex+ extern 0 2025-10-23T13:12:44 00:00:00 COMPLETED
39456570 hello_wor+ 0.49G 2025-10-23T13:18:52 00:00:00 FAILED
39456570.ba+ batch 0 2025-10-23T13:18:52 00:00:00 FAILED
39456570.ex+ extern 0 2025-10-23T13:18:52 00:00:00 COMPLETED
39456578 hello_wor+ 0.49G 2025-10-23T13:23:53 00:00:01 COMPLETED
39456578.ba+ batch 0 2025-10-23T13:23:53 00:00:01 COMPLETED
39456578.ex+ extern 0 2025-10-23T13:23:53 00:00:01 COMPLETED
39456611 long_hell+ 0.49G 2025-10-23T13:30:22 00:01:47 CANCELLED+
39456611.ba+ batch 0.00G 2025-10-23T13:30:22 00:01:48 CANCELLED
39456611.ex+ extern 0.00G 2025-10-23T13:30:22 00:01:48 COMPLETED
To see a list of all possible sacct formatting parameters, use the sacct --helpformat command.
Introducing GPU Jobs: Debugging Activity
Let’s examine an example GPU job in detail: gpu.sbatch.
Only three partitions offer GPUs:
- gpu
- gpulong
- interactivegpu
Each node in a GPU partition has at most 4 GPUs.
Examine gpu.sbatch.
cat gpu.sbatch
#!/bin/bash
#SBATCH --partition=compute ### Partition (like a queue in PBS)
#SBATCH --account=racs_training ### Account used for job submission
### NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayMain, %a=arraySub
#SBATCH --job-name=gpu_hello_world ### Job Name
#SBATCH --output=%x-%j.out ### File in which to store job output
#SBATCH --error=%x-%j.err ### File in which to store job error messages
#SBATCH --time=0-00:05:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Number of nodes needed for the job
#SBATCH --mem=500M ### Total Memory for job in MB -- can do K/M/G/T for KB/MB/GB/TB
#SBATCH --gpus=1 ### Number of GPUs to request
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --cpus-per-task=1 ### Number of cpus/cores to be launched per Task
#SBATCH --constraint=gpu-10gb
### SLURM can even email you when jobs reach certain states:
### #SBATCH --mail-type=BEGIN,END,FAIL ### accepted types are NONE,BEGIN,END,FAIL,REQUEUE,ALL (does all)
### #SBATCH --mail-user=<duckID>@uoregon.edu
### Load needed modules
module purge
module load cuda/12.4.1
module list
### Run your actual program
nvidia-smi
- You request at least one GPU with the --gpus argument. Running on a GPU partition alone is not enough.
- If you need to select certain nodes within a partition, the --constraint= flag restricts the job to nodes with particular features. In this case, --constraint=gpu-10gb restricts the job to nodes with the gpu-10gb feature, that is, nodes offering a GPU with 10GB of VRAM (see the sinfo sketch after this list for one way to look up node features).
- Every GPU requires a CPU too!
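One way to see which GPUs and features the nodes in a partition advertise is sinfo’s format options (a sketch; output not shown here):
sinfo --partition=gpu --format="%N %G %f"
# %N = node names, %G = generic resources (GPUs), %f = node features (e.g., gpu-10gb)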
If you try to sbatch this job, Slurm will refuse to queue it. There’s a problem with the Slurm configuration!
sbatch gpu.sbatch
sbatch: error: Batch job submission failed: Requested node configuration is not available
You cannot request --gpus=1 from a partition whose nodes have no GPUs. Fix the job by requesting one of the three partitions that have GPUs available: gpu, interactivegpu, or gpulong.
nano gpu.sbatch
#!/bin/bash
#SBATCH --partition=gpu ### Partition (like a queue in PBS)
#SBATCH --account=racs_training ### Account used for job submission
### NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayMain, %a=arraySub
#SBATCH --job-name=gpu_hello_world ### Job Name
#SBATCH --output=%x-%j.out ### File in which to store job output
#SBATCH --error=%x-%j.err ### File in which to store job error messages
#SBATCH --time=0-00:05:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Number of nodes needed for the job
#SBATCH --mem=500M ### Total Memory for job in MB -- can do K/M/G/T for KB/MB/GB/TB
#SBATCH --gpus=1 ### Number of GPUs to request
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --cpus-per-task=1 ### Number of cpus/cores to be launched per Task
#SBATCH --constraint=gpu-10gb
### SLURM can even email you when jobs reach certain states:
### #SBATCH --mail-type=BEGIN,END,FAIL ### accepted types are NONE,BEGIN,END,FAIL,REQUEUE,ALL (does all)
### #SBATCH --mail-user=<duckID>@uoregon.edu
### Load needed modules
module purge
module load cuda/12.4.1
module list
### Run your actual program
nvidia-smi
Emailing Job Status
Running a really long job? Tired of checking squeue?
SLURM can even email you when jobs reach certain states:
### #SBATCH --mail-type=BEGIN,END,FAIL
### #SBATCH --mail-user=<duckID>@uoregon.edu
Edit these lines in your sbatch script: remove the leading ### so each line becomes an active #SBATCH directive, and use your own DuckID. You should then get an email from Slurm when the job begins and when it finishes (whether it succeeds or fails).
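After editing, the two lines should look like this (keep the #SBATCH prefix and substitute your own DuckID for the placeholder):
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=<duckID>@uoregon.edu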
Schedule the GPU job and wait for results in your email. It should complete successfully.
sbatch gpu.sbatch
Submitted batch job 39459903
Once you receive an email confirmation, check the status with sacct.
sacct
39459903 gpu_hello+ gpu racs_trai+ 1 COMPLETED 0:0
39459903.ba+ batch racs_trai+ 1 COMPLETED 0:0
39459903.ex+ extern racs_trai+ 1 COMPLETED 0:0
The nvidia-smi command gives you the status of GPUs on your system, if any. It will error out if there are no GPUs on the system. Use head to peek at the message.
head gpu_hello_world*.out
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.01 Driver Version: 550.163.01 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:E8:00.0 Off | On |
| N/A 37C P0 72W / 300W | 706MiB / 81920MiB | N/A Default |
Serial Processing: One Core and One Task at a Time
Navigate inside the part2 folder of the slurm folder using cd.
cd ../part2
Check that you see the following files inside with ls.
ls -F
array.sbatch* books.sbatch* parallel_steps.sbatch* serial_steps.sbatch*
books/ hello.sbatch* random_array.py* steps.sh*
Making Job Steps with Srun
Examine hello.sbatch with cat.
cat hello.sbatch
#!/bin/bash
#SBATCH --partition=compute ### Partition (like a queue in PBS)
#SBATCH --account=racs_training ### Account used for job submission
### NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayMain, %a=arraySub
#SBATCH --job-name=hello ### Job Name
#SBATCH --output=%x.out ### File in which to store job output
#SBATCH --error=%x.err ### File in which to store job error messages
#SBATCH --time=0-00:05:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Number of nodes needed for the job
#SBATCH --mem=50M ### Total Memory for job in MB -- can do K/M/G/T for KB/MB/GB/TB
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --cpus-per-task=1 ### Number of cpus/cores to be launched per Task
srun echo "Hello From a SLURM Job Step!"
Submit this job with sbatch.
sbatch hello.sbatch
Submitted batch job 39459774
This job creates a single Slurm job step within the parent sbatch job. All processes launched by a given srun invocation share the same step ID.
Check the job status with sacct.
sacct
39459774 hello compute racs_trai+ 1 COMPLETED 0:0
39459774.ba+ batch racs_trai+ 1 COMPLETED 0:0
39459774.ex+ extern racs_trai+ 1 COMPLETED 0:0
39459774.0 echo racs_trai+ 1 COMPLETED 0:0
Observe the fourth line! The echo command gets its own job step, 39459774.0, a child of the parent hello job.
An Example Serial Job
Examine the serial_steps.sbatch job with cat.
cat serial_steps.sbatch
#!/bin/bash
#SBATCH --partition=compute ### Partition (like a queue in PBS)
#SBATCH --account=racs_training ### Account used for job submission
### NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayMain, %a=arraySub
#SBATCH --job-name=serial_steps ### Job Name
#SBATCH --output=%x.out ### File in which to store job output
#SBATCH --error=%x.err ### File in which to store job error messages
#SBATCH --time=0-00:25:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --ntasks=1 ### Number of tasks included in the job
#SBATCH --mem-per-cpu=50M ### Memory per CPU core -- can do K/M/G/T for KB/MB/GB/TB
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --cpus-per-task=1 ### Number of cpus/cores to be launched per Task
for i in {1..10}; do
srun steps.sh $i
done
Examine the steps.sh script called by the batch script.
cat steps.sh
#!/usr/bin/bash
echo "I am printing this from job step $1" & sleep 10
This job runs a Bash for-loop. To learn more about Bash loops, check out this quick tutorial.
srun creates a step within a job. A job with multiple srun commands creates a separate step (sharing the parent job ID) in the sacct output for each one.
This script uses srun to create 10 sequential (serial) steps in one job. By default, each iteration (here i = 1, 2, 3 ... 10) must finish before the loop moves to the next.
sbatch serial_steps.sbatch
Submitted batch job 39456708
Check the queue for the status of your job.
squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
39456708 compute serial_s emwin R 0:07 1 n0185
As you can observe, the serial job is running and corresponds to exactly one row in the Slurm queue.
However, each step of serial_steps (each subjob) appears as a separate row in the sacct output.
sacct
39456708.0 steps.sh 0 2025-10-23T14:09:01 00:00:10
39456708.1 steps.sh 0 2025-10-23T14:09:11 00:00:10
39456708.2 steps.sh 0.00G 2025-10-23T14:09:21 00:00:10
39456708.3 steps.sh 0 2025-10-23T14:09:31 00:00:10
39456708.4 steps.sh 0 2025-10-23T14:09:41 00:00:10
39456708.5 steps.sh 2025-10-23T14:09:51 00:00:04
The job is currently running step 39456708.5; step 39456708.6 can’t begin until it finishes. Serial jobs require that steps be completed one at a time and in a specific order.
Let’s look at the time spent on each stage in more detail using a specially configured sacct command. You can learn more about sacct formatting on the Slurm documentation.
sacct --units=G --format=JobID,jobname,MaxRSS,Start,Elapsed,State,ExitCode
39456708 serial_st+ 2025-10-23T14:09:01 00:01:41 COMPLETED 0:0
39456708.ba+ batch 0.00G 2025-10-23T14:09:01 00:01:41 COMPLETED 0:0
39456708.ex+ extern 0.00G 2025-10-23T14:09:01 00:01:41 COMPLETED 0:0
39456708.0 steps.sh 0 2025-10-23T14:09:01 00:00:10 COMPLETED 0:0
39456708.1 steps.sh 0 2025-10-23T14:09:11 00:00:10 COMPLETED 0:0
39456708.2 steps.sh 0.00G 2025-10-23T14:09:21 00:00:10 COMPLETED 0:0
39456708.3 steps.sh 0 2025-10-23T14:09:31 00:00:10 COMPLETED 0:0
39456708.4 steps.sh 0 2025-10-23T14:09:41 00:00:10 COMPLETED 0:0
39456708.5 steps.sh 0 2025-10-23T14:09:51 00:00:11 COMPLETED 0:0
39456708.6 steps.sh 0 2025-10-23T14:10:02 00:00:10 COMPLETED 0:0
39456708.7 steps.sh 0.00G 2025-10-23T14:10:12 00:00:10 COMPLETED 0:0
39456708.8 steps.sh 0 2025-10-23T14:10:22 00:00:10 COMPLETED 0:0
39456708.9 steps.sh 0 2025-10-23T14:10:32 00:00:10 COMPLETED 0:0
As you can see, all ten steps (numbered 0-9) are now marked as COMPLETED. Each subjob has a different start time. The subjobs start roughly ten seconds apart, after the previous subjob has finished.
Now that the job has completed, inspect the output logs.
cat serial_steps.out
I am printing this from job step 1
I am printing this from job step 2
I am printing this from job step 3
I am printing this from job step 4
I am printing this from job step 5
I am printing this from job step 6
I am printing this from job step 7
I am printing this from job step 8
I am printing this from job step 9
I am printing this from job step 10
Each subjob completes in a linear, serial fashion.
There are some jobs where certain steps or subcomponents must be completed in a serial fashion. However, serial jobs do not take full advantage of the computational power of Talapas or its support for parallelism. Think of serial computation as a last resort when using Talapas.
How can we make this job run in parallel? Will it be faster?
Parallelism with sbatch
Let’s examine parallel_steps.sbatch with cat.
cat parallel_steps.sbatch
It looks very similar to serial_steps.sbatch, but there are a few subtle and important differences.
#!/bin/bash
#SBATCH --partition=compute ### Partition (like a queue in PBS)
#SBATCH --account=racs_training ### Account used for job submission
### NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayMain, %a=arraySub
#SBATCH --job-name=parallel_steps ### Job Name
#SBATCH --output=%x.out ### File in which to store job output
#SBATCH --error=%x.err ### File in which to store job error messages
#SBATCH --time=0-00:25:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --ntasks=10 ### Number of tasks included in the job
#SBATCH --cpus-per-task=1 ### Number of cpus/cores to be launched per Task
#SBATCH --mem-per-cpu=50M ### Memory per CPU core -- can do K/M/G/T for KB/MB/GB/TB
for i in {1..10}; do
srun --ntasks=1 steps.sh $i &
done
wait
This job requests 10 CPUs, allocates 1 CPU per task, and then launches 10 concurrent srun subjobs.
The & at the end of srun --ntasks=1 steps.sh $i & is crucial! The ampersand tells Bash to run the command in the background. Without it, the loop would wait for one iteration to finish before moving to the next, and the wait at the end ensures the batch job does not exit until every background step has finished.
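If the background-job behavior is unfamiliar, here is a minimal Bash-only sketch (no Slurm involved) of what & and wait do:
for i in 1 2 3; do
    sleep 2 &      # each sleep starts immediately in the background
done
wait               # pause here until all background jobs finish: about 2 seconds total, not 6
echo "all done"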
Let’s launch this command with sbatch.
sbatch parallel_steps.sbatch
Submitted batch job 39456813
This job will finish very quickly.
If you look at the output log, you’ll notice something important.
The job steps don’t complete in order!
cat parallel_steps.out
I am printing this from job step 7
I am printing this from job step 5
I am printing this from job step 3
I am printing this from job step 9
I am printing this from job step 10
I am printing this from job step 2
I am printing this from job step 6
I am printing this from job step 8
I am printing this from job step 1
I am printing this from job step 4
The steps in the parallel job did not run in order or finish in order. You lose that guarantee in a parallel context.
How much faster did the parallel job run?
The serial job took about 100 seconds (an Elapsed time of 00:01:41). Use sacct to see how much faster the parallel job ran.
sacct --units=G --format=JobID,jobname,MaxRSS,Start,Elapsed,State,ExitCode
JobID JobName MaxRSS Start Elapsed State ExitCode
39456813 parallel_+ 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.ba+ batch 0 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.ex+ extern 0 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.0 steps.sh 0 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.1 steps.sh 0 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.2 steps.sh 0 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.3 steps.sh 0 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.4 steps.sh 0.00G 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.5 steps.sh 0.00G 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.6 steps.sh 0 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.7 steps.sh 0 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.8 steps.sh 0 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
39456813.9 steps.sh 0 2025-10-23T14:14:32 00:00:10 COMPLETED 0:0
The parallel job ran in about 10 seconds, significantly faster than the roughly 100-second serial version. This is an example of an embarrassingly parallel job, because each subjob is completely independent of the others.
Before starting our next activity, clear out old logs with rm.
rm *.err *.out
Array Jobs
Slurm has a special option for bulk scheduling of nearly identical jobs: array jobs. This approach is less flexible than srun, but it can be faster to configure.
Take a close look at array.sbatch with cat.
cat array.sbatch
#!/bin/bash
#SBATCH --partition=compute ### Partition (like a queue in PBS)
#SBATCH --account=racs_training ### Account used for job submission
### NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayMain, %a=arraySub
#SBATCH --job-name=python_array ### Job Name
#SBATCH --output=%x-%A-%a.out ### File in which to store job output
#SBATCH --error=%x-%A-%a.err ### File in which to store job error messages
#SBATCH --time=0-00:05:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Number of nodes needed for the job
#SBATCH --mem=50M ### Total Memory for job in MB -- can do K/M/G/T for KB/MB/GB/TB
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --cpus-per-task=1 ### Number of cpus/cores to be launched per Task
#SBATCH --array=1-10
### Load needed modules
module purge
module load python3/3.11.4
### Run your actual program
srun python3 random_array.py $SLURM_ARRAY_TASK_ID $SLURM_ARRAY_JOB_ID $SLURM_JOB_ID
Array jobs require the #SBATCH --array= parameter. In this example, the array parameter is the list {1, 2, ..., 10}. In the log filenames, %a stands for the array task ID and %A stands for the parent array job ID.
This will create one main job and a subjob for each index in the array. Each array index will inherit the requested parameters specified in cpus-per-task and ntasks-per-node. This means each subjob asks for 1 CPU core.
This job runs random_array.py and passes in three arguments: $SLURM_ARRAY_TASK_ID, $SLURM_ARRAY_JOB_ID, and $SLURM_JOB_ID. These are environment variables that Slurm sets for each array task.
One advantage of array jobs is that Slurm can start individual array tasks as soon as resources become available for them; it does not wait until enough cores are free to run every task at once. By contrast, a parallel job that uses srun to distribute subjobs across multiple requested cores must wait for its full allocation before any step can start.
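As an aside, the --array option accepts more than a simple range. These variants are standard Slurm syntax, though none of them are used in this lesson’s files:
#SBATCH --array=1-10        ### indices 1 through 10
#SBATCH --array=0-4         ### indices 0 through 4 (as in books.sbatch later)
#SBATCH --array=1,3,5       ### an explicit list of indices
#SBATCH --array=1-100%10    ### 100 tasks, but at most 10 running at once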
Let’s look in more detail at random_array.py.
cat random_array.py
#!/usr/bin/env python3
import random
import sys
if __name__ == "__main__":
    args = sys.argv[1:]
    seed = args[0]  # or Array Task ID
    array_job_id = args[1]
    job_id = args[2]
    print(f"Array Job ID: {array_job_id} Array Task ID: {seed} Job ID: {job_id}")
    print(f"SEED: {seed}")
    random.seed(seed)
    for i in range(0, 3):
        print(random.random())
This Python script takes three arguments (the array task ID, the array job ID, and the job ID), seeds the random number generator with the task ID, and prints three random numbers. It runs once for each subjob in the array: 10 times in total.
Go ahead and submit the array job.
sbatch array.sbatch
Submitted batch job 39459879
Array jobs are parallel jobs.
You have no guarantee that all the array jobs will schedule or run at the same time.
Let’s check the job’s progress with sacct.
sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
39459879_1 python_ar+ compute racs_trai+ 1 COMPLETED 0:0
39459879_1.+ batch racs_trai+ 1 COMPLETED 0:0
39459879_1.+ extern racs_trai+ 1 COMPLETED 0:0
39459879_1.0 python3 racs_trai+ 1 COMPLETED 0:0
39459879_2 python_ar+ compute racs_trai+ 1 COMPLETED 0:0
39459879_2.+ batch racs_trai+ 1 COMPLETED 0:0
39459879_2.+ extern racs_trai+ 1 COMPLETED 0:0
39459879_2.0 python3 racs_trai+ 1 COMPLETED 0:0
39459879_3 python_ar+ compute racs_trai+ 1 COMPLETED 0:0
39459879_3.+ batch racs_trai+ 1 COMPLETED 0:0
39459879_3.+ extern racs_trai+ 1 COMPLETED 0:0
39459879_3.0 python3 racs_trai+ 1 COMPLETED 0:0
39459879_4 python_ar+ compute racs_trai+ 1 COMPLETED 0:0
39459879_4.+ batch racs_trai+ 1 COMPLETED 0:0
39459879_4.+ extern racs_trai+ 1 COMPLETED 0:0
39459879_4.0 python3 racs_trai+ 1 COMPLETED 0:0
39459879_5 python_ar+ compute racs_trai+ 1 COMPLETED 0:0
39459879_5.+ batch racs_trai+ 1 COMPLETED 0:0
39459879_5.+ extern racs_trai+ 1 COMPLETED 0:0
39459879_5.0 python3 racs_trai+ 1 COMPLETED 0:0
39459879_6 python_ar+ compute racs_trai+ 1 COMPLETED 0:0
39459879_6.+ batch racs_trai+ 1 COMPLETED 0:0
39459879_6.+ extern racs_trai+ 1 COMPLETED 0:0
39459879_6.0 python3 racs_trai+ 1 COMPLETED 0:0
39459879_7 python_ar+ compute racs_trai+ 1 COMPLETED 0:0
39459879_7.+ batch racs_trai+ 1 COMPLETED 0:0
39459879_7.+ extern racs_trai+ 1 COMPLETED 0:0
39459879_7.0 python3 racs_trai+ 1 COMPLETED 0:0
39459879_8 python_ar+ compute racs_trai+ 1 COMPLETED 0:0
39459879_8.+ batch racs_trai+ 1 COMPLETED 0:0
39459879_8.+ extern racs_trai+ 1 COMPLETED 0:0
39459879_8.0 python3 racs_trai+ 1 COMPLETED 0:0
39459879_9 python_ar+ compute racs_trai+ 1 COMPLETED 0:0
39459879_9.+ batch racs_trai+ 1 COMPLETED 0:0
39459879_9.+ extern racs_trai+ 1 COMPLETED 0:0
39459879_9.0 python3 racs_trai+ 1 COMPLETED 0:0
39459879_10 python_ar+ compute racs_trai+ 1 COMPLETED 0:0
39459879_10+ batch racs_trai+ 1 COMPLETED 0:0
39459879_10+ extern racs_trai+ 1 COMPLETED 0:0
39459879_10+ python3 racs_trai+ 1 COMPLETED 0:0
Each subjob in an array job creates its own logs. Use a wildcard with ls to list them.
ls python_array*
python_array-39459879-10.err python_array-39459879-2.out python_array-39459879-5.err python_array-39459879-7.out
python_array-39459879-10.out python_array-39459879-3.err python_array-39459879-5.out python_array-39459879-8.err
python_array-39459879-1.err python_array-39459879-3.out python_array-39459879-6.err python_array-39459879-8.out
python_array-39459879-1.out python_array-39459879-4.err python_array-39459879-6.out python_array-39459879-9.err
python_array-39459879-2.err python_array-39459879-4.out python_array-39459879-7.err python_array-39459879-9.out
To look at the last few lines of each output log, you can use the tail command.
tail py*.out
==> python_array-39459879-10.out <==
FILE
/packages/miniconda-t2/20230523/envs/python-3.11.4/lib/python3.11/random.py
Array Job ID: 39459879 Array Task ID: 10 Job ID: 39459879
SEED: 10
0.8038117609963674
0.6395838575221652
0.10813824794140992
==> python_array-39459879-1.out <==
FILE
/packages/miniconda-t2/20230523/envs/python-3.11.4/lib/python3.11/random.py
Array Job ID: 39459879 Array Task ID: 1 Job ID: 39459880
SEED: 1
0.4782479962566343
0.044242767098090496
0.11703586901195051
==> python_array-39459879-2.out <==
FILE
/packages/miniconda-t2/20230523/envs/python-3.11.4/lib/python3.11/random.py
Array Job ID: 39459879 Array Task ID: 2 Job ID: 39459881
SEED: 2
0.5558450440322502
0.637051760769432
0.32591821829634604
==> python_array-39459879-3.out <==
FILE
/packages/miniconda-t2/20230523/envs/python-3.11.4/lib/python3.11/random.py
Array Job ID: 39459879 Array Task ID: 3 Job ID: 39459882
SEED: 3
0.8677521680526462
0.5835116078800329
0.6899116355769067
==> python_array-39459879-4.out <==
FILE
/packages/miniconda-t2/20230523/envs/python-3.11.4/lib/python3.11/random.py
Array Job ID: 39459879 Array Task ID: 4 Job ID: 39459883
SEED: 4
0.16000523129035404
0.3018180364144196
0.6137173239349033
==> python_array-39459879-5.out <==
FILE
/packages/miniconda-t2/20230523/envs/python-3.11.4/lib/python3.11/random.py
Array Job ID: 39459879 Array Task ID: 5 Job ID: 39459884
SEED: 5
0.35258858106537283
0.7805236423847258
0.5172954265015355
==> python_array-39459879-6.out <==
FILE
/packages/miniconda-t2/20230523/envs/python-3.11.4/lib/python3.11/random.py
Array Job ID: 39459879 Array Task ID: 6 Job ID: 39459885
SEED: 6
0.3898339501015142
0.9554066922708754
0.3550297799037352
==> python_array-39459879-7.out <==
FILE
/packages/miniconda-t2/20230523/envs/python-3.11.4/lib/python3.11/random.py
Array Job ID: 39459879 Array Task ID: 7 Job ID: 39459886
SEED: 7
0.7124376338974682
0.8779104020178983
0.9507485116206883
==> python_array-39459879-8.out <==
FILE
/packages/miniconda-t2/20230523/envs/python-3.11.4/lib/python3.11/random.py
Array Job ID: 39459879 Array Task ID: 8 Job ID: 39459887
SEED: 8
0.22991307664292304
0.7963928625032808
0.7965374772142675
==> python_array-39459879-9.out <==
FILE
/packages/miniconda-t2/20230523/envs/python-3.11.4/lib/python3.11/random.py
Array Job ID: 39459879 Array Task ID: 9 Job ID: 39459888
SEED: 9
0.1666321090310302
0.02959962356249568
0.629304132385857
Notice that each job has its own array task id, its own job id, and a shared array job id.
When to Use Array Jobs
If you can format your job as an array job, do so! Array jobs are the preferred way to parallelize embarrassingly parallel work, and they are easy for the scheduler to manage.
Parallelism vs. Serial Execution
Array jobs do not “talk to each other”. They do not run in a guaranteed order, nor do they finish in a guaranteed order.
Regardless of how your job is configured, no two jobs or subjobs should read from or write to the same files concurrently. Doing so creates race conditions in which your code and the filesystem can behave unpredictably.
Race conditions are why we make sure everyone in the class copies the example files from the shared directory into their home directory before modifying or executing them.
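A common way to avoid races in an array job is to give every task its own input and output paths, keyed on the task ID. A minimal sketch; the program name and file layout here are hypothetical:
### each array task reads its own input and writes its own output
INPUT="inputs/sample_${SLURM_ARRAY_TASK_ID}.txt"
OUTPUT="results/sample_${SLURM_ARRAY_TASK_ID}.out"
srun my_analysis "$INPUT" > "$OUTPUT"   # my_analysis is a placeholder program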
Example Array Job: Books
In the books folder, you should have the text of five books. Let’s say you want to count the characters in each book.
ls books
alice_in_wonderland.txt moby_dick.txt romeo_and_juliet.txt
complete_works_shakespeare.txt pride_and_prejudice.txt
Let’s examine books.sbatch, which computes the character count of each book using the wc command.
cat books.sbatch
#!/bin/bash
#SBATCH --partition=compute ### Partition (like a queue in PBS)
#SBATCH --account=racs_training ### Account used for job submission
### NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayMain, %a=arraySub
#SBATCH --job-name=books_wc_array ### Job Name
#SBATCH --output=logs/%x-%A-%a.out ### File in which to store job output
#SBATCH --error=logs/%x-%A-%a.err ### File in which to store job error messages
#SBATCH --time=0-00:05:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Number of nodes needed for the job
#SBATCH --mem=50M ### Total Memory for job in MB -- can do K/M/G/T for KB/MB/GB/TB
#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node
#SBATCH --cpus-per-task=1 ### Number of cpus/cores to be launched per Task
#SBATCH --array=0-4
BOOKS=(books/*)
srun wc -c ${BOOKS[$SLURM_ARRAY_TASK_ID]}
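The last two lines use a Bash array: BOOKS=(books/*) expands the glob into an indexed list of file paths, and ${BOOKS[$SLURM_ARRAY_TASK_ID]} selects the file whose index matches the task ID, which is why the array runs over 0-4 for five books. You can preview the index-to-file mapping from any shell in the part2 directory (a quick sketch, not part of the lesson files):
BOOKS=(books/*)
for i in "${!BOOKS[@]}"; do
    echo "$i -> ${BOOKS[$i]}"
done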
Submit the job with sbatch.
sbatch books.sbatch
Submitted batch job 39457114
Let’s check the job status.
sacct
39457114_0 books_wc_+ compute racs_trai+ 1 COMPLETED 0:0
39457114_0.+ batch racs_trai+ 1 COMPLETED 0:0
39457114_0.+ extern racs_trai+ 1 COMPLETED 0:0
39457114_0.0 wc racs_trai+ 1 COMPLETED 0:0
39457114_1 books_wc_+ compute racs_trai+ 1 COMPLETED 0:0
39457114_1.+ batch racs_trai+ 1 COMPLETED 0:0
39457114_1.+ extern racs_trai+ 1 COMPLETED 0:0
39457114_1.0 wc racs_trai+ 1 COMPLETED 0:0
39457114_2 books_wc_+ compute racs_trai+ 1 COMPLETED 0:0
39457114_2.+ batch racs_trai+ 1 COMPLETED 0:0
39457114_2.+ extern racs_trai+ 1 COMPLETED 0:0
39457114_2.0 wc racs_trai+ 1 COMPLETED 0:0
39457114_3 books_wc_+ compute racs_trai+ 1 COMPLETED 0:0
39457114_3.+ batch racs_trai+ 1 COMPLETED 0:0
39457114_3.+ extern racs_trai+ 1 COMPLETED 0:0
39457114_3.0 wc racs_trai+ 1 COMPLETED 0:0
39457114_4 books_wc_+ compute racs_trai+ 1 COMPLETED 0:0
39457114_4.+ batch racs_trai+ 1 COMPLETED 0:0
39457114_4.+ extern racs_trai+ 1 COMPLETED 0:0
39457114_4.0 wc racs_trai+ 1 COMPLETED 0:0
The following lines of sbatch configuration direct the output and error logs to the logs folder (which must already exist; Slurm does not create it). The %x-%A-%a notation creates log files named (job name)-(array parent job)-(array index).out.
#SBATCH --output=logs/%x-%A-%a.out ### File in which to store job output
#SBATCH --error=logs/%x-%A-%a.err ### File in which to store job error messages
Inspect the logs folder with ls. You should see output and error logs for each book.
ls logs
books_wc_array-39457114-0.err
books_wc_array-39457114-0.out
books_wc_array-39457114-1.err
books_wc_array-39457114-1.out
books_wc_array-39457114-2.err
books_wc_array-39457114-2.out
books_wc_array-39457114-3.err
books_wc_array-39457114-3.out
books_wc_array-39457114-4.err
books_wc_array-39457114-4.out
Let’s use tail and a * wildcard to check the last few lines of each of the output logs.
tail logs/books*.out
As expected, each output log contains the result of wc -c (the character count) followed by the path of the book used as input.
==> logs/books_wc_array-39457114-0.out <==
174357 books/alice_in_wonderland.txt
==> logs/books_wc_array-39457114-1.out <==
5638516 books/complete_works_shakespeare.txt
==> logs/books_wc_array-39457114-2.out <==
1276288 books/moby_dick.txt
==> logs/books_wc_array-39457114-3.out <==
772419 books/pride_and_prejudice.txt
==> logs/books_wc_array-39457114-4.out <==
169541 books/romeo_and_juliet.txt
Interactive Jobs: srun
Interactive jobs are great for development work when you need a better-resourced machine for a couple of hours or want to test a Slurm configuration on a small sample before deploying it as a batch job.
Let’s try an interactive job with the following configuration:
- runs on the compute partition
- requests 1 CPU
- requests 50MB of RAM
- runs for 10 minutes
- opens a bash terminal
srun --partition=compute --account=racs_training --cpus-per-task=1 --mem=50M --time=10 --pty bash
srun: job 39457157 queued and waiting for resources
srun: job 39457157 has been allocated resources
You are now in a Bash terminal, but this terminal is no longer on the login node: it is running on a compute node.
hostname
n0185.talapas.uoregon.edu
Interactive Jobs and Partitions
Interactive jobs inherit their default and maximum time limits from the partition, just like batch jobs. This means that interactive jobs launched on the compute, memory, and gpu partitions can request a time limit of at most 24 hours.
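One quick way to check each partition’s time limit is sinfo’s format options (a sketch; the exact limits are set by RACS and may change):
sinfo --format="%P %l"
# %P = partition name, %l = maximum time limit for jobs in that partition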
Make sure to exit your interactive job when you finish to free up resources.
exit
hostname
login1.talapas.uoregon.edu
If you check the job ID with sacct, you’ll notice that the interactive job is marked as COMPLETED after you use the exit command.
sacct -j 39457157
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
39457157 bash compute racs_trai+ 1 COMPLETED 0:0
39457157.ex+ extern racs_trai+ 1 COMPLETED 0:0
39457157.0 bash racs_trai+ 1 COMPLETED 0:0
Interactive jobs are also a great place to try out GPU configurations. Let’s be polite and request only 30 minutes. By default, interactive jobs run with a maximum of 4GB of RAM, so your job will end after 30 minutes or be killed if it exceeds your allotted RAM.
srun --partition=gpu --account=racs_training --time=30 --gpus=1 --constraint=gpu-10gb --pty bash
With this constraint, you’re requesting a GPU with 10GB of VRAM.
srun: job 39459894 queued and waiting for resources
srun: job 39459894 has been allocated resources
Check that you’re on the interactive node using hostname.
hostname
n0162.talapas.uoregon.edu
Use the nvidia-smi command to get GPU information for the current node.
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.01 Driver Version: 550.163.01 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:E8:00.0 Off | On |
| N/A 37C P0 72W / 300W | 829MiB / 81920MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 12 0 0 | 13MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
If you requested GPU resources correctly, running nvidia-smi shows information about the GPU allocated to you, as in the output above.
Exit your interactive job.
exit
Check that you’re back on a login node.
hostname
login2.talapas.uoregon.edu
This nvidia-smi command will fail in contexts where a GPU is unavailable or has not been allocated.
For example, if you run it on the login node, you’ll get a message like this.
nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
It prints this failure message because nodes without GPUs don’t have a driver installed.
RACS has a guide on configuring interactive jobs on Talapas if you’d like to read more.
Quick Slurm Scheduling Tips
- Read the resource descriptions carefully: jobs that exceed their requested time, memory, or processor usage will be automatically killed.
- Need to run for more than 24 hours? You must use computelong, gpulong, or memorylong.
- Most jobs should use --nodes=1 unless they use MPI.
- Remember to exit interactive jobs when you’re finished to free up resources for your colleagues.
Today’s Slurm Commands
| command | description | example usage |
|---|---|---|
| sbatch [jobfile] | queues a Slurm job | sbatch my_job.sh |
| sacct | lists (your) recent jobs and subjobs | sacct |
| squeue -u [user] | gets status of running or queued slurm jobs for user | squeue -u [yourDuckID] |
| scancel [jobid] | cancels job jobid | scancel [jobid] |
| srun | launches interactive shell session on a compute node | srun --partition=compute --account=racs_training --pty bash |
| sinfo | lists information on available partitions | sinfo |