Job Management
SLURM is installed as the cluster's workload manager.
Check node availability with:
$ sinfo
Key states:
- idle: Available for jobs
- alloc: Fully allocated
- drain: Temporarily unavailable (maintenance)
- mix: Partially allocated
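To narrow the output, sinfo accepts state and partition filters; for example, to list only idle nodes or to get a per-node view of one partition (lecce2 is the partition used in the example script below):
$ sinfo --states=idle
$ sinfo --partition=lecce2 --Node --long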
Check your jobs with:
$ squeue -u $USER
Key statuses:
- R: Running
- PD: Pending (queued)
- CG: Completing
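squeue supports similar filtering; for example, to show only your pending jobs, or a long listing that includes the reason a job is still waiting:
$ squeue -u $USER --states=PD
$ squeue -u $USER --long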
SLURM Options
Example SLURM script (job.sh):
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=1:00:00
#SBATCH --partition=lecce2
module load python/3.13.2
python my_script.py
Submit with:
$ sbatch job.sh
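sbatch reports the new job's ID on submission; that ID can then be used to inspect or cancel the job (the ID below is purely illustrative):
$ scontrol show job 12345
$ scancel 12345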
Essential SLURM Options
Option | Description | Example |
---|---|---|
--nodes | Number of nodes | --nodes=4 |
--ntasks-per-node | Tasks (processes) per node | --ntasks-per-node=16 (for 16-core nodes) |
--cpus-per-task | CPU cores allocated per process (for multi-threaded apps like OpenMP) | --cpus-per-task=4 (4 cores per task) |
--nodelist | Specific nodes | --nodelist=cns01,cns02 |
--exclusive | Dedicated node access | --exclusive |
--gres | GPU resources | --gres=gpu:2 |
--mem | Memory per node | --mem=32G |
--time | Walltime limit | --time=24:00:00 |
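As a sketch of how these options combine, the script below requests two 16-core nodes with four tasks per node, four cores per task, one GPU per node, and 32 GB of memory per node; the partition is the one from the earlier example, while the program name is illustrative:
#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --partition=lecce2

# Give each task's threads the cores reserved by --cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# srun starts one process per task across the allocated nodes
srun ./my_program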
Pro Tip: Use sbatch --test-only job.sh to validate a script without submitting it.
Interactive Sessions
Launch interactive jobs with salloc:
$ salloc --nodes=1 --ntasks=1 --gres=gpu:1 --time=1:00:00
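Depending on the cluster's configuration, salloc usually opens a shell once the allocation is granted; commands then run on the allocated resources through srun, and typing exit releases the allocation:
$ srun hostname
$ exit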