Job Management
SLURM is installed as the cluster's workload manager.
Note: SSH access to compute nodes is allowed without a reservation for testing, compiling, and running small calculations. However, long-running jobs (more than 15 minutes) on compute nodes will be automatically terminated by system daemons. This restriction does not apply to the kech and lia partitions.
Check node availability with:
$ sinfo
Key states:
- idle: Available for jobs
- alloc: Fully allocated
- drain: Temporarily unavailable (maintenance)
- mix: Partially allocated
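sinfo also accepts standard filters if you want a narrower or more detailed view; for example, to restrict the report to one partition (the lecce2 name below is just the partition used in the example script further down) or to list every node on its own line:
$ sinfo -p lecce2    # limit the report to one partition
$ sinfo -N -l        # long, one-line-per-node listing with state and resources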
Check your jobs with:
$ squeue -u $USER
Key statuses:
- R: Running
- PD: Pending (queued)
- CG: Completing
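squeue likewise supports standard filters; for example, to list only your queued jobs or to ask for estimated start times (SLURM can only report an estimate when the scheduler has computed one):
$ squeue -u $USER -t PENDING    # show only your pending jobs
$ squeue -u $USER --start       # estimated start times for pending jobs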
SLURM Options
Example SLURM script (job.sh):
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=1:00:00
#SBATCH --partition=lecce2
module load python/3.13.2
python my_script.py
Submit with:
$ sbatch job.sh
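If the script is accepted, sbatch replies with the assigned job ID (the number below is illustrative), which you can then track with squeue:
$ sbatch job.sh
Submitted batch job 12345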
Essential SLURM Options
| Option | Description | Example |
|---|---|---|
| --nodes | Number of nodes | --nodes=4 |
| --ntasks-per-node | Tasks (processes) per node | --ntasks-per-node=16 (for 16-core nodes) |
| --cpus-per-task | CPU cores allocated per process (for multi-threaded apps such as OpenMP) | --cpus-per-task=4 (4 cores per task) |
| --nodelist | Specific nodes | --nodelist=cns01,cns02 |
| --exclusive | Dedicated node access | --exclusive |
| --gres | GPU resources | --gres=gpu:2 |
| --mem | Memory per node | --mem=32G |
| --time | Walltime limit | --time=24:00:00 |
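As a sketch of how these options combine, the following hypothetical script requests a hybrid MPI/OpenMP job on two nodes; the partition, walltime, and executable name are placeholders to adapt to your own case:
#!/bin/bash
#SBATCH --job-name=hybrid
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4    # 4 MPI ranks per node
#SBATCH --cpus-per-task=4      # 4 OpenMP threads per rank
#SBATCH --mem=32G              # memory per node
#SBATCH --time=12:00:00
#SBATCH --partition=lecce2     # placeholder partition

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_hybrid_app           # placeholder executable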
Pro Tip: Use sbatch --test-only job.sh to validate a script without actually submitting it.
Interactive Sessions
Launch interactive jobs with salloc:
$ salloc --nodes=1 --ntasks=1 --gres=gpu:1 --time=1:00:00
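On most configurations salloc then opens a shell with the allocation's environment set; commands are placed on the allocated node with srun, and exiting the shell releases the resources (exact behaviour can depend on the cluster's SLURM configuration):
$ srun nvidia-smi    # run a command on the allocated GPU node
$ exit               # release the allocation when finished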