Slurm Implementation

Below is the complete revised Slurm implementation plan based on your current working data:

OpenStack/Kolla infrastructure
192.168.1.50 kolla-vip
192.168.1.51 ctrl
192.168.1.52 cmp
192.168.1.53 gpu

Existing GPU tenant VM
10.10.10.36 gpu-test-01

Slurm tenant VMs to create
10.10.10.30 slurm-controller
10.10.10.31 slurm-cpu1
10.10.10.32 slurm-cpu2

Your available OpenStack data now shows that the fixed Slurm ports already exist, the Ubuntu images exist, the current keypair is sont-key, and the original missing image problem has been fixed by uploading ubuntu-24.04. It also shows that your only original flavors were GPU-shaped, so you correctly asked for a CPU flavor next.


Revised target architecture

Proxmox layer
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
pve0 192.168.1.10
pve1 192.168.1.11
pve2 192.168.1.12


OpenStack/Kolla infrastructure layer
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
kolla-vip 192.168.1.50
ctrl 192.168.1.51
cmp 192.168.1.52
gpu 192.168.1.53


OpenStack tenant Slurm layer
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
slurm-controller 10.10.10.30
slurm-cpu1 10.10.10.31
slurm-cpu2 10.10.10.32
gpu-test-01 10.10.10.36


Slurm services
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
slurm-controller:
munge
slurmctld
slurmdbd
MariaDB

slurm-cpu1:
munge
slurmd

slurm-cpu2:
munge
slurmd

gpu-test-01:
munge
slurmd
NVIDIA driver
GRES gpu:gtx970:1

The important design decision remains:

Do not install Slurm on pve0/pve1/pve2.
Do not install Slurm on ctrl/cmp/gpu.
Run Slurm as OpenStack tenant VMs.
Use gpu-test-01 as the first Slurm GPU compute node.

Phase 0 โ€” Confirm OpenStack health first

On ctrl:

source /opt/kolla-venv/bin/activate
source /etc/kolla/admin-openrc.sh
export KOLLA_INVENTORY=/etc/kolla/multinode

Check core services:

openstack compute service list
openstack network agent list
openstack hypervisor list
openstack server list --all-projects
openstack network list
openstack subnet list
openstack security group list

If openstack security group list works, your previous Neutron/RabbitMQ problem is cleared.

Also confirm RabbitMQ hostname resolution is fixed:

getent hosts ctrl
docker exec rabbitmq getent hosts ctrl
docker ps --format 'table {{.Names}}\t{{.Status}}' | grep -Ei 'rabbit|neutron|haproxy|mariadb'

Expected:

192.168.1.51 ctrl
rabbitmq ... healthy
mariadb ... healthy

Phase 1 โ€” Confirm images, keypair, ports and flavors

You now have the image:

ubuntu-24.04
ubuntu-24.04-gpu

Confirm:

openstack image list

Confirm the keypair:

openstack keypair list

Your valid keypair is:

sont-key

Confirm the fixed Slurm ports:

openstack port list | grep slurm

You already have:

slurm-controller-port  10.10.10.30
slurm-cpu1-port 10.10.10.31
slurm-cpu2-port 10.10.10.32

Confirm each one:

openstack port show slurm-controller-port
openstack port show slurm-cpu1-port
openstack port show slurm-cpu2-port

The ports being DOWN is normal until VMs attach to them.


Phase 2 โ€” Create a CPU-only flavor

Your previous flavor list only had:

g1.gtx970
g1.gpu

For Slurm controller and CPU nodes, create a plain CPU flavor:

openstack flavor create m1.medium \
--ram 8192 \
--disk 40 \
--vcpus 4 \
--public

Verify:

openstack flavor list
openstack flavor show m1.medium

Also check your GPU flavors:

openstack flavor show g1.gpu -c name -c ram -c vcpus -c disk -c properties
openstack flavor show g1.gtx970 -c name -c ram -c vcpus -c disk -c properties

Use:

m1.medium   โ†’ slurm-controller, slurm-cpu1, slurm-cpu2
g1.gtx970 โ†’ future GPU Slurm VM only, if you later recreate gpu-test-01

For now, keep gpu-test-01 as the GPU Slurm node.


Phase 3 โ€” Prepare cloud-init for the Slurm CPU VMs

On ctrl, create or update:

vi slurm-cloud-init.yaml

Use this:

#cloud-config
package_update: true

packages:
- qemu-guest-agent
- python3
- python3-apt
- chrony
- munge
- slurm-wlm

write_files:
- path: /etc/cloud/cloud.cfg.d/999-disable-manage-etc-hosts.cfg
owner: root:root
permissions: '0644'
content: |
manage_etc_hosts: false

- path: /etc/hosts
owner: root:root
permissions: '0644'
content: |
127.0.0.1 localhost

10.10.10.30 slurm-controller
10.10.10.31 slurm-cpu1
10.10.10.32 slurm-cpu2
10.10.10.36 gpu-test-01 slurm-gpu1

::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

runcmd:
- sed -i '/^[[:space:]]*-[[:space:]]*update_etc_hosts[[:space:]]*$/d' /etc/cloud/cloud.cfg
- systemctl enable --now qemu-guest-agent
- systemctl enable --now chrony
- systemctl enable --now munge

This cloud-init deliberately prevents the same /etc/hosts regression you hit on the Kolla infrastructure VMs.


Phase 4 โ€” Create the Slurm tenant VMs

Create the controller:

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--port slurm-controller-port \
--key-name sont-key \
--user-data slurm-cloud-init.yaml \
slurm-controller

Create CPU node 1:

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--port slurm-cpu1-port \
--key-name sont-key \
--user-data slurm-cloud-init.yaml \
slurm-cpu1

Create CPU node 2:

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--port slurm-cpu2-port \
--key-name sont-key \
--user-data slurm-cloud-init.yaml \
slurm-cpu2

Check status:

openstack server list
openstack server show slurm-controller -c status -c addresses -c OS-EXT-SRV-ATTR:host
openstack server show slurm-cpu1 -c status -c addresses -c OS-EXT-SRV-ATTR:host
openstack server show slurm-cpu2 -c status -c addresses -c OS-EXT-SRV-ATTR:host

Expected:

slurm-controller  ACTIVE  10.10.10.30
slurm-cpu1 ACTIVE 10.10.10.31
slurm-cpu2 ACTIVE 10.10.10.32

Phase 5 โ€” Access the Slurm VMs

Your tenant network ID from the port output is:

54829687-5a62-4d95-a7d0-42f3e30f7dbf

So the DHCP namespace should be:

qdhcp-54829687-5a62-4d95-a7d0-42f3e30f7dbf

From ctrl:

sudo ip netns exec qdhcp-54829687-5a62-4d95-a7d0-42f3e30f7dbf \
ssh -i /home/sont/.ssh/id_ed25519_kolla ubuntu@10.10.10.30

If the SSH key path differs from the OpenStack keypair material, use the private key corresponding to sont-key.

From slurm-controller, test:

ping -c 2 slurm-cpu1
ping -c 2 slurm-cpu2
ping -c 2 gpu-test-01

Also SSH between nodes:

ssh ubuntu@slurm-cpu1
ssh ubuntu@slurm-cpu2
ssh ubuntu@gpu-test-01

Phase 6 โ€” Prepare all four Slurm nodes

The four Slurm nodes are:

slurm-controller  10.10.10.30
slurm-cpu1 10.10.10.31
slurm-cpu2 10.10.10.32
gpu-test-01 10.10.10.36

On all four nodes:

sudo apt update
sudo apt install -y munge slurm-wlm chrony
sudo systemctl enable --now chrony

On slurm-controller only:

sudo apt install -y mariadb-server slurmdbd

On gpu-test-01, confirm GPU:

lspci | grep -i nvidia
nvidia-smi
ls -l /dev/nvidia*

Do not proceed with Slurm GPU/GRES until nvidia-smi works inside gpu-test-01.


Phase 7 โ€” Configure Munge

On slurm-controller:

sudo create-munge-key
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo systemctl enable --now munge

Copy the same key to all compute nodes:

scp /etc/munge/munge.key ubuntu@slurm-cpu1:/tmp/munge.key
scp /etc/munge/munge.key ubuntu@slurm-cpu2:/tmp/munge.key
scp /etc/munge/munge.key ubuntu@gpu-test-01:/tmp/munge.key

On each compute node:

sudo mv /tmp/munge.key /etc/munge/munge.key
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo systemctl enable --now munge

Test from each compute node:

munge -n | ssh slurm-controller unmunge

Expected:

STATUS: Success

Phase 8 โ€” Configure MariaDB and SlurmDBD

On slurm-controller:

sudo mysql

Inside MariaDB:

CREATE DATABASE slurm_acct_db;
CREATE USER 'slurm'@'localhost' IDENTIFIED BY 'ChangeThisStrongPassword';
GRANT ALL PRIVILEGES ON slurm_acct_db.* TO 'slurm'@'localhost';
FLUSH PRIVILEGES;
EXIT;

Create /etc/slurm/slurmdbd.conf:

sudo tee /etc/slurm/slurmdbd.conf > /dev/null <<'EOF'
AuthType=auth/munge
DbdHost=slurm-controller
DbdPort=6819
SlurmUser=slurm
DebugLevel=info

StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePort=3306
StorageUser=slurm
StoragePass=ChangeThisStrongPassword
StorageLoc=slurm_acct_db

LogFile=/var/log/slurm/slurmdbd.log
PidFile=/run/slurmdbd.pid
EOF

Set permissions:

sudo chown slurm:slurm /etc/slurm/slurmdbd.conf
sudo chmod 600 /etc/slurm/slurmdbd.conf
sudo mkdir -p /var/log/slurm
sudo chown slurm:slurm /var/log/slurm

Start:

sudo systemctl enable --now slurmdbd
sudo systemctl status slurmdbd

Phase 9 โ€” Configure Slurm

Check actual CPU and memory on each node:

lscpu
free -m

Then on slurm-controller, create /etc/slurm/slurm.conf:

sudo tee /etc/slurm/slurm.conf > /dev/null <<'EOF'
ClusterName=openstack-slurm-lab
SlurmctldHost=slurm-controller

SlurmUser=slurm
AuthType=auth/munge
CryptoType=crypto/munge

StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd

SwitchType=switch/none
MpiDefault=none
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup

SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=slurm-controller
JobAcctGatherType=jobacct_gather/linux

GresTypes=gpu

SlurmctldPort=6817
SlurmdPort=6818

SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid

ReturnToService=2
InactiveLimit=0
KillWait=30
Waittime=0

NodeName=slurm-cpu1 CPUs=4 RealMemory=7900 State=UNKNOWN
NodeName=slurm-cpu2 CPUs=4 RealMemory=7900 State=UNKNOWN
NodeName=gpu-test-01 CPUs=4 RealMemory=7900 Gres=gpu:gtx970:1 State=UNKNOWN

PartitionName=cpu Nodes=slurm-cpu1,slurm-cpu2 Default=YES MaxTime=INFINITE State=UP
PartitionName=gpu Nodes=gpu-test-01 MaxTime=INFINITE State=UP
EOF

Create directories on controller:

sudo mkdir -p /var/spool/slurmctld /var/log/slurm
sudo chown slurm:slurm /var/spool/slurmctld /var/log/slurm

Copy slurm.conf to all compute nodes:

scp /etc/slurm/slurm.conf ubuntu@slurm-cpu1:/tmp/slurm.conf
scp /etc/slurm/slurm.conf ubuntu@slurm-cpu2:/tmp/slurm.conf
scp /etc/slurm/slurm.conf ubuntu@gpu-test-01:/tmp/slurm.conf

On each compute node:

sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
sudo mkdir -p /var/spool/slurmd /var/log/slurm
sudo chown slurm:slurm /var/spool/slurmd /var/log/slurm

Phase 10 โ€” Configure GPU GRES on gpu-test-01

On gpu-test-01:

sudo tee /etc/slurm/gres.conf > /dev/null <<'EOF'
Name=gpu Type=gtx970 File=/dev/nvidia0
EOF

Verify:

cat /etc/slurm/gres.conf
nvidia-smi
ls -l /dev/nvidia0

Phase 11 โ€” Start Slurm services

On slurm-controller:

sudo systemctl enable --now slurmctld
sudo systemctl status slurmctld

On slurm-cpu1, slurm-cpu2, and gpu-test-01:

sudo systemctl enable --now slurmd
sudo systemctl status slurmd

Back on slurm-controller:

sinfo
sinfo -Nel
sinfo -o "%20N %10T %10c %10m %20G"

Expected:

slurm-cpu1    idle   4 CPU   ~7900 MB   none
slurm-cpu2 idle 4 CPU ~7900 MB none
gpu-test-01 idle 4 CPU ~7900 MB gpu:gtx970:1

If nodes are down:

sudo scontrol update NodeName=slurm-cpu1 State=RESUME
sudo scontrol update NodeName=slurm-cpu2 State=RESUME
sudo scontrol update NodeName=gpu-test-01 State=RESUME

Phase 12 โ€” Run validation jobs

CPU job

cat > cpu-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=cpu-test
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:02:00
#SBATCH --output=cpu-test-%j.out

echo "Running on $(hostname)"
echo "Job ID: $SLURM_JOB_ID"
sleep 30
echo "Done"
EOF

sbatch cpu-job.sh
squeue

Check:

cat cpu-test-<jobid>.out
sacct -j <jobid> --format=JobID,JobName,Partition,AllocCPUS,State,ExitCode,Elapsed

Array job

cat > array-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=array-test
#SBATCH --partition=cpu
#SBATCH --array=1-10%3
#SBATCH --time=00:02:00
#SBATCH --output=array-%A_%a.out

echo "Array job ID: $SLURM_ARRAY_JOB_ID"
echo "Array task ID: $SLURM_ARRAY_TASK_ID"
echo "Running on $(hostname)"
sleep 20
EOF

sbatch array-job.sh
squeue

GPU job

cat > gpu-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:05:00
#SBATCH --output=gpu-test-%j.out

echo "Running on $(hostname)"
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
nvidia-smi
EOF

sbatch gpu-job.sh
squeue

Check:

cat gpu-test-<jobid>.out
sacct -j <jobid> --format=JobID,JobName,Partition,AllocTRES,State,ExitCode

Phase 13 โ€” MPI learning

Install MPI on all Slurm nodes:

sudo apt install -y openmpi-bin libopenmpi-dev

Create MPI test on slurm-controller:

cat > mpi-hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv) {
MPI_Init(&argc, &argv);

int world_size;
int world_rank;
char hostname[256];

MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
gethostname(hostname, sizeof(hostname));

printf("Hello from rank %d of %d on %s\n", world_rank, world_size, hostname);

MPI_Finalize();
return 0;
}
EOF

mpicc mpi-hello.c -o mpi-hello

Create job:

cat > mpi-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=mpi-test
#SBATCH --partition=cpu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:05:00
#SBATCH --output=mpi-test-%j.out

srun ./mpi-hello
EOF

sbatch mpi-job.sh

Phase 14 โ€” Slurm admin commands to practise

Cluster view:

sinfo
sinfo -Nel
scontrol show partition
scontrol show node slurm-cpu1
scontrol show node gpu-test-01

Queue view:

squeue
squeue -u $USER
scontrol show job <jobid>

Accounting:

sacct
sacct -j <jobid>
sacct -j <jobid> --format=JobID,JobName,Partition,AllocTRES,State,ExitCode,Elapsed

Node maintenance:

sudo scontrol update NodeName=slurm-cpu1 State=DRAIN Reason="maintenance test"
sinfo
sudo scontrol update NodeName=slurm-cpu1 State=RESUME

Cancel jobs:

scancel <jobid>
scancel -u $USER

Logs:

sudo journalctl -u slurmctld -f
sudo journalctl -u slurmd -f
sudo journalctl -u slurmdbd -f

Phase 15 โ€” Later automation path

Once the manual setup works, automate in this order:

1. Terraform creates ports and CPU VMs
2. Terraform creates m1.medium if missing
3. Ansible configures /etc/hosts, Munge, Slurm, SlurmDBD
4. Ansible joins gpu-test-01 as GPU node
5. Add Terraform-managed GPU node later
6. Add elastic Slurm workers later

Repository layout:

openstack-slurm-lab/
โ”œโ”€โ”€ terraform/
โ”‚ โ”œโ”€โ”€ main.tf
โ”‚ โ”œโ”€โ”€ variables.tf
โ”‚ โ”œโ”€โ”€ outputs.tf
โ”‚ โ””โ”€โ”€ cloud-init-slurm.yaml
โ”œโ”€โ”€ ansible/
โ”‚ โ”œโ”€โ”€ inventory.ini
โ”‚ โ”œโ”€โ”€ site.yml
โ”‚ โ”œโ”€โ”€ roles/
โ”‚ โ”‚ โ”œโ”€โ”€ common/
โ”‚ โ”‚ โ”œโ”€โ”€ slurm_controller/
โ”‚ โ”‚ โ”œโ”€โ”€ slurm_compute/
โ”‚ โ”‚ โ””โ”€โ”€ slurm_gpu/
โ””โ”€โ”€ jobs/
โ”œโ”€โ”€ cpu-job.sh
โ”œโ”€โ”€ array-job.sh
โ”œโ”€โ”€ mpi-job.sh
โ””โ”€โ”€ gpu-job.sh

Immediate next command sequence

Run this now:

source /opt/kolla-venv/bin/activate
source /etc/kolla/admin-openrc.sh

openstack flavor create m1.medium \
--ram 8192 \
--disk 40 \
--vcpus 4 \
--public

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--port slurm-controller-port \
--key-name sont-key \
--user-data slurm-cloud-init.yaml \
slurm-controller

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--port slurm-cpu1-port \
--key-name sont-key \
--user-data slurm-cloud-init.yaml \
slurm-cpu1

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--port slurm-cpu2-port \
--key-name sont-key \
--user-data slurm-cloud-init.yaml \
slurm-cpu2

Then verify:

openstack server list
openstack port list | grep slurm

After the three VMs are ACTIVE, move on to Munge, SlurmDBD, slurm.conf, and GRES setup.

Munge, SlurmDBD, slurm.conf, and GRES setup

Below are the full manual commands for configuring:

Munge
SlurmDBD
MariaDB accounting
slurm.conf
GRES GPU scheduling
Slurm services
Validation jobs

This assumes your Slurm tenant VMs are:

slurm-controller  10.10.10.30
slurm-cpu1 10.10.10.31
slurm-cpu2 10.10.10.32
gpu-test-01 10.10.10.36

Your uploaded OpenStack state confirms the Slurm fixed ports and image/keypair situation: slurm-controller-port is 10.10.10.30, slurm-cpu1-port is 10.10.10.31, slurm-cpu2-port is 10.10.10.32, the valid keypair is sont-key, and ubuntu-24.04 now exists in Glance.


0. SSH into the Slurm controller

From ctrl, use the Neutron DHCP namespace:

sudo ip netns exec qdhcp-54829687-5a62-4d95-a7d0-42f3e30f7dbf \
ssh -i /home/sont/.ssh/id_ed25519_kolla ubuntu@10.10.10.30

Once inside:

hostname
ip a
cat /etc/hosts

Expected:

slurm-controller
10.10.10.30 slurm-controller
10.10.10.31 slurm-cpu1
10.10.10.32 slurm-cpu2
10.10.10.36 gpu-test-01 slurm-gpu1

Test connectivity:

ping -c 2 slurm-cpu1
ping -c 2 slurm-cpu2
ping -c 2 gpu-test-01

1. Install base packages on all Slurm nodes

Run this on all four nodes:

slurm-controller
slurm-cpu1
slurm-cpu2
gpu-test-01

Command:

sudo apt update
sudo apt install -y munge slurm-wlm chrony python3
sudo systemctl enable --now chrony

Check:

systemctl status chrony --no-pager
systemctl status munge --no-pager

On slurm-controller only, also install MariaDB and SlurmDBD:

sudo apt install -y mariadb-server slurmdbd

2. Fix /etc/hosts on all Slurm nodes

Run this on each of the four Slurm nodes:

sudo cp /etc/hosts /etc/hosts.bak.$(date +%F-%H%M)

sudo tee /etc/hosts > /dev/null <<'EOF'
127.0.0.1 localhost

10.10.10.30 slurm-controller
10.10.10.31 slurm-cpu1
10.10.10.32 slurm-cpu2
10.10.10.36 gpu-test-01 slurm-gpu1

::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
EOF

Verify on each node:

getent hosts slurm-controller
getent hosts slurm-cpu1
getent hosts slurm-cpu2
getent hosts gpu-test-01

Expected:

10.10.10.30 slurm-controller
10.10.10.31 slurm-cpu1
10.10.10.32 slurm-cpu2
10.10.10.36 gpu-test-01 slurm-gpu1

3. Configure Munge

3.1 On slurm-controller

Stop Munge first:

sudo systemctl stop munge

Create a fresh Munge key:

sudo create-munge-key -f

Set correct ownership and permissions:

sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo chmod 700 /etc/munge

Start Munge:

sudo systemctl enable --now munge
sudo systemctl status munge --no-pager

Check local Munge:

munge -n | unmunge

Expected:

STATUS: Success

3.2 Copy Munge key to compute nodes

From slurm-controller:

scp /etc/munge/munge.key ubuntu@slurm-cpu1:/tmp/munge.key
scp /etc/munge/munge.key ubuntu@slurm-cpu2:/tmp/munge.key
scp /etc/munge/munge.key ubuntu@gpu-test-01:/tmp/munge.key

On each compute node, run:

sudo systemctl stop munge

sudo mv /tmp/munge.key /etc/munge/munge.key
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo chmod 700 /etc/munge

sudo systemctl enable --now munge
sudo systemctl status munge --no-pager

3.3 Test Munge from every node

From slurm-cpu1:

munge -n | ssh slurm-controller unmunge

From slurm-cpu2:

munge -n | ssh slurm-controller unmunge

From gpu-test-01:

munge -n | ssh slurm-controller unmunge

Expected each time:

STATUS: Success

Also test reverse direction from slurm-controller:

munge -n | ssh slurm-cpu1 unmunge
munge -n | ssh slurm-cpu2 unmunge
munge -n | ssh gpu-test-01 unmunge

If this fails, fix Munge before touching Slurm.


4. Configure MariaDB for Slurm accounting

Run this only on slurm-controller.

Start MariaDB:

sudo systemctl enable --now mariadb
sudo systemctl status mariadb --no-pager

Create Slurm accounting database:

sudo mysql

Inside the MariaDB shell:

CREATE DATABASE slurm_acct_db;
CREATE USER 'slurm'@'localhost' IDENTIFIED BY 'ChangeThisStrongPassword';
GRANT ALL PRIVILEGES ON slurm_acct_db.* TO 'slurm'@'localhost';
FLUSH PRIVILEGES;
EXIT;

Test the Slurm DB user:

mysql -u slurm -p slurm_acct_db

Enter:

ChangeThisStrongPassword

Then:

SHOW DATABASES;
EXIT;

5. Configure SlurmDBD

Run on slurm-controller.

Create log directory:

sudo mkdir -p /var/log/slurm
sudo chown slurm:slurm /var/log/slurm

Create /etc/slurm/slurmdbd.conf:

sudo tee /etc/slurm/slurmdbd.conf > /dev/null <<'EOF'
AuthType=auth/munge
DbdHost=slurm-controller
DbdPort=6819
SlurmUser=slurm
DebugLevel=info

StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePort=3306
StorageUser=slurm
StoragePass=ChangeThisStrongPassword
StorageLoc=slurm_acct_db

LogFile=/var/log/slurm/slurmdbd.log
PidFile=/run/slurmdbd.pid
EOF

Set strict permissions:

sudo chown slurm:slurm /etc/slurm/slurmdbd.conf
sudo chmod 600 /etc/slurm/slurmdbd.conf

Start SlurmDBD:

sudo systemctl enable --now slurmdbd
sudo systemctl status slurmdbd --no-pager

Check logs:

sudo journalctl -u slurmdbd --no-pager -n 50
sudo tail -n 50 /var/log/slurm/slurmdbd.log

If it fails, the usual causes are:

wrong MariaDB password
slurmdbd.conf permissions not 600
Munge not running
MariaDB not running

6. Create accounting cluster/account/user

Run on slurm-controller.

First check whether sacctmgr works:

sacctmgr show cluster

Add the cluster:

sudo sacctmgr -i add cluster openstack-slurm-lab

Add an account:

sudo sacctmgr -i add account lab Description="Homelab Slurm Account" Organization="homelab"

Add your Ubuntu user:

sudo sacctmgr -i add user ubuntu Account=lab

Check:

sacctmgr show cluster
sacctmgr show account
sacctmgr show user

7. Check CPU and memory values

Before writing slurm.conf, check each node.

On slurm-cpu1:

lscpu | egrep 'CPU\(s\)|Thread|Core|Socket'
free -m

On slurm-cpu2:

lscpu | egrep 'CPU\(s\)|Thread|Core|Socket'
free -m

On gpu-test-01:

lscpu | egrep 'CPU\(s\)|Thread|Core|Socket'
free -m
nvidia-smi

For your m1.medium flavor, the expected resource shape is:

4 vCPU
8192 MB RAM
40 GB disk

In slurm.conf, use slightly less than actual memory. For 8 GB VMs, use:

RealMemory=7900

8. Configure /etc/slurm/slurm.conf

Run on slurm-controller.

Back up any existing file:

sudo cp /etc/slurm/slurm.conf /etc/slurm/slurm.conf.bak.$(date +%F-%H%M) 2>/dev/null || true

Create the config:

sudo tee /etc/slurm/slurm.conf > /dev/null <<'EOF'
ClusterName=openstack-slurm-lab
SlurmctldHost=slurm-controller

SlurmUser=slurm
AuthType=auth/munge
CryptoType=crypto/munge

StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd

SwitchType=switch/none
MpiDefault=none
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup

SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=slurm-controller
AccountingStoragePort=6819
JobAcctGatherType=jobacct_gather/linux

GresTypes=gpu

SlurmctldPort=6817
SlurmdPort=6818

SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid

ReturnToService=2
InactiveLimit=0
KillWait=30
Waittime=0

NodeName=slurm-cpu1 CPUs=4 RealMemory=7900 State=UNKNOWN
NodeName=slurm-cpu2 CPUs=4 RealMemory=7900 State=UNKNOWN
NodeName=gpu-test-01 CPUs=4 RealMemory=7900 Gres=gpu:gtx970:1 State=UNKNOWN

PartitionName=cpu Nodes=slurm-cpu1,slurm-cpu2 Default=YES MaxTime=INFINITE State=UP
PartitionName=gpu Nodes=gpu-test-01 MaxTime=INFINITE State=UP
EOF

Create controller directories:

sudo mkdir -p /var/spool/slurmctld
sudo mkdir -p /var/log/slurm

sudo chown slurm:slurm /var/spool/slurmctld
sudo chown slurm:slurm /var/log/slurm

Set permissions:

sudo chown slurm:slurm /etc/slurm/slurm.conf
sudo chmod 644 /etc/slurm/slurm.conf

Validate basic config syntax:

sudo slurmctld -t

Expected:

slurmctld: slurmctld version ...

No fatal errors.


9. Copy slurm.conf to all compute nodes

From slurm-controller:

scp /etc/slurm/slurm.conf ubuntu@slurm-cpu1:/tmp/slurm.conf
scp /etc/slurm/slurm.conf ubuntu@slurm-cpu2:/tmp/slurm.conf
scp /etc/slurm/slurm.conf ubuntu@gpu-test-01:/tmp/slurm.conf

On slurm-cpu1, slurm-cpu2, and gpu-test-01, run:

sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
sudo chown slurm:slurm /etc/slurm/slurm.conf
sudo chmod 644 /etc/slurm/slurm.conf

sudo mkdir -p /var/spool/slurmd
sudo mkdir -p /var/log/slurm

sudo chown slurm:slurm /var/spool/slurmd
sudo chown slurm:slurm /var/log/slurm

10. Configure GRES on GPU node

Run only on:

gpu-test-01

First verify GPU:

hostname
lspci | grep -i nvidia
nvidia-smi
ls -l /dev/nvidia*

Expected:

/dev/nvidia0 exists
nvidia-smi works

Create /etc/slurm/gres.conf:

sudo tee /etc/slurm/gres.conf > /dev/null <<'EOF'
Name=gpu Type=gtx970 File=/dev/nvidia0
EOF

Set permissions:

sudo chown slurm:slurm /etc/slurm/gres.conf
sudo chmod 644 /etc/slurm/gres.conf

Check:

cat /etc/slurm/gres.conf

Expected:

Name=gpu Type=gtx970 File=/dev/nvidia0

Important: do not create gres.conf on the CPU nodes.


11. Start Slurm services

11.1 On slurm-controller

Make sure Munge and SlurmDBD are up:

sudo systemctl status munge --no-pager
sudo systemctl status slurmdbd --no-pager

Start slurmctld:

sudo systemctl enable --now slurmctld
sudo systemctl status slurmctld --no-pager

Check logs:

sudo journalctl -u slurmctld --no-pager -n 100
sudo tail -n 100 /var/log/slurm/slurmctld.log

11.2 On slurm-cpu1

sudo systemctl enable --now slurmd
sudo systemctl status slurmd --no-pager
sudo journalctl -u slurmd --no-pager -n 50

11.3 On slurm-cpu2

sudo systemctl enable --now slurmd
sudo systemctl status slurmd --no-pager
sudo journalctl -u slurmd --no-pager -n 50

11.4 On gpu-test-01

sudo systemctl enable --now slurmd
sudo systemctl status slurmd --no-pager
sudo journalctl -u slurmd --no-pager -n 50

If gpu-test-01 fails, check GRES:

sudo slurmd -C
cat /etc/slurm/gres.conf
ls -l /dev/nvidia0
nvidia-smi

12. Validate Slurm cluster

Run on slurm-controller.

sinfo

Better view:

sinfo -Nel

Resource/GRES view:

sinfo -o "%20N %10T %10c %10m %20G"

Expected:

slurm-cpu1    idle   4   7900   (null)
slurm-cpu2 idle 4 7900 (null)
gpu-test-01 idle 4 7900 gpu:gtx970:1

Show nodes:

scontrol show node slurm-cpu1
scontrol show node slurm-cpu2
scontrol show node gpu-test-01

If nodes are DOWN or UNKNOWN, resume them:

sudo scontrol update NodeName=slurm-cpu1 State=RESUME
sudo scontrol update NodeName=slurm-cpu2 State=RESUME
sudo scontrol update NodeName=gpu-test-01 State=RESUME

Then:

sinfo

13. Validate accounting

Run on slurm-controller:

sacctmgr show cluster
sacctmgr show account
sacctmgr show user

Then:

sacct

At this point it may show little or nothing because no jobs have run yet. That is fine.


14. Run CPU test job

On slurm-controller:

cat > cpu-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=cpu-test
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:02:00
#SBATCH --output=cpu-test-%j.out

echo "Running on $(hostname)"
echo "Job ID: $SLURM_JOB_ID"
echo "Partition: $SLURM_JOB_PARTITION"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"
sleep 30
echo "Done"
EOF

Submit:

sbatch cpu-job.sh

Watch:

squeue

Check output:

ls -l cpu-test-*.out
cat cpu-test-*.out

Check accounting:

sacct --format=JobID,JobName,Partition,AllocCPUS,State,ExitCode,Elapsed

15. Run GPU test job

On slurm-controller:

cat > gpu-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:05:00
#SBATCH --output=gpu-test-%j.out

echo "Running on $(hostname)"
echo "Job ID: $SLURM_JOB_ID"
echo "Partition: $SLURM_JOB_PARTITION"
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
nvidia-smi
EOF

Submit:

sbatch gpu-job.sh

Watch:

squeue

Check:

cat gpu-test-*.out
sacct --format=JobID,JobName,Partition,AllocTRES,State,ExitCode,Elapsed

Expected output should show:

Running on gpu-test-01
nvidia-smi output

16. Troubleshooting quick reference

Munge failure

sudo systemctl status munge --no-pager
sudo journalctl -u munge --no-pager -n 100
ls -l /etc/munge/munge.key

Expected:

-r-------- munge munge /etc/munge/munge.key

Fix:

sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo systemctl restart munge

SlurmDBD failure

sudo systemctl status slurmdbd --no-pager
sudo journalctl -u slurmdbd --no-pager -n 100
sudo tail -n 100 /var/log/slurm/slurmdbd.log
mysql -u slurm -p slurm_acct_db

Slurm controller failure

sudo slurmctld -t
sudo systemctl status slurmctld --no-pager
sudo journalctl -u slurmctld --no-pager -n 100

Compute node failure

On the compute node:

sudo slurmd -C
sudo systemctl status slurmd --no-pager
sudo journalctl -u slurmd --no-pager -n 100

GPU/GRES failure

On gpu-test-01:

nvidia-smi
ls -l /dev/nvidia*
cat /etc/slurm/gres.conf
sudo slurmd -C
sudo journalctl -u slurmd --no-pager -n 100

On slurm-controller:

scontrol show node gpu-test-01 | grep -i gres
sinfo -o "%20N %10T %20G"

17. Final expected state

On slurm-controller, this should work:

sinfo -o "%20N %10T %10c %10m %20G"

Expected final shape:

NODELIST             STATE      CPUS       MEMORY     GRES
slurm-cpu1 idle 4 7900 (null)
slurm-cpu2 idle 4 7900 (null)
gpu-test-01 idle 4 7900 gpu:gtx970:1

And these should all work:

sbatch cpu-job.sh
sbatch gpu-job.sh
squeue
sacct
scontrol show node gpu-test-01