Slurm on Homelab OpenStack

Below is the corrected Slurm plan based on your actual current OpenStack/Kolla/GPU topology, not the earlier assumed 192.168.1.60–63 layout.

Your current base is:

Proxmox physical hosts
192.168.1.10 pve0
192.168.1.11 pve1
192.168.1.12 pve2

Kolla/OpenStack infrastructure VMs
192.168.1.50 kolla-vip
192.168.1.51 ctrl
192.168.1.52 cmp
192.168.1.53 gpu

OpenStack tenant GPU VM
10.10.10.36 gpu-test-01

The key correction is this:

Do not build Slurm directly on pve0/pve1/pve2.
Do not build Slurm directly on ctrl/cmp/gpu OpenStack infrastructure nodes.
Build Slurm as tenant VMs inside OpenStack.
Use gpu-test-01 as the first Slurm GPU compute node.

That keeps your architecture clean and realistic.


Corrected target architecture

Physical layer
────────────────────────────────────────────
pve0 192.168.1.10 Proxmox + physical GTX970
pve1 192.168.1.11 Proxmox
pve2 192.168.1.12 Proxmox


OpenStack infrastructure layer
────────────────────────────────────────────
kolla-vip 192.168.1.50 OpenStack API VIP / Horizon / Keystone endpoint
ctrl 192.168.1.51 Kolla controller services
cmp 192.168.1.52 Nova compute
gpu 192.168.1.53 Nova compute with PCI passthrough


OpenStack tenant Slurm layer
────────────────────────────────────────────
slurm-controller 10.10.10.30 slurmctld + slurmdbd + MariaDB
slurm-cpu1 10.10.10.31 slurmd
slurm-cpu2 10.10.10.32 slurmd
gpu-test-01 10.10.10.36 slurmd + NVIDIA GTX970 via Nova passthrough

I would reuse gpu-test-01 as the Slurm GPU node rather than rebuilding it immediately. You have already validated that Nova can schedule it to the GPU host, libvirt attaches the PCI hostdev, and the guest can see the GTX970. That is valuable; build on it.


Revised Slurm VM plan

Use the existing tenant network:

Tenant network: 10.10.10.0/24
Existing GPU VM: 10.10.10.36

Create three additional tenant VMs and keep the existing GPU VM:

slurm-controller  10.10.10.30  (new)
slurm-cpu1        10.10.10.31  (new)
slurm-cpu2        10.10.10.32  (new)
gpu-test-01       10.10.10.36  (existing)

So the Slurm cluster becomes:

Partition: cpu
slurm-cpu1
slurm-cpu2

Partition: gpu
gpu-test-01

Controller:
slurm-controller

Access model

Because your tenant network is private, access is still through the Neutron DHCP namespace on the OpenStack network node/controller.

You already used this pattern:

sudo ip netns exec qdhcp-54829687-5a62-4d95-a7d0-42f3e30f7dbf \
ssh -i /home/sont/.ssh/id_ed25519_kolla ubuntu@10.10.10.36

For Slurm, use the same access pattern:

sudo ip netns exec qdhcp-54829687-5a62-4d95-a7d0-42f3e30f7dbf \
ssh -i /home/sont/.ssh/id_ed25519_kolla ubuntu@10.10.10.30

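Then from slurm-controller, SSH internally to the Slurm compute nodes. This hop needs a private key on slurm-controller whose public half is authorized on the compute nodes (for example, a dedicated cluster key pair):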

ssh ubuntu@10.10.10.31
ssh ubuntu@10.10.10.32
ssh ubuntu@10.10.10.36

This means:

Your laptop / shell
        ↓
ctrl (network node)
        ↓
Neutron qdhcp namespace
        ↓
10.10.10.0/24 tenant network
        ↓
Slurm VMs

Recommended hostnames and /etc/hosts

Inside all Slurm VMs, use this mapping:

sudo tee -a /etc/hosts > /dev/null <<'EOF'
10.10.10.30 slurm-controller
10.10.10.31 slurm-cpu1
10.10.10.32 slurm-cpu2
10.10.10.36 gpu-test-01 slurm-gpu1
EOF

You can either keep the GPU node hostname as gpu-test-01, or alias it as slurm-gpu1.

For learning clarity, I would do this:

OpenStack instance name: gpu-test-01
Slurm NodeName: gpu-test-01
Optional alias: slurm-gpu1

That avoids confusion between the actual Nova VM and the Slurm logical node.
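
A quick way to confirm the mapping landed on every VM is to check that each name has an entry in the hosts file. The helper below is only a sketch: missing_hosts is a name I made up (it is not part of Slurm or OpenStack), and it inspects nothing but the file you pass it.

```shell
# missing_hosts FILE NAME...: print each NAME that has no entry in FILE.
# Hypothetical helper for this lab; it does not query DNS.
missing_hosts() {
  local file=$1
  shift
  local name
  for name in "$@"; do
    awk -v n="$name" '
      /^[ \t]*#/ { next }                 # skip comment lines
      {
        for (i = 2; i <= NF; i++) {
          if ($i ~ /^#/) break            # ignore trailing comments
          if ($i == n) found = 1          # hostname or alias matches
        }
      }
      END { exit !found }
    ' "$file" || echo "$name"
  done
}
```

On each Slurm VM, missing_hosts /etc/hosts slurm-controller slurm-cpu1 slurm-cpu2 gpu-test-01 should print nothing. Any name it prints still needs a line in /etc/hosts.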


Updated Slurm role assignment

slurm-controller (10.10.10.30)

Run:

munge
slurmctld
slurmdbd
mariadb

Purpose:

Slurm scheduler
Slurm accounting
Job history
Cluster control

slurm-cpu1 (10.10.10.31)

Run:

munge
slurmd

Purpose:

CPU jobs
MPI jobs
array jobs
general batch workload

slurm-cpu2 (10.10.10.32)

Run:

munge
slurmd

Purpose:

CPU jobs
MPI jobs
array jobs
general batch workload

gpu-test-01 (10.10.10.36)

Run:

munge
slurmd
NVIDIA driver
nvidia-smi
optional nvidia-dcgm-exporter

Purpose:

GPU jobs
GRES learning
Nova PCI passthrough validation
OpenStack GPU scheduling validation

Corrected architecture

┌──────────────────────────────────────────────────────────────────────┐
│  PROXMOX CLUSTER                                                     │
│                                                                      │
│  pve0                    pve1                    pve2                │
│  192.168.1.10            192.168.1.11            192.168.1.12        │
│  GTX970 physical GPU                                                 │
└───────────────┬──────────────────────────────────────────────────────┘
                │
                │ PCI passthrough / VFIO
                ▼
┌──────────────────────────────────────────────────────────────────────┐
│  OPENSTACK INFRASTRUCTURE LAYER                                      │
│                                                                      │
│  kolla-vip        ctrl             cmp              gpu              │
│  192.168.1.50     192.168.1.51     192.168.1.52     192.168.1.53     │
│                                                                      │
│  ctrl: Keystone, Glance, Horizon, Neutron, Nova API, Scheduler       │
│  cmp:  Nova compute + libvirt                                        │
│  gpu:  Nova compute + libvirt + PCI passthrough alias                │
└───────────────┬──────────────────────────────────────────────────────┘
                │
                │ Nova schedules tenant VMs
                ▼
┌──────────────────────────────────────────────────────────────────────┐
│  OPENSTACK TENANT NETWORK                                            │
│  10.10.10.0/24                                                       │
│                                                                      │
│  slurm-controller   slurm-cpu1     slurm-cpu2     gpu-test-01        │
│  10.10.10.30        10.10.10.31    10.10.10.32    10.10.10.36        │
│                                                                      │
│  slurmctld          slurmd         slurmd         slurmd             │
│  slurmdbd                                         NVIDIA GTX970      │
│  MariaDB                                          GRES gpu:1         │
└───────────────┬──────────────────────────────────────────────────────┘
                │
                │ sbatch / srun
                ▼
┌──────────────────────────────────────────────────────────────────────┐
│  SLURM WORKLOADS                                                     │
│                                                                      │
│  CPU jobs      MPI jobs        Array jobs      GPU jobs via GRES     │
└──────────────────────────────────────────────────────────────────────┘

OpenStack resources to create

You already have:

gpu-test-01 10.10.10.36

Now create:

slurm-controller
slurm-cpu1
slurm-cpu2

Example:

source /etc/kolla/admin-openrc.sh

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--network gpu-private \
--security-group gpu-secgroup \
--key-name id_ed25519_kolla \
slurm-controller

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--network gpu-private \
--security-group gpu-secgroup \
--key-name id_ed25519_kolla \
slurm-cpu1

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--network gpu-private \
--security-group gpu-secgroup \
--key-name id_ed25519_kolla \
slurm-cpu2

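If you want exact fixed IPs (recommended here, because /etc/hosts and slurm.conf assume them), create Neutron ports first and use them instead of the --network form above. Attach the security group to the port itself, because a security group passed to server create is not applied to a pre-created port:

openstack port create \
--network gpu-private \
--fixed-ip subnet=gpu-private-subnet,ip-address=10.10.10.30 \
--security-group gpu-secgroup \
slurm-controller-port

openstack port create \
--network gpu-private \
--fixed-ip subnet=gpu-private-subnet,ip-address=10.10.10.31 \
--security-group gpu-secgroup \
slurm-cpu1-port

openstack port create \
--network gpu-private \
--fixed-ip subnet=gpu-private-subnet,ip-address=10.10.10.32 \
--security-group gpu-secgroup \
slurm-cpu2-port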

Then create the servers with ports:

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--port slurm-controller-port \
--key-name id_ed25519_kolla \
slurm-controller

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--port slurm-cpu1-port \
--key-name id_ed25519_kolla \
slurm-cpu1

openstack server create \
--image ubuntu-24.04 \
--flavor m1.medium \
--port slurm-cpu2-port \
--key-name id_ed25519_kolla \
slurm-cpu2

You may need to adjust these names if your actual network/subnet/security-group names differ. Confirm with:

openstack network list
openstack subnet list
openstack security group list
openstack keypair list
openstack flavor list
openstack image list

Required security group rules

The Slurm nodes need to talk to each other on the tenant network.

Allow at least:

SSH         TCP 22
slurmctld   TCP 6817
slurmd      TCP 6818
slurmdbd    TCP 6819
ICMP        ping/debugging
Munge       no port of its own; credentials travel inside Slurm's node-to-node traffic

Example:

openstack security group rule create \
--proto tcp \
--dst-port 22 \
gpu-secgroup

openstack security group rule create \
--proto tcp \
--dst-port 6817:6819 \
gpu-secgroup

openstack security group rule create \
--proto icmp \
gpu-secgroup

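If the security group already allows all traffic between members of the same security group, you may not need extra internal rules. But for learning, make the Slurm ports explicit. Keep in mind that srun and MPI job steps also open connections on ephemeral ports between nodes, so either leave intra-group traffic open or pin a range with SrunPortRange in slurm.conf and allow that range here.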


Corrected slurm.conf

On slurm-controller, use the real tenant hostnames/IPs.

Example:

ClusterName=openstack-slurm-lab
SlurmctldHost=slurm-controller

SlurmUser=slurm
AuthType=auth/munge
# Older Slurm releases called this CryptoType=crypto/munge
CredType=cred/munge

StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd

SwitchType=switch/none
MpiDefault=none
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup

SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

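AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=slurm-controller
# Track GPUs as a TRES so gres/gpu appears in CfgTRES and in sacct AllocTRES
AccountingStorageTRES=gres/gpu
JobAcctGatherType=jobacct_gather/linux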

GresTypes=gpu

SlurmctldPort=6817
SlurmdPort=6818

SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log

NodeName=slurm-cpu1 CPUs=2 RealMemory=3900 State=UNKNOWN
NodeName=slurm-cpu2 CPUs=2 RealMemory=3900 State=UNKNOWN
NodeName=gpu-test-01 CPUs=4 RealMemory=7900 Gres=gpu:gtx970:1 State=UNKNOWN

PartitionName=cpu Nodes=slurm-cpu1,slurm-cpu2 Default=YES MaxTime=INFINITE State=UP
PartitionName=gpu Nodes=gpu-test-01 MaxTime=INFINITE State=UP

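Adjust CPUs and RealMemory after checking each VM:

lscpu
free -m
slurmd -C

slurmd -C prints the hardware that slurmd detects on the node, formatted as a NodeName line. Do not overstate memory in slurm.conf: if a node registers with fewer CPUs or less memory than configured, slurmctld marks it invalid and drains it. Setting RealMemory slightly below the detected value is the safe choice.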
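
To pick a RealMemory value that stays under what the VM really has, you can derive it from /proc/meminfo. This is a sketch under my own assumptions: suggest_real_memory is a made-up helper name, and the 5% safety margin is an arbitrary choice, not a Slurm recommendation.

```shell
# suggest_real_memory [MEMINFO_FILE]: print MemTotal in MiB minus a 5% margin.
# The 5% margin is an assumption for this lab; tune it as you like.
suggest_real_memory() {
  awk '/^MemTotal:/ {
    mib = int($2 / 1024)          # MemTotal is reported in kB
    print int(mib * 95 / 100)     # leave headroom below the real value
  }' "${1:-/proc/meminfo}"
}
```

Run suggest_real_memory on each VM and put the printed number into that node's RealMemory= field.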


GPU node gres.conf

On gpu-test-01:

sudo tee /etc/slurm/gres.conf > /dev/null <<'EOF'
Name=gpu Type=gtx970 File=/dev/nvidia0
EOF

Verify the GPU inside gpu-test-01:

lspci | grep -i nvidia
nvidia-smi
ls -l /dev/nvidia*

Expected:

NVIDIA GTX 970 visible
/dev/nvidia0 exists
nvidia-smi works

Then Slurm can schedule it as:

#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
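
Because slurm.conf above uses proctrack/cgroup and task/cgroup, Slurm also reads /etc/slurm/cgroup.conf. The fragment below is a minimal sketch, not something taken from your current setup. The point of interest is ConstrainDevices=yes, which limits each job to the GRES devices it was allocated, so a job that did not request --gres=gpu cannot open /dev/nvidia0. Use the same file on all Slurm nodes.

```conf
# /etc/slurm/cgroup.conf (sketch; adjust to taste)
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
```

With this in place, a job submitted without --gres=gpu:1 should fail to see the GPU in nvidia-smi, which is a useful negative test for GRES learning.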

Installation order

Use this order.

Step 1 — Keep OpenStack infrastructure untouched

Do not install Slurm on:

pve0
pve1
pve2
ctrl
cmp
gpu

Those are infrastructure hosts.

Step 2 — Use existing GPU VM

Keep:

gpu-test-01 10.10.10.36

Validate again:

sudo ip netns exec qdhcp-54829687-5a62-4d95-a7d0-42f3e30f7dbf \
ssh -i /home/sont/.ssh/id_ed25519_kolla ubuntu@10.10.10.36

hostname
lspci | grep -i nvidia
nvidia-smi

Step 3 — Create Slurm controller and CPU nodes

Create:

slurm-controller 10.10.10.30
slurm-cpu1 10.10.10.31
slurm-cpu2 10.10.10.32

Step 4 — Configure /etc/hosts on all Slurm VMs

sudo tee -a /etc/hosts > /dev/null <<'EOF'
10.10.10.30 slurm-controller
10.10.10.31 slurm-cpu1
10.10.10.32 slurm-cpu2
10.10.10.36 gpu-test-01 slurm-gpu1
EOF

Step 5 — Install Munge everywhere

On all four Slurm VMs, install Munge together with the Slurm packages:

sudo apt update
sudo apt install -y munge slurm-wlm

On controller only:

sudo apt install -y mariadb-server slurmdbd

Step 6 — Configure Munge

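Use the key on slurm-controller as the single cluster key (the Ubuntu munge package normally creates /etc/munge/munge.key at install time), and copy that exact file to: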

slurm-cpu1
slurm-cpu2
gpu-test-01
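
One way to do the copy is sketched below. It assumes you run it on slurm-controller, that the ubuntu user can SSH to every node, and that ubuntu has passwordless sudo there (the default on stock Ubuntu cloud images). distribute_munge_key is a hypothetical helper name, not a Munge command.

```shell
# distribute_munge_key NODE...: copy the controller's Munge key to each NODE.
# Assumes ubuntu@NODE is reachable over SSH with passwordless sudo.
distribute_munge_key() {
  local node
  for node in "$@"; do
    # Stream the key over SSH instead of leaving a copy in /tmp.
    sudo cat /etc/munge/munge.key |
      ssh "ubuntu@${node}" '
        sudo tee /etc/munge/munge.key > /dev/null &&
        sudo chown munge:munge /etc/munge/munge.key &&
        sudo chmod 0400 /etc/munge/munge.key &&
        sudo systemctl restart munge
      ' || return 1
  done
}
```

Call it as distribute_munge_key slurm-cpu1 slurm-cpu2 gpu-test-01. Afterwards, munge -n | ssh ubuntu@slurm-cpu1 unmunge should report a Success status; a decode error means the keys still differ or the clocks are too far apart.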

Step 7 — Configure MariaDB and slurmdbd

Only on:

slurm-controller
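
A minimal /etc/slurm/slurmdbd.conf for this layout could look like the sketch below. Everything specific in it is an assumption to adapt: the database name slurm_acct_db, a MariaDB user slurm@localhost that you have already created with full rights on that database, and the placeholder password. The file should be owned by the slurm user with mode 600, because it contains the database password and slurmdbd checks its permissions.

```conf
# /etc/slurm/slurmdbd.conf (sketch; StoragePass is a placeholder)
AuthType=auth/munge
DbdHost=slurm-controller
SlurmUser=slurm
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=CHANGE_ME
StorageLoc=slurm_acct_db
LogFile=/var/log/slurm/slurmdbd.log
```

Once slurmdbd is running, the cluster usually has to be registered once with sacctmgr add cluster openstack-slurm-lab so that it matches ClusterName in slurm.conf.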

Step 8 — Configure slurm.conf

Same file on all Slurm nodes.

Step 9 — Configure gres.conf

Only on:

gpu-test-01

Step 10 — Start services

On controller:

sudo systemctl enable --now munge
sudo systemctl enable --now mariadb
sudo systemctl enable --now slurmdbd
sudo systemctl enable --now slurmctld

On compute nodes:

sudo systemctl enable --now munge
sudo systemctl enable --now slurmd
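
With the daemons running, you can confirm that the security group really lets the Slurm ports through. The sketch below uses bash's /dev/tcp pseudo-device and the coreutils timeout command; port_open and report_port are helper names I invented for this check.

```shell
# port_open HOST PORT: succeed if a TCP connection opens within 2 seconds.
# Relies on bash's /dev/tcp, so it needs bash rather than plain sh.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# report_port HOST PORT: print a one-line verdict for that endpoint.
report_port() {
  if port_open "$1" "$2"; then
    echo "$1:$2 open"
  else
    echo "$1:$2 unreachable"
  fi
}
```

From a compute node, report_port slurm-controller 6817 checks slurmctld. From slurm-controller, report_port slurm-cpu1 6818 checks slurmd and report_port slurm-controller 6819 checks slurmdbd. An unreachable result with the service running points at the security group rules.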

Validation commands

Run these from slurm-controller.

Check cluster

sinfo
sinfo -Nel
sinfo -o "%20N %10T %10c %10m %20G"

Expected (nodes without a GPU show (null) in the GRES column):

slurm-cpu1    idle   no GPU
slurm-cpu2    idle   no GPU
gpu-test-01   idle   gpu:gtx970:1
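
If you want a yes/no answer rather than reading the table, a small filter over sinfo output works. This is a sketch: unhealthy_nodes is a made-up name, and treating only idle, mixed, and allocated as healthy is my simplification (a node that is briefly completing will also be listed).

```shell
# unhealthy_nodes: read "<node> <state>" lines on stdin and print every node
# whose state is not exactly idle, mixed, or allocated.
# A trailing "*" (node not responding) therefore counts as unhealthy.
unhealthy_nodes() {
  awk '$2 !~ /^(idle|mixed|allocated)$/ { print $1 ": " $2 }'
}
```

Feed it with sinfo -h -N -o "%N %T" | sort -u | unhealthy_nodes on slurm-controller. No output means all nodes are usable; a line such as gpu-test-01: drained tells you which node to inspect with scontrol show node.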

Check GPU node

scontrol show node gpu-test-01

Look for:

Gres=gpu:gtx970:1
CfgTRES=cpu=...,mem=...,gres/gpu=1

Run CPU job

sbatch --partition=cpu cpu-job.sh
squeue
sacct
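
The cpu-job.sh submitted above is not defined anywhere in this plan. A minimal sketch that fits the two-CPU nodes assumed in slurm.conf could be created like this; the job name, CPU count, and 30-second busy loop are my choices, so adjust them freely.

```shell
cat > cpu-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=cpu-test
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=00:05:00
#SBATCH --output=cpu-test-%j.out

echo "Running on $(hostname) with ${SLURM_CPUS_PER_TASK:-1} CPU(s)"
# Keep the allocated CPUs busy for 30 seconds so the job is visible in squeue.
for _ in $(seq "${SLURM_CPUS_PER_TASK:-1}"); do
  timeout 30 sh -c 'while :; do :; done' &
done
wait
EOF
```

While it runs, squeue should show the job in state R on slurm-cpu1 or slurm-cpu2, and sacct should list it once accounting is working.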

Run MPI job

sbatch --partition=cpu mpi-job.sh

Run array job

sbatch --partition=cpu array-job.sh
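
The array-job.sh used above is likewise not shown. A small sketch, assuming four array tasks are enough to see how Slurm fans them out across the cpu partition:

```shell
cat > array-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=array-test
#SBATCH --partition=cpu
#SBATCH --array=1-4
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=array-test-%A_%a.out

# %A in the output name is the array job ID, %a the task index.
echo "Array job ${SLURM_ARRAY_JOB_ID}, task ${SLURM_ARRAY_TASK_ID} on $(hostname)"
sleep 10
EOF
```

Each task writes its own file, for example array-test-<jobid>_1.out, and squeue lists the tasks as <jobid>_1 through <jobid>_4.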

Run GPU job

Create the GPU job script:

cat > gpu-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:05:00
#SBATCH --output=gpu-test-%j.out

echo "Running on $(hostname)"
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
nvidia-smi
EOF

Submit:

sbatch gpu-job.sh

Check:

cat gpu-test-<jobid>.out
sacct -j <jobid> --format=JobID,JobName,Partition,AllocTRES,State,ExitCode

What the setup now demonstrates

This design exercises a fuller, more cloud-like stack than installing Slurm directly on the hypervisors:

Proxmox provides physical virtualization
OpenStack provides cloud-style VM lifecycle
Nova schedules GPU VM onto the GPU compute host
Libvirt/VFIO attaches GTX970 to the guest
Slurm treats the GPU VM as a GPU compute node
GRES exposes the GPU to Slurm jobs
Users submit jobs through sbatch/srun
Accounting records job history through slurmdbd/MariaDB

That is the important learning loop.


Phase 5 revised: Terraform + OpenStack + Slurm

Once the manual setup works, automate this exact topology.

Terraform should create:

slurm-controller  10.10.10.30
slurm-cpu1        10.10.10.31
slurm-cpu2        10.10.10.32

For gpu-test-01, you have two choices:

Option A — Keep existing gpu-test-01

Best for now.

Terraform manages only:

slurm-controller
slurm-cpu1
slurm-cpu2

Ansible configures all four nodes, including the pre-existing GPU VM.

This is the safest next step.

Option B — Recreate GPU node with Terraform

Later, Terraform also creates:

slurm-gpu1 / gpu-test-01

using the GPU flavor:

g1.gtx970

with extra spec:

pci_passthrough:alias = nvidia-gpu:1

This is better for full automation, but only after the static/manual version works.


Final recommended plan

Build now

slurm-controller  10.10.10.30
slurm-cpu1        10.10.10.31
slurm-cpu2        10.10.10.32
gpu-test-01       10.10.10.36

Keep infrastructure clean

pve0/pve1/pve2 = physical virtualization only
ctrl/cmp/gpu = OpenStack/Kolla infrastructure only
Slurm = tenant workload inside OpenStack

Learn in this order

1. Basic Slurm commands: sinfo, squeue, scontrol
2. CPU jobs
3. Slurm accounting: slurmdbd, MariaDB, sacct
4. MPI jobs
5. Array jobs
6. GPU/GRES jobs on gpu-test-01
7. Terraform-created Slurm CPU nodes
8. Ansible-configured Slurm
9. Terraform-created GPU Slurm node
10. Elastic/bursting Slurm workers

This gives you a clean, technically correct homelab that maps directly onto real AI/HPC cloud infrastructure: OpenStack provisions capacity, Slurm schedules workloads, and GPU passthrough exposes accelerator resources to jobs.