
Issue
While preparing the OpenStack tenant VMs for Slurm, the Slurm nodes could reach each other on the private tenant network, but they could not reach the default gateway, DNS, or the internet.
From slurm-controller, VM-to-VM traffic worked:
ping -c 2 slurm-cpu1
ping -c 2 slurm-cpu2
But traffic to the default gateway, to a public IP, and to a public hostname all failed:
ping -c 2 10.10.10.1
ping -c 2 8.8.8.8
ping -c 2 archive.ubuntu.com
The VM had a DHCP-provided address and default route:
10.10.10.30/24
default via 10.10.10.1
but the gateway was unreachable. The error came from the VM's own address, which indicates that ARP for 10.10.10.1 got no reply:
From 10.10.10.30 Destination Host Unreachable
With no route off the subnet, the VM could not reach its DNS server (1.1.1.1), so apt update failed at name resolution:
Temporary failure resolving 'archive.ubuntu.com'
Temporary failure resolving 'security.ubuntu.com'
As a result, the package index was never refreshed and the Slurm packages could not be installed:
Unable to locate package munge
Unable to locate package slurm-wlm
Unable to locate package slurmdbd
Unable to locate package mariadb-server
Root cause: the gpu-private tenant network existed, DHCP worked, and VM-to-VM connectivity worked, but no Neutron router was attached to the subnet and no external/provider network existed.
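The symptoms above fail in a fixed order (gateway, then routing, then DNS), so checking the layers in that order shows where to look first. The following is a minimal sketch of such a check, meant to run inside a tenant VM. The function name first_failing_layer and the GATEWAY variable are illustrative, not part of the original notes; the probe targets are the ones used above.

```shell
# Sketch: report the first failing network layer from inside a tenant VM.
# GATEWAY defaults to the gateway advertised by gpu-private-subnet.
GATEWAY="${GATEWAY:-10.10.10.1}"

first_failing_layer() {
    # 1. Default gateway: needs ARP and ICMP to work on the tenant network.
    ping -c 2 -W 2 "$GATEWAY" >/dev/null 2>&1 || { echo "gateway"; return 1; }
    # 2. Public IP: needs routing (and NAT) beyond the gateway.
    ping -c 2 -W 2 8.8.8.8 >/dev/null 2>&1 || { echo "routing"; return 1; }
    # 3. Name resolution: needs a reachable DNS server.
    getent hosts archive.ubuntu.com >/dev/null 2>&1 || { echo "dns"; return 1; }
    echo "ok"
}
```

Calling first_failing_layer on slurm-controller in the state described above would be expected to print gateway, which points at the network rather than at the resolver configuration.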
Investigation
The private network was confirmed as:
gpu-private
Network ID: 54829687-5a62-4d95-a7d0-42f3e30f7dbf
Subnet: gpu-private-subnet
CIDR: 10.10.10.0/24
Gateway: 10.10.10.1
DNS: 1.1.1.1
The subnet advertised 10.10.10.1 as the default gateway, but:
openstack router list
returned nothing.
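That meant no router interface existed at 10.10.10.1. The subnet's gateway address is only advertised by DHCP; nothing answers on it until a router port is attached to the subnet.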
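A more direct way to confirm this is to ask Neutron whether any router interface port exists on the tenant network. The sketch below wraps openstack port list for that purpose; the function names are illustrative. It assumes a legacy (non-HA, non-DVR) router, whose ports use the device owner network:router_interface. HA and distributed routers use different device owner strings, so the filter would need adjusting there.

```shell
# Sketch: list router-interface ports on a tenant network.
router_ports_on_network() {
    # $1: network name or ID
    openstack port list \
        --network "$1" \
        --device-owner network:router_interface \
        -f value -c ID -c "Fixed IP Addresses"
}

# Succeeds only if at least one router interface port exists on the network.
gateway_has_router_port() {
    [ -n "$(router_ports_on_network "$1")" ]
}
```

For example, gateway_has_router_port gpu-private || echo "no router port on gpu-private" would print the message in the broken state described here.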
The OpenStack network agent state was then checked. Neutron was healthy:
DHCP agent alive
Metadata agent alive
Open vSwitch agents alive
L3 agent alive
neutron_l3_agent healthy
neutron_server healthy
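The agent check can be scripted so that only agents that are not alive are printed. This is a rough sketch: dead_agents is an illustrative name, and the parsing assumes that the Alive column is the last field of each line and renders as True (or as :-) on clients that print the table symbol), which may differ between client versions.

```shell
# Sketch: print the type of every Neutron agent that is not reported alive.
# Assumption: with "-f value", each line ends with the Alive field.
dead_agents() {
    openstack network agent list -f value -c "Agent Type" -c Alive |
        awk '$NF != "True" && $NF != ":-)" { sub(/[ \t]+[^ \t]+$/, ""); print }'
}
```

An empty result corresponds to the healthy state listed above.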
The Kolla config also showed a valid external interface:
network_interface: eth0
neutron_external_interface: enp6s19
The enp6s19 interface had no IP address, which is appropriate for a Neutron external/provider interface. The missing part was not Kolla services; it was the OpenStack tenant/external network configuration.
Solution
The external provider network was created:
openstack network create public \
--external \
--provider-network-type flat \
--provider-physical-network physnet1
Then the external subnet was created on the homelab LAN:
openstack subnet create public-subnet \
--network public \
--subnet-range 192.168.1.0/24 \
--allocation-pool start=192.168.1.200,end=192.168.1.220 \
--gateway 192.168.1.1 \
--dns-nameserver 1.1.1.1 \
--dns-nameserver 8.8.8.8 \
--no-dhcp
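This defined public-subnet on the external network public and reserved a safe allocation pool (192.168.1.200 to 192.168.1.220) for Neutron. The router's external address is allocated from that pool when its external gateway is set, which happens in a later step. In the final state, the router received: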
External router IP: 192.168.1.215
The Neutron router was then created:
openstack router create gpu-private-router
The router was given an external gateway:
openstack router set \
--external-gateway public \
gpu-private-router
The private Slurm subnet was attached:
openstack router add subnet \
gpu-private-router \
gpu-private-subnet
After this, OpenStack created the router namespace:
qrouter-80a7cf7d-711f-4329-ad33-bb793a05756f
and the DHCP namespace already existed:
qdhcp-54829687-5a62-4d95-a7d0-42f3e30f7dbf
The router namespace showed the correct internal and external interfaces:
qg-d385ebd7-5a 192.168.1.215/24
qr-fe08d65d-ab 10.10.10.1/24
default via 192.168.1.1
The OVS bridge mapping was also confirmed. br-ex had the external physical interface attached:
Bridge br-ex
Port enp6s19
Port phy-br-ex
This confirmed that the Neutron external provider bridge was correctly connected to the LAN.
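The name physnet1 used when creating the provider network has to match a bridge mapping in the Open vSwitch agent configuration. The sketch below looks up which bridge a physical network maps to. The function name is illustrative, and the default file path is an assumption about where Kolla-Ansible writes the agent config on this host; pass a different path as the second argument if it lives elsewhere.

```shell
# Sketch: print the OVS bridge mapped to a physical network name.
# $1: physical network (e.g. physnet1)
# $2: agent config file (assumed Kolla-Ansible location by default)
bridge_for_physnet() {
    conf="${2:-/etc/kolla/neutron-openvswitch-agent/openvswitch_agent.ini}"
    # bridge_mappings is a comma-separated list of physnet:bridge pairs.
    sed -n 's/^[[:space:]]*bridge_mappings[[:space:]]*=[[:space:]]*//p' "$conf" |
        tr ',' '\n' |
        awk -F: -v net="$1" '{ gsub(/[ \t]/, ""); if ($1 == net) print $2 }'
}
```

On this deployment, bridge_for_physnet physnet1 would be expected to print br-ex. The ports on that bridge can then be listed with ovs-vsctl list-ports br-ex, run inside the Open vSwitch container on a Kolla host (commonly named openvswitch_vswitchd, which is also an assumption here).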
Validation
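The router namespace could reach all required networks. Here ROUTER_ID is the router's UUID, 80a7cf7d-711f-4329-ad33-bb793a05756f, as in the namespace name above: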
sudo ip netns exec qrouter-$ROUTER_ID ping -c 2 10.10.10.1
sudo ip netns exec qrouter-$ROUTER_ID ping -c 2 192.168.1.1
sudo ip netns exec qrouter-$ROUTER_ID ping -c 2 8.8.8.8
Results:
10.10.10.1 reachable
192.168.1.1 reachable
8.8.8.8 reachable
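The same checks can be run as one loop that prints a result line per target, in the format used above. This is a sketch with an illustrative function name, netns_reachable; it assumes it runs on the network node where the namespaces are visible.

```shell
# Sketch: ping each target from inside a network namespace.
# $1: namespace name; remaining arguments: addresses to probe.
netns_reachable() {
    ns="$1"
    shift
    for target in "$@"; do
        if sudo ip netns exec "$ns" ping -c 2 -W 2 "$target" >/dev/null 2>&1; then
            echo "$target reachable"
        else
            echo "$target unreachable"
        fi
    done
}
```

A possible invocation, looking the router ID up by name instead of copying it by hand:

ROUTER_ID=$(openstack router show gpu-private-router -f value -c id)
netns_reachable "qrouter-$ROUTER_ID" 10.10.10.1 192.168.1.1 8.8.8.8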
The DHCP namespace could also reach both the private gateway and the Slurm controller:
sudo ip netns exec qdhcp-54829687-5a62-4d95-a7d0-42f3e30f7dbf ping -c 2 10.10.10.1
sudo ip netns exec qdhcp-54829687-5a62-4d95-a7d0-42f3e30f7dbf ping -c 2 10.10.10.30
Results:
10.10.10.1 reachable
10.10.10.30 reachable
At that point, the OpenStack routing layer was fixed. The remaining step was to reboot or renew DHCP on the Slurm VMs so they picked up clean routing and resolver state, then rerun:
sudo apt clean
sudo apt update
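For the DHCP renewal, a reboot is the simplest option. Where a reboot is not wanted, something like the following sketch could be used inside each VM. It assumes the VM is managed either by systemd-networkd (networkctl renew needs systemd 244 or later) or by ISC dhclient; the function names are illustrative, and on images using another network manager the renew step will differ. Releasing a lease over SSH can drop the session, so a console is safer.

```shell
# Sketch: find the interface that carries the default route.
default_iface() {
    ip -o -4 route show default |
        awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Sketch: renew the DHCP lease on that interface.
renew_dhcp() {
    iface="$(default_iface)"
    [ -n "$iface" ] || { echo "no default route found" >&2; return 1; }
    if command -v networkctl >/dev/null 2>&1; then
        sudo networkctl renew "$iface"                        # systemd-networkd
    else
        sudo dhclient -r "$iface" && sudo dhclient "$iface"   # dhclient fallback
    fi
}
```

After renew_dhcp, ip route should still show default via 10.10.10.1, and the apt commands above can be rerun.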
Final root cause
DNS was a symptom, not the root cause.
DNS failed because the VM had no working path to its gateway or the internet. The deeper issue was:
gpu-private subnet existed
DHCP worked
VM-to-VM traffic worked
but no external provider network existed
and no Neutron router was attached to gpu-private-subnet
After creating:
public external network
public-subnet
gpu-private-router
router external gateway
router interface to gpu-private-subnet
the private Slurm VMs gained a valid route:
10.10.10.0/24 → 10.10.10.1 (router qr- interface) → Neutron router (SNAT) → 192.168.1.215 (router qg- interface) → 192.168.1.1 → internet
That restored the path needed for DNS, apt update, and the subsequent Munge/Slurm package installation.