Creating Ubuntu 24.04 cloud-init template

First check what currently exists:

qm status 9000
qm config 9000

If 9000 already exists and you want to replace it:

qm stop 9000 2>/dev/null || true
qm unlock 9000 2>/dev/null || true
qm destroy 9000 --purge

Check for old leftover disks:

pvesm list ceph-vm | grep 9000 || true
pvesm list local-lvm | grep 9000 || true

If you see old 9000 disks after destroy, remove only those clearly belonging to VMID 9000.

Example:

pvesm free ceph-vm:vm-9000-disk-0
pvesm free ceph-vm:vm-9000-cloudinit
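
A plain grep 9000 also matches unrelated rows, for example a disk owned by VMID 19000 or a size that happens to contain 9000. The helper below is a minimal sketch that keeps only volumes named vm-9000-* or base-9000-*. The name list_vmid_volumes is hypothetical, and the sketch assumes the volume ID is the first column of the pvesm list output.

```shell
# list_vmid_volumes VMID
# Reads `pvesm list <storage>` output on stdin and prints only the volume IDs
# whose name is vm-<VMID>-* or base-<VMID>-*.
# Assumption: the volume ID is the first whitespace-separated column.
list_vmid_volumes() {
  awk -v id="$1" '$1 ~ (":(vm|base)-" id "-") { print $1 }'
}
```

Usage: pvesm list ceph-vm | list_vmid_volumes 9000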

1. Install required tools on `pve0`

apt update
apt install -y wget libguestfs-tools

Set libguestfs to use direct mode:

export LIBGUESTFS_BACKEND=direct

Make it persistent for this root shell if desired:

echo 'export LIBGUESTFS_BACKEND=direct' >> /root/.bashrc

2. Download a fresh Ubuntu 24.04 cloud image

mkdir -p /var/lib/vz/template/iso
cd /var/lib/vz/template/iso

rm -f noble-server-cloudimg-amd64.img
rm -f ubuntu-24.04-server-cloudimg-amd64.img

wget -O ubuntu-24.04-server-cloudimg-amd64.img \
  https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img

Check it exists:

ls -lh ubuntu-24.04-server-cloudimg-amd64.img
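
Before customising the image it is worth confirming that the download is intact. The sketch below compares the local file against the SHA256SUMS listing that Ubuntu publishes next to the image. Because the image was saved under a different file name, the published name is passed separately. expected_sha256 and verify_image are hypothetical helper names, and the listing is assumed to have been saved locally as SHA256SUMS, for example with wget -O SHA256SUMS https://cloud-images.ubuntu.com/noble/current/SHA256SUMS.

```shell
# expected_sha256 NAME
# Reads a SHA256SUMS listing on stdin and prints the hash recorded for NAME.
# Entries look like "<hash> *<name>" or "<hash>  <name>".
expected_sha256() {
  awk -v name="$1" '{ f = $2; sub(/^\*/, "", f); if (f == name) print $1 }'
}

# verify_image FILE PUBLISHED_NAME SUMS_FILE
# Succeeds only if FILE's SHA-256 equals the entry for PUBLISHED_NAME.
verify_image() {
  want=$(expected_sha256 "$2" < "$3")
  have=$(sha256sum "$1" | awk '{ print $1 }')
  [ -n "$want" ] && [ "$want" = "$have" ]
}
```

Usage: verify_image ubuntu-24.04-server-cloudimg-amd64.img noble-server-cloudimg-amd64.img SHA256SUMS && echo "checksum OK"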

3. Create the `sont` public key file

Create a public key file on pve0.

Use your actual public key. This example uses the existing sont@blusas.co.uk key.

cat > /tmp/sont.pub <<'EOF'
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCtlXjKmo9ww04QhCd34K5Z3LCScMiOSCNI/WWJdLmzuKz8BINOjpsmLXFXl3W7o/dWRbceRxqe60JOEjtCP7NQNx9LNPn4rWHSwbktxiHB8BEQLCYX1NbKVks5stGvYQhFx8tzUPz4q5HfFOQ9dNXcSVJRSn65JXh25nDQZV7SJ+W7PKnN3fj5ot94XYr17X9QNgfvrSxIVhHqN0H3cf9j27uEAwrROdNVqj3OX8atKvhloBnc5pqXXJG3FBJT5g0VwY1b0zYe8t9LVNViOr+ML2vQiQaNbvzDyy9g2+WcuFxBHDjVfpPhAR4EUK8jRFrSF759YNkAk98zTJAxOa2B2QYsQ8X7X2aX11Z/oO1Drat6myo8mvc8H8+EHfkt4X+ydhnlme7agyHupZeSf5tgdQRPz5cyuUH1oqWq0cm4RafLcpBaTUbmpc5zcRr3UKpUTGj4SsKWJ2KzmiXI2WdgIUh+zcR+Umeu/TcF4Mo1we/5U8w31ER8ritQnQjd7Yo3dlabu/m4DjVOtJK2ZaQTMXjExfQWRF+lNhD41igmOIpZWSZ8ZkcrPLHqPTRdvXkCadu3hpWFGe/zsFP03yoJtmShTA6GaUFx5nQem6SQFqZJBKJPD78Hpg5ZeyH0deZF5M4+WhMg1vL0kNuuzUUM3I04pVzczp3A43MDqYMSjQ== sont@blusas.co.uk
EOF

Verify:

cat /tmp/sont.pub
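
A malformed key file (a private key pasted by mistake, a wrapped line, or two keys) only shows up later as a failed SSH login. The check below is a rough sketch that catches those cases before the file is uploaded into the image. check_pubkey_file is a hypothetical name, and the list of accepted key types is an assumption that you can extend.

```shell
# check_pubkey_file FILE
# Succeeds only if FILE holds exactly one non-empty line of the form
# "<type> <base64> [comment]" with a recognised public-key type.
check_pubkey_file() {
  awk '
    NF == 0 { next }                      # ignore blank lines
    { n++ }                               # count key lines
    $1 !~ /^(ssh-rsa|ssh-ed25519|ecdsa-sha2-nistp(256|384|521))$/ { bad = 1 }
    $2 !~ /^[A-Za-z0-9+\/]+=*$/ { bad = 1 }
    END { exit ((n == 1 && !bad) ? 0 : 1) }
  ' "$1"
}
```

Usage: check_pubkey_file /tmp/sont.pub && echo "key file looks valid"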

4. Inject the `sont` user, passwordless sudo, and QEMU guest agent

Run this against the downloaded image:

virt-customize -a ubuntu-24.04-server-cloudimg-amd64.img \
  --install qemu-guest-agent,cloud-init,cloud-guest-utils,sudo,bash,curl,wget,vim,net-tools,iproute2 \
  --run-command 'useradd -m -s /bin/bash -G sudo,adm sont || true' \
  --mkdir /home/sont/.ssh \
  --upload /tmp/sont.pub:/home/sont/.ssh/authorized_keys \
  --run-command 'chown -R sont:sont /home/sont/.ssh' \
  --run-command 'chmod 700 /home/sont/.ssh' \
  --run-command 'chmod 600 /home/sont/.ssh/authorized_keys' \
  --run-command 'echo "sont ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-sont-nopasswd' \
  --run-command 'chmod 0440 /etc/sudoers.d/90-sont-nopasswd' \
  --run-command 'systemctl enable qemu-guest-agent' \
  --run-command 'cloud-init clean --logs' \
  --truncate /etc/machine-id \
  --run-command 'rm -f /var/lib/dbus/machine-id' \
  --run-command 'ln -sf /etc/machine-id /var/lib/dbus/machine-id'

If you get a libguestfs error, re-run:

export LIBGUESTFS_BACKEND=direct

Then repeat the virt-customize command.

5. Create the Proxmox VM shell for the template

Create VMID 9000 with the name ubuntu-24-template:

qm create 9000 \
  --name ubuntu-24-template \
  --memory 2048 \
  --cores 2 \
  --cpu host \
  --net0 virtio,bridge=vmbr0,firewall=0 \
  --scsihw virtio-scsi-single \
  --serial0 socket \
  --vga serial0 \
  --agent enabled=1,fstrim_cloned_disks=1,type=virtio \
  --ostype l26 \
  --onboot 0

6. Import the Ubuntu disk into `ceph-vm`

qm importdisk 9000 ubuntu-24.04-server-cloudimg-amd64.img ceph-vm --format raw

Check that the imported disk exists:

pvesm list ceph-vm | grep 9000

You should see something like:

ceph-vm:vm-9000-disk-0

7. Attach the imported disk

qm set 9000 \
  --scsi0 ceph-vm:vm-9000-disk-0,discard=on,ssd=1,iothread=1

Set boot order:

qm set 9000 --boot order=scsi0

8. Add Cloud-Init drive

qm set 9000 --ide2 ceph-vm:cloudinit

Set Cloud-Init user to sont:

qm set 9000 --ciuser sont

Inject the public key through Proxmox Cloud-Init as well:

qm set 9000 --sshkeys /tmp/sont.pub

Set DNS defaults:

qm set 9000 --nameserver "192.168.1.1 1.1.1.1"

For template testing, give it a temporary static IP:

qm set 9000 \
  --ipconfig0 ip=192.168.1.99/24,gw=192.168.1.1

Regenerate the Cloud-Init ISO:

qm cloudinit update 9000

9. Verify the template VM config before booting

qm config 9000

You want to see these key lines:

name: ubuntu-24-template
agent: enabled=1,...
boot: order=scsi0
ciuser: sont
ide2: ceph-vm:vm-9000-cloudinit,media=cdrom
ipconfig0: ip=192.168.1.99/24,gw=192.168.1.1
nameserver: 192.168.1.1 1.1.1.1
net0: virtio=...,bridge=vmbr0,firewall=0
scsi0: ceph-vm:vm-9000-disk-0,...
serial0: socket
vga: serial0
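
Checking the key lines by eye is error-prone. The sketch below reads the qm config output and reports any of the keys listed above that are absent. check_template_config is a hypothetical name, and the check only tests for the presence of each key, not its value.

```shell
# check_template_config
# Reads `qm config <vmid>` output on stdin and prints "missing: <key>" for
# every expected key that is absent. Returns non-zero if any key is missing.
check_template_config() {
  cfg=$(cat)
  missing=0
  for key in name agent boot ciuser ide2 ipconfig0 nameserver net0 scsi0 serial0 vga; do
    if ! printf '%s\n' "$cfg" | grep -q "^${key}: "; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: qm config 9000 | check_template_config && echo "template config OK"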

10. Start the VM for testing

qm start 9000

Wait 60–90 seconds.

Then try SSH:

ssh sont@192.168.1.99 hostname
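
A fixed wait is a guess, because first boot with cloud-init can take more or less time depending on storage and network. The helper below is a small sketch that polls a command until it succeeds instead. retry is a hypothetical name; the attempt count and delay in the usage line are assumptions chosen to cover roughly 90 seconds.

```shell
# retry ATTEMPTS DELAY CMD...
# Runs CMD until it succeeds, sleeping DELAY seconds between tries.
# Returns 1 once CMD has failed ATTEMPTS times.
retry() {
  attempts=$1
  delay=$2
  shift 2
  tries=1
  until "$@"; do
    if [ "$tries" -ge "$attempts" ]; then
      return 1
    fi
    tries=$((tries + 1))
    sleep "$delay"
  done
  return 0
}
```

Usage: retry 18 5 ssh -o ConnectTimeout=5 -o BatchMode=yes sont@192.168.1.99 true && echo "SSH is up"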

Test passwordless sudo:

ssh sont@192.168.1.99 "whoami; sudo whoami"

Expected:

sont
root

Test QEMU guest agent:

qm agent 9000 ping

Expected: the command prints nothing and exits with status 0. An error message means the guest agent is not responding.

Check Cloud-Init status:

ssh sont@192.168.1.99 "cloud-init status --long"

Expected:

status: done

11. Clean the VM before converting to template

SSH into the VM and clean it:

ssh sont@192.168.1.99 '
  sudo cloud-init clean --logs
  sudo truncate -s 0 /etc/machine-id
  sudo rm -f /var/lib/dbus/machine-id
  sudo ln -sf /etc/machine-id /var/lib/dbus/machine-id
  sudo apt autoremove -y
  sudo apt clean
  rm -f ~/.bash_history
  sudo shutdown -h now
'

Wait until it is stopped:

qm status 9000

Expected:

status: stopped

If it does not shut down cleanly after a minute:

qm stop 9000 --skiplock 1

12. Remove the temporary test IP before making it the final template

You can leave ciuser and SSH key, but remove the temporary 192.168.1.99 address so Terraform can set per-VM IPs later.

Choose one of the following. Either set it back to DHCP:

qm set 9000 --ipconfig0 ip=dhcp

Or remove the IP config entirely, which is the better choice for Terraform-managed clones because each clone then receives its address only from Terraform:

qm set 9000 --delete ipconfig0

Regenerate Cloud-Init:

qm cloudinit update 9000

13. Convert VM 9000 to a template

qm template 9000

Verify:

qm config 9000 | egrep '^(name|template|agent|ciuser|ide2|net0|scsi0|boot|serial0):'

Expected (line order may differ):

name: ubuntu-24-template
template: 1
agent: enabled=1,...
ciuser: sont
ide2: ceph-vm:vm-9000-cloudinit,media=cdrom
net0: virtio=...,bridge=vmbr0,firewall=0
scsi0: ceph-vm:vm-9000-disk-0,...
boot: order=scsi0
serial0: socket

14. Terraform settings to use this template

In terraform.tfvars:

template_vm_id = 9000
template_node  = "pve0"

vm_user = "sont"

datastore_id            = "ceph-vm"
cloud_init_datastore_id = "ceph-vm"
efi_datastore_id        = "ceph-vm"

ssh_public_keys = [
  "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCtlXjKmo9ww04QhCd34K5Z3LCScMiOSCNI/WWJdLmzuKz8BINOjpsmLXFXl3W7o/dWRbceRxqe60JOEjtCP7NQNx9LNPn4rWHSwbktxiHB8BEQLCYX1NbKVks5stGvYQhFx8tzUPz4q5HfFOQ9dNXcSVJRSn65JXh25nDQZV7SJ+W7PKnN3fj5ot94XYr17X9QNgfvrSxIVhHqN0H3cf9j27uEAwrROdNVqj3OX8atKvhloBnc5pqXXJG3FBJT5g0VwY1b0zYe8t9LVNViOr+ML2vQiQaNbvzDyy9g2+WcuFxBHDjVfpPhAR4EUK8jRFrSF759YNkAk98zTJAxOa2B2QYsQ8X7X2aX11Z/oO1Drat6myo8mvc8H8+EHfkt4X+ydhnlme7agyHupZeSf5tgdQRPz5cyuUH1oqWq0cm4RafLcpBaTUbmpc5zcRr3UKpUTGj4SsKWJ2KzmiXI2WdgIUh+zcR+Umeu/TcF4Mo1we/5U8w31ER8ritQnQjd7Yo3dlabu/m4DjVOtJK2ZaQTMXjExfQWRF+lNhD41igmOIpZWSZ8ZkcrPLHqPTRdvXkCadu3hpWFGe/zsFP03yoJtmShTA6GaUFx5nQem6SQFqZJBKJPD78Hpg5ZeyH0deZF5M4+WhMg1vL0kNuuzUUM3I04pVzczp3A43MDqYMSjQ== sont@blusas.co.uk"
]

Make sure your Terraform initialization block uses:

user_account {
  username = var.vm_user
  keys     = var.ssh_public_keys
}

15. Quick clone test

Before rebuilding all OpenStack VMs, test one clone manually:

qm clone 9000 9100 \
  --name test-ubuntu-24 \
  --full 1 \
  --storage ceph-vm

Set a test IP:

qm set 9100 \
  --ipconfig0 ip=192.168.1.98/24,gw=192.168.1.1 \
  --nameserver "192.168.1.1 1.1.1.1"

Regenerate Cloud-Init:

qm cloudinit update 9100

Start it:

qm start 9100

Test:

ssh sont@192.168.1.98 "hostname; whoami; sudo whoami"
qm agent 9100 ping

Expected output from the SSH command:

test-ubuntu-24
sont
root

qm agent 9100 ping prints nothing and exits with status 0 when the guest agent responds.

Destroy the test clone:

qm stop 9100 2>/dev/null || true
qm destroy 9100 --purge

At that point, ubuntu-24-template is ready for Terraform and your OpenStack lab.

Creating the OpenStack Cluster using Terraform

OpenStack Step 1 Terraform for Proxmox VE 9.x

This Terraform project creates the three Ubuntu 24.04 VMs used for the first OpenStack/Kolla-Ansible lab step on a 3-node Proxmox cluster.

It is designed for this lab topology:

| VM name | VMID | Proxmox node | OpenStack role | IP address | vCPU | RAM | Disk |
| --- | --- | --- | --- | --- | --- | --- | --- |
| `ctrl` | `1210` | `pve1` | Control plane / network / storage / monitoring | `192.168.1.51/24` | 8 | 16 GB | 100 GB |
| `cmp` | `1211` | `pve2` | Nova compute | `192.168.1.52/24` | 16 | 64 GB | 100 GB |
| `gpu` | `1212` | `pve0` | Nova compute, GPU-capable later | `192.168.1.53/24` | 8 | 32 GB | 100 GB |

The default VM names are deliberately short because they become hostnames and Kolla-Ansible inventory names.

What this creates

Terraform creates one Proxmox VM for each item in var.vms.

Each VM is:

cloned from the Ubuntu 24.04 cloud-init template VM 9000
stored on ceph-vm
connected to vmbr0
assigned a static cloud-init IP address
started automatically

The VMs use:

| Setting | Value |
| --- | --- |
| Ubuntu template | `ubuntu-24-template` |
| Template VMID | `9000` |
| Template node | `pve0` |
| Disk storage | `ceph-vm` |
| Cloud-init storage | `ceph-vm` |
| Bridge | `vmbr0` |
| Cloud-init user | `sont` |
| CPU type | `host` |
| Machine type | `q35` |
| Firmware | OVMF / UEFI |
| SCSI controller | `virtio-scsi-single` |
| QEMU guest agent | enabled |

Files

| File | Purpose |
| --- | --- |
| `versions.tf` | Pins Terraform and provider versions. |
| `provider.tf` | Configures the Proxmox provider. |
| `variables.tf` | Defines API, template, network, storage, user, GPU, and VM variables. |
| `main.tf` | Creates the three Proxmox VMs. |
| `outputs.tf` | Generates VM summary, `/etc/hosts` entries, SSH checks, and Kolla-Ansible inventory. |
| `terraform.tfvars.example` | Safe example configuration. |
| `terraform.tfvars` | Working lab values. Replace the API token before use. |
| `automation_user.sh` | Creates the Proxmox API user and token for Terraform. |

Prerequisites

Before running Terraform, confirm the following are complete.

1. Proxmox cluster

The cluster should contain:

pve0
pve1
pve2

Check from any node:

pvecm status
pvesh get /cluster/resources --type node

2. Shared VM storage

The storage ceph-vm must exist and be visible on all three nodes:

pvesm status

Expected result: ceph-vm appears as active storage.

3. Ubuntu 24.04 template

Create a Proxmox template named:

ubuntu-24-template

with VMID:

9000

The template should already contain:

user sont
passwordless sudo for sont
qemu-guest-agent
cloud-init
cloud-guest-utils
serial console support
cloud-init drive
network device on vmbr0

Verify:

qm config 9000 | egrep '^(name|template|agent|ciuser|ide2|net0|scsi0|boot|serial0):'

Expected important lines:

name: ubuntu-24-template
template: 1
agent: enabled=1,...
ciuser: sont
ide2: ceph-vm:...cloudinit...
net0: virtio=...,bridge=vmbr0,...
scsi0: ceph-vm:vm-9000-disk-0,...
boot: order=scsi0
serial0: socket

4. Nested virtualisation

OpenStack compute VMs need nested KVM. Enable nested virtualisation on all Proxmox hosts.

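For Intel hosts (on AMD hosts the module is kvm-amd and the parameter file is /sys/module/kvm_amd/parameters/nested):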

cat >/etc/modprobe.d/kvm-intel.conf <<'NESTED_EOF'
options kvm-intel nested=1
NESTED_EOF
reboot

After reboot:

cat /sys/module/kvm_intel/parameters/nested

Expected:

Y

5. Terraform API user

Run on a Proxmox node as root:

./automation_user.sh

The token secret is printed once. Put it in terraform.tfvars as:

proxmox_api_token = "terraform@pve!openstack=TOKEN_SECRET_HERE"
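
A token pasted with a missing part, or with the placeholder still in place, fails only when Terraform first contacts the API. The sketch below checks the user@realm!tokenid=secret shape locally. token_looks_valid is a hypothetical name; the check does not contact Proxmox and cannot tell whether the secret itself is correct.

```shell
# token_looks_valid TOKEN
# Checks only the "user@realm!tokenid=secret" shape and rejects the
# TOKEN_SECRET placeholders used in the examples.
token_looks_valid() {
  case "$1" in
    *TOKEN_SECRET*) return 1 ;;   # placeholder was never replaced
  esac
  printf '%s\n' "$1" | grep -Eq '^[^@!= ]+@[^@!= ]+![^@!= ]+=[^ ]+$'
}
```

Usage: token_looks_valid "$PROXMOX_TOKEN" || echo "token is malformed or still a placeholder" (PROXMOX_TOKEN is a hypothetical variable holding the value from terraform.tfvars).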

Quick start

cp terraform.tfvars.example terraform.tfvars
vi terraform.tfvars
terraform init
terraform fmt
terraform validate
terraform plan
terraform apply

Important configuration values

In terraform.tfvars:

proxmox_endpoint  = "https://pve0:8006/"
proxmox_api_token = "terraform@pve!openstack=REPLACE_WITH_TOKEN_SECRET"

vm_user                      = "sont"
ansible_ssh_private_key_file = "~/.ssh/id_rsa"

template_vm_id          = 9000
template_node           = "pve0"
datastore_id            = "ceph-vm"
cloud_init_datastore_id = "ceph-vm"
bridge                  = "vmbr0"
gateway                 = "192.168.1.1"
dns_servers             = ["192.168.1.1", "1.1.1.1"]
gpu_mapping_name        = "pve0-gpu"

The vm_user value is used by cloud-init and in the generated Kolla-Ansible inventory. It should match the user baked into the template.

GPU passthrough

The gpu VM is placed on pve0 because that is the GPU-capable Proxmox host.

By default, GPU passthrough is disabled:

gpu_passthrough = false

Keep it disabled for the initial Kolla-Ansible OpenStack deployment.

Only enable it later after the base cloud is healthy:

gpu_passthrough = true

Before enabling it, create a Proxmox Datacenter PCI resource mapping named:

pve0-gpu

Otherwise the VM will fail to start with:

PCI device mapping not found for 'pve0-gpu'

Useful outputs

After terraform apply, show the generated inventory:

terraform output kolla_inventory

Show /etc/hosts entries:

terraform output hosts_entries

Show SSH checks:

terraform output ssh_test_commands

Validation after apply

Check Proxmox placement:

pvesh get /cluster/resources --type vm | egrep '1210|1211|1212'

Expected, abridged to the relevant columns:

1210 ctrl pve1 running
1211 cmp  pve2 running
1212 gpu  pve0 running

Check SSH:

ssh sont@192.168.1.51 hostname
ssh sont@192.168.1.52 hostname
ssh sont@192.168.1.53 hostname

Check passwordless sudo:

ssh sont@192.168.1.51 'sudo whoami'
ssh sont@192.168.1.52 'sudo whoami'
ssh sont@192.168.1.53 'sudo whoami'

Expected:

root

Check QEMU guest agent from the correct Proxmox hosts:

ssh root@pve1 'qm agent 1210 ping'
ssh root@pve2 'qm agent 1211 ping'
ssh root@pve0 'qm agent 1212 ping'

Expected: each command prints nothing and exits with status 0. An error message means the guest agent in that VM is not responding.

Check nested KVM on compute VMs:

ssh sont@192.168.1.52 "egrep -c '(vmx|svm)' /proc/cpuinfo"
ssh sont@192.168.1.53 "egrep -c '(vmx|svm)' /proc/cpuinfo"

Expected result: a number greater than 0.

Then install and run kvm-ok:

ssh sont@192.168.1.52 'sudo apt update && sudo apt install -y cpu-checker && kvm-ok'
ssh sont@192.168.1.53 'sudo apt update && sudo apt install -y cpu-checker && kvm-ok'

Expected:

KVM acceleration can be used

Destroy / rebuild

To destroy only these OpenStack lab VMs:

terraform destroy \
  -target='proxmox_virtual_environment_vm.openstack["ctrl"]' \
  -target='proxmox_virtual_environment_vm.openstack["cmp"]' \
  -target='proxmox_virtual_environment_vm.openstack["gpu"]'

If Proxmox refresh hangs, use:

terraform destroy \
  -refresh=false \
  -target='proxmox_virtual_environment_vm.openstack["ctrl"]' \
  -target='proxmox_virtual_environment_vm.openstack["cmp"]' \
  -target='proxmox_virtual_environment_vm.openstack["gpu"]'

Common issues

QEMU guest agent timeout

Terraform may warn that it is waiting for network interfaces from the QEMU agent.

Usually this means the VM booted, but the guest agent has not responded yet.

Check that the template has qemu-guest-agent installed and enabled.

VM created on wrong storage

Check:

pvesm list ceph-vm | egrep '1210|1211|1212'
pvesm list local-lvm | egrep '1210|1211|1212'

All lab disks should be on ceph-vm.

GPU VM fails to start

If you see:

PCI device mapping not found for 'pve0-gpu'

Either:

set gpu_passthrough = false, or
create the pve0-gpu PCI mapping first.

OpenStack Proxmox Terraform Actions

This explains what the Terraform files do when creating the three Ubuntu 24.04 VMs for the OpenStack/Kolla-Ansible lab on the Proxmox cluster.

1. High-level purpose

The Terraform project creates this OpenStack lab foundation:

| Terraform key | VM name | VMID | Proxmox node | Role | IP address | Purpose |
| --- | --- | --- | --- | --- | --- | --- |
| `ctrl` | `ctrl` | `1210` | `pve1` | `control` | `192.168.1.51/24` | OpenStack control plane, API, network, storage, and monitoring services |
| `cmp` | `cmp` | `1211` | `pve2` | `compute` | `192.168.1.52/24` | Nova compute node |
| `gpu` | `gpu` | `1212` | `pve0` | `compute-gpu` | `192.168.1.53/24` | Nova compute node reserved for future GPU passthrough work |

The gpu VM is intentionally pinned to pve0 because pve0 is the Proxmox host with the GPU.

2. Files and responsibilities

| File | What it does |
| --- | --- |
| `versions.tf` | Requires Terraform `>= 1.6.0` and the `bpg/proxmox` provider `~> 0.87`. |
| `provider.tf` | Connects Terraform to Proxmox using the API endpoint and API token. |
| `variables.tf` | Defines all configurable values: API access, template, storage, networking, SSH user, GPU mapping, and VM map. |
| `main.tf` | Creates the actual Proxmox VMs. |
| `outputs.tf` | Prints VM summary, `/etc/hosts` entries, SSH commands, and a Kolla-Ansible inventory. |
| `terraform.tfvars` | Provides the concrete values for this lab. |
| `automation_user.sh` | Creates the Proxmox `terraform@pve` API user and `openstack` token. |

3. Terraform command flow

3.1 `terraform init`

Command:

terraform init

Actions taken:

Reads versions.tf.
Checks that the local Terraform CLI version is >= 1.6.0.
Downloads the Proxmox provider from bpg/proxmox.
Installs a compatible provider version matching ~> 0.87.
Creates the local .terraform/ provider/plugin directory.
Creates or updates .terraform.lock.hcl with the selected provider checksums.

No VMs are created during terraform init.

3.2 `terraform fmt`

Command:

terraform fmt

Actions taken:

Rewrites .tf files into canonical Terraform formatting.
Aligns indentation and block layout.
Does not contact Proxmox.
Does not create or modify VMs.

3.3 `terraform validate`

Command:

terraform validate

Actions taken:

Checks that the Terraform syntax is valid.
Checks that variables, references, resource names, and output references are valid.
Confirms that provider schema usage is structurally valid.
Does not contact Proxmox for live changes.
Does not create or modify VMs.

3.4 `terraform plan`

Command:

terraform plan

Actions taken:

Reads all configuration files.
Reads terraform.tfvars.
Connects to Proxmox using the configured API endpoint and API token.
Refreshes existing state if VMs already exist.
Compares desired state against actual Proxmox state.
Shows which VMs will be created, changed, or destroyed.

The expected first-run plan is to create three resources:

proxmox_virtual_environment_vm.openstack["ctrl"]
proxmox_virtual_environment_vm.openstack["cmp"]
proxmox_virtual_environment_vm.openstack["gpu"]

No VM changes are made until terraform apply is run.

3.5 `terraform apply`

Command:

terraform apply

Actions taken:

Shows the execution plan.
Waits for confirmation unless run with -auto-approve.
Creates or updates Proxmox VM resources.
Writes the resulting resource IDs into Terraform state.
Prints outputs from outputs.tf.

For this project, terraform apply creates three Proxmox VMs from the var.vms map.

4. Provider configuration actions

The provider connects Terraform to your Proxmox cluster.

Terraform therefore:

Connects to the Proxmox API endpoint.
Authenticates with the terraform@pve!openstack API token.
Accepts the self-signed lab certificate when proxmox_insecure = true.
Uses SSH agent support with user root for provider operations that require SSH.

The provider configuration is conceptually:

provider "proxmox" {
  endpoint  = var.proxmox_endpoint
  api_token = var.proxmox_api_token
  insecure  = var.proxmox_insecure

  ssh {
    agent    = true
    username = var.proxmox_ssh_user
  }
}

5. VM creation loop

The core resource is:

resource "proxmox_virtual_environment_vm" "openstack" {
  for_each = var.vms
  ...
}

This means Terraform does not define three separate VM resources manually. Instead, it loops through the vms map and creates one VM per map entry.

Current default entries:

ctrl
cmp
gpu

The resource addresses are therefore:

proxmox_virtual_environment_vm.openstack["ctrl"]
proxmox_virtual_environment_vm.openstack["cmp"]
proxmox_virtual_environment_vm.openstack["gpu"]

6. Actions taken for each VM

For every VM in var.vms, Terraform performs the following actions.

6.1 Sets name, description, and tags

Terraform sets the VM name from the map key and creates a description from the OpenStack role.

Result:

| VM | Description | Tags |
| --- | --- | --- |
| `ctrl` | OpenStack lab control VM managed by Terraform | `terraform`, `openstack`, `kolla`, `ubuntu-24-04`, `control` |
| `cmp` | OpenStack lab compute VM managed by Terraform | `terraform`, `openstack`, `kolla`, `ubuntu-24-04`, `compute` |
| `gpu` | OpenStack lab compute-gpu VM managed by Terraform | `terraform`, `openstack`, `kolla`, `ubuntu-24-04`, `compute-gpu` |

Terraform also ignores future tag drift, so manual tag changes in the Proxmox UI do not cause repeated Terraform updates.

6.2 Places each VM on the correct Proxmox node

Terraform uses each VM’s configured node and VMID.

Result:

| VM | VMID | Node |
| --- | --- | --- |
| `ctrl` | `1210` | `pve1` |
| `cmp` | `1211` | `pve2` |
| `gpu` | `1212` | `pve0` |

This fixes the earlier issue where VMs appeared to be created on pve0 and then moved. The desired final placement is explicit per VM.

6.3 Starts VMs and enables boot-on-host-start

Terraform configures each VM to:

Start after creation.
Start automatically when its Proxmox host boots.
Stop cleanly before destroy when Terraform destroys it.

Equivalent behaviour:

started         = true
on_boot         = true
stop_on_destroy = true

6.4 Uses Q35 and OVMF / UEFI

Terraform creates modern VM hardware using:

machine = "q35"
bios    = "ovmf"

Actions:

Creates modern Q35 machine type VMs.
Uses OVMF/UEFI firmware rather than legacy SeaBIOS.
Creates an EFI disk.
Stores the EFI disk on ceph-vm.
Disables Secure Boot pre-enrolled keys.

The EFI disk is configured conceptually as:

efi_disk {
  datastore_id      = var.datastore_id
  type              = "4m"
  pre_enrolled_keys = false
}

6.5 Clones from the Ubuntu template

Terraform clones from the Ubuntu 24.04 template.

Default values:

template_vm_id = 9000
template_node  = "pve0"
datastore_id   = "ceph-vm"

Actions:

Finds template VM 9000 on pve0.
Performs a full clone.
Places the cloned disk on ceph-vm.
Retries clone operations up to three times if Proxmox has a transient failure.

Important requirement: template 9000 must be a working Ubuntu 24.04 cloud-init template.

6.6 Enables QEMU guest agent

Terraform enables Proxmox-side QEMU guest agent support:

agent {
  enabled = true
  trim    = true
}

Actions:

Enables Proxmox-side QEMU guest agent support.
Enables guest disk trim support.

The Ubuntu template must also have qemu-guest-agent installed and enabled inside the guest. Terraform only enables the Proxmox VM setting; it cannot make a missing guest package respond.

Validation commands:

ssh root@pve1 'qm agent 1210 ping'
ssh root@pve2 'qm agent 1211 ping'
ssh root@pve0 'qm agent 1212 ping'

6.7 Configures CPU for nested virtualisation

Terraform sets CPU type to host and enables NUMA awareness.

Actions:

Allocates the requested number of vCPU cores.
Exposes host CPU features to the guest.
Enables NUMA awareness.

Using CPU type host is important because OpenStack compute nodes need nested KVM support inside the VM.

Result:

| VM | Cores |
| --- | --- |
| `ctrl` | 8 |
| `cmp` | 16 |
| `gpu` | 8 |

Validation:

ssh sont@192.168.1.52 "egrep -c '(vmx|svm)' /proc/cpuinfo"
ssh sont@192.168.1.53 "egrep -c '(vmx|svm)' /proc/cpuinfo"

6.8 Configures memory

Terraform allocates both dedicated and floating/balloon memory.

Result:

| VM | Memory |
| --- | --- |
| `ctrl` | 16 GB |
| `cmp` | 64 GB |
| `gpu` | 32 GB |

6.9 Configures SCSI controller

Terraform uses:

scsi_hardware = "virtio-scsi-single"

Actions:

Uses a VirtIO SCSI controller.
Matches a common performant Proxmox Linux VM layout.

6.10 Creates the OS disk

Terraform creates the main OS disk as scsi0.

Actions:

Creates scsi0 for each VM.
Stores the disk on ceph-vm.
Resizes it to 100 GB.
Uses raw format.
Enables discard/TRIM.
Enables I/O thread.
Marks the disk as SSD.

Result:

| VM | Disk | Storage |
| --- | --- | --- |
| `ctrl` | 100 GB | `ceph-vm` |
| `cmp` | 100 GB | `ceph-vm` |
| `gpu` | 100 GB | `ceph-vm` |

Validation:

pvesm list ceph-vm | egrep '1210|1211|1212'
pvesm list local-lvm | egrep '1210|1211|1212'

Expected: disks appear on ceph-vm, not local-lvm.

6.11 Configures Cloud-Init

Terraform creates and configures the cloud-init data for each VM.

Actions:

Creates a cloud-init disk on ceph-vm.
Injects the static IPv4 address for each VM.
Injects the default gateway.
Injects DNS servers.
Creates/configures the cloud-init user from var.vm_user.
Injects the SSH public key into that user.

The fixed issue here is important: the previous version hardcoded the user as ubuntu. The corrected configuration uses:

username = var.vm_user

With:

vm_user = "sont"

Result:

| VM | IP | User |
| --- | --- | --- |
| `ctrl` | `192.168.1.51/24` | `sont` |
| `cmp` | `192.168.1.52/24` | `sont` |
| `gpu` | `192.168.1.53/24` | `sont` |

6.12 Adds network device

Terraform adds one NIC to each VM.

Actions:

Adds one NIC to each VM.
Connects it to vmbr0.
Uses VirtIO model.

Result:

net0: virtio=...,bridge=vmbr0

6.13 Sets Linux OS type

Terraform sets the Proxmox guest OS type to l26, which covers Linux kernels 2.6 and newer:

operating_system {
  type = "l26"
}

6.14 Adds serial console

Terraform adds a serial device to support cloud-image console access.

This matches the Ubuntu cloud image template style using serial console output.

6.15 Optional GPU passthrough

Terraform uses a conditional dynamic block for GPU passthrough.

Behaviour:

If gpu_passthrough = false, no PCI device is attached.
If gpu_passthrough = true, Terraform adds a PCI device using the Proxmox Datacenter resource mapping named by var.gpu_mapping_name.

Default:

gpu_passthrough = false

Recommended for the initial OpenStack deployment: keep it false.

If enabling later, create this mapping first:

pve0-gpu

Otherwise VM 1212 will fail to start.

7. Outputs generated after apply

7.1 `vm_summary`

Prints a structured object showing VM ID, node, IP, role, CPU, memory, and disk.

Example:

terraform output vm_summary

7.2 `hosts_entries`

Prints static host entries:

192.168.1.51 ctrl
192.168.1.52 cmp
192.168.1.53 gpu

Use them in /etc/hosts on the deployment VM and optionally on all OpenStack nodes.
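
Appending the entries by hand risks duplicates when the step is repeated. The sketch below adds only the lines that are not already present, so it can be re-run safely. add_hosts_entries is a hypothetical name, and the usage line assumes that hosts_entries is a plain string output so that terraform output -raw can print it.

```shell
# add_hosts_entries FILE
# Reads "IP NAME" lines on stdin and appends to FILE only those lines that
# are not already present verbatim. Blank lines are skipped.
add_hosts_entries() {
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    grep -qxF -- "$line" "$1" || printf '%s\n' "$line" >> "$1"
  done
}
```

Usage, as root on the target machine: terraform output -raw hosts_entries | add_hosts_entries /etc/hosts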

7.3 `kolla_inventory`

Generates a Kolla-Ansible inventory.

It places:

ctrl in [control], [network], [storage], and [monitoring]
cmp and gpu in [compute]
localhost in [deployment]

Example usage:

terraform output -raw kolla_inventory > multinode
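
Before handing the file to Kolla-Ansible, you can confirm that every group named above made it into the inventory. The following is a minimal sketch; check_inventory_groups is a hypothetical name, and it assumes the inventory is in INI format with each group header on its own line.

```shell
# check_inventory_groups FILE
# Prints "missing group: [name]" for each expected group header that FILE
# does not contain. Returns non-zero if any group is missing.
check_inventory_groups() {
  status=0
  for group in control network compute storage monitoring deployment; do
    if ! grep -qxF "[$group]" "$1"; then
      echo "missing group: [$group]"
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_inventory_groups multinode && echo "inventory groups OK"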

7.4 `ssh_test_commands`

Prints SSH validation commands for all VMs:

terraform output ssh_test_commands

8. Corrected issues

The fixed Terraform set addresses these issues:

| Issue | Old behaviour | Fixed behaviour |
| --- | --- | --- |
| Cloud-init user mismatch | `username = "ubuntu"` | `username = var.vm_user` |
| Default VM user | Ubuntu user assumed | `vm_user = "sont"` |
| README name mismatch | `controller01`, `compute01`, `compute02` | `ctrl`, `cmp`, `gpu` |
| Static inventory hardcoding | Fixed inventory text | Dynamic output from `var.vms` |
| GPU safety | Could attach GPU prematurely | Disabled by default and conditional |
| Storage clarity | Could drift to local storage | OS, EFI, clone, and cloud-init storage default to `ceph-vm` |
| Automation script safety | Could fail if user exists | User creation made safer/idempotent |

9. Expected final Proxmox result

After terraform apply, the Proxmox UI should show:

pve0
  1212 gpu

pve1
  1210 ctrl

pve2
  1211 cmp

All three should be running and using ceph-vm for their disks.

10. Validation commands

Check VM placement and state:

pvesh get /cluster/resources --type vm | egrep '1210|1211|1212'

Check storage:

pvesm list ceph-vm | egrep '1210|1211|1212'
pvesm list local-lvm | egrep '1210|1211|1212'

Check SSH:

ssh sont@192.168.1.51 hostname
ssh sont@192.168.1.52 hostname
ssh sont@192.168.1.53 hostname

Check sudo:

ssh sont@192.168.1.51 'sudo whoami'
ssh sont@192.168.1.52 'sudo whoami'
ssh sont@192.168.1.53 'sudo whoami'

Check QEMU guest agent:

ssh root@pve1 'qm agent 1210 ping'
ssh root@pve2 'qm agent 1211 ping'
ssh root@pve0 'qm agent 1212 ping'

Check nested KVM:

ssh sont@192.168.1.52 "egrep -c '(vmx|svm)' /proc/cpuinfo"
ssh sont@192.168.1.53 "egrep -c '(vmx|svm)' /proc/cpuinfo"

11. Next step after successful VM creation

Once these three VMs are healthy, continue with the OpenStack/Kolla-Ansible preparation steps:

Add /etc/hosts entries.
Confirm SSH access from the deployment VM.
Install Docker and Python dependencies on all OpenStack nodes.
Install Kolla-Ansible on the deployment VM.
Generate and edit Kolla inventory.
Run kolla-ansible bootstrap-servers.
Run kolla-ansible prechecks.
Run kolla-ansible deploy.

Overview

You now have two playbooks covering the next OpenStack preparation stages:

prepare_openstack_vms.yml
    ├── Phase 7.2 - Prepare OS
    └── Phase 7.3 - Install Docker

build_deployment_vm.yml
    └── Phase 7.4 - Build Kolla-Ansible deployment node

Your inventory defines the three OpenStack VMs as:

ctrl  192.168.1.51  control
cmp   192.168.1.52  compute
gpu   192.168.1.53  compute-gpu

and uses sont as the Ansible SSH user with privilege escalation enabled.

Phase 7.2 — Prepare the OpenStack VMs

This section runs against:

hosts: openstack

So it applies to all three OpenStack VMs:

ctrl
cmp
gpu

The playbook is preparing the operating system so that the nodes are suitable for Kolla-Ansible, Docker containers, OpenStack services, and later Nova/Cinder/Neutron workloads.

1. Defines the OpenStack host map

The playbook defines this internal map:

openstack_hosts:
  ctrl:
    ip: "192.168.1.51"
    fqdn: "ctrl.lab.local"
    role: "control"
  cmp:
    ip: "192.168.1.52"
    fqdn: "cmp.lab.local"
    role: "compute"
  gpu:
    ip: "192.168.1.53"
    fqdn: "gpu.lab.local"
    role: "compute-gpu"

This lets the playbook understand the expected hostname, IP, FQDN, and role of each VM.

It also has a safety check:

- name: Validate host exists in openstack_hosts map

That means if you accidentally run the play against a host not listed in the map, it fails early instead of modifying the wrong machine.

This is good practice.

2. Sets the system hostname

For each VM, it sets the Linux hostname to match the Ansible inventory name:

ctrl
cmp
gpu

So:

hostname:
  name: "{{ inventory_hostname }}"

This is important because OpenStack/Kolla-Ansible depends heavily on predictable hostnames. If hostname resolution is wrong, later steps such as RabbitMQ clustering, MariaDB, Nova compute registration, and container service discovery can fail in confusing ways.

3. Updates `/etc/hosts`

The playbook manages two sections in /etc/hosts.

First, it sets localhost identity:

127.0.0.1 localhost
127.0.1.1 ctrl ctrl.lab.local

or equivalent for each node.

Second, it adds all three OpenStack nodes:

192.168.1.51 ctrl ctrl.lab.local
192.168.1.52 cmp cmp.lab.local
192.168.1.53 gpu gpu.lab.local

This means every node can resolve every other node even if DNS is not yet configured.

That matters because Kolla-Ansible and OpenStack services expect reliable name resolution between:

ctrl <-> cmp
ctrl <-> gpu
cmp  <-> gpu

4. Updates and upgrades the OS

The playbook refreshes the apt cache:

apt:
  update_cache: true
  cache_valid_time: 3600

Then performs a full distribution upgrade:

apt:
  upgrade: dist
  autoremove: true
  autoclean: true

This brings the base Ubuntu VMs up to date before installing Docker and Kolla dependencies.

That is especially important before deploying OpenStack, because you want predictable package versions, current kernel fixes, and clean dependency resolution before containers are deployed.

5. Installs base operating system packages

The playbook installs a useful base package set:

vim
curl
wget
git
jq
htop
tmux
python3
python3-pip
python3-venv
python3-setuptools
python3-dev
chrony
net-tools
bridge-utils
lvm2
ca-certificates
gnupg
apt-transport-https
software-properties-common
rsync
unzip
iproute2
iputils-ping
traceroute
dnsutils
tcpdump
open-iscsi

These cover several categories:

Administration:      vim, htop, tmux, jq
Python/Ansible:      python3, pip, venv, setuptools, dev headers
Networking:          iproute2, ping, traceroute, dnsutils, tcpdump
Storage:             lvm2, open-iscsi
Package/security:    ca-certificates, gnupg, apt-transport-https

The important OpenStack-related packages here are:

python3
python3-pip
python3-venv
python3-dev
lvm2
open-iscsi
bridge-utils
tcpdump

open-iscsi is particularly useful later for block storage integration and general cloud storage testing.

6. Enables time synchronisation with Chrony

The playbook enables and starts chrony:

systemd:
  name: chrony
  enabled: true
  state: started

This is critical.

OpenStack services are sensitive to clock drift. Keystone tokens, certificates, logs, distributed service coordination, RabbitMQ behaviour, and database timestamps all depend on consistent time.

Bad time sync can cause errors that look like authentication or API failures.
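
If you want to check the clock state by hand, a minimal sketch is to parse the output of chronyc tracking. The function name and the 0.1 second threshold are assumptions for illustration only; the playbook itself just enables the service.

```shell
# Read "chronyc tracking" output on stdin and succeed only when the clock is
# synchronised (Leap status: Normal) and the system time offset is within
# MAX_SECONDS. The threshold is an illustrative choice, not a Kolla requirement.
chrony_offset_ok() {
    awk -F': ' -v max="$1" '
        /^System time/ { split($2, parts, " "); off = parts[1] }
        /^Leap status/ { leap = $2 }
        END { exit !(off != "" && off + 0 <= max + 0 && leap == "Normal") }
    '
}

# Example usage:
#   chronyc tracking | chrony_offset_ok 0.1 && echo "time sync OK"
```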

7. Enables open-iscsi

The playbook enables and starts:

iscsid

This supports iSCSI-based block storage usage. Even if your primary storage will be Ceph RBD, enabling iSCSI now is sensible because OpenStack/Cinder labs often test multiple storage backends.

The task uses:

ignore_errors: true

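So if the service name or package behaviour differs slightly, it does not stop the entire run. The trade-off is that a genuine failure to start iscsid is also ignored, so check the task output if you plan to use iSCSI.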

8. Disables swap

The playbook disables swap immediately:

swapoff -a

Then comments out swap entries in /etc/fstab.

This matters for two reasons:

Kubernetes later expects swap to be disabled unless specifically configured otherwise.
OpenStack compute/container workloads behave more predictably without the kernel swapping under memory pressure.

For your combined OpenStack/Kubernetes/Slurm homelab, disabling swap is the right default.
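
The fstab edit can be sketched as a small filter. This is an assumption about how the step could be done by hand, not the playbook's actual task; the function name is hypothetical. It treats a line as a swap entry when its third field is swap and leaves existing comments alone.

```shell
# Print FILE with every active swap entry commented out.
# A line counts as a swap entry when its third fstab field is "swap".
comment_out_swap() {
    awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' "$1"
}

# Example usage (review the result before replacing the real file):
#   comment_out_swap /etc/fstab > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```

After swapoff -a, running swapon --show should print nothing, which is a quick way to confirm that no swap is active.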

9. Loads kernel modules for container and bridge networking

The playbook creates:

/etc/modules-load.d/openstack-homelab.conf

with:

overlay
br_netfilter

Then it immediately loads both modules.

overlay is needed for Docker’s overlay2 storage driver.

br_netfilter allows iptables/nftables rules to inspect traffic crossing Linux bridges. That is important for container networking and Kubernetes-style bridge traffic handling.

10. Applies sysctl settings for OpenStack/container networking

The playbook writes:

/etc/sysctl.d/99-openstack-homelab.conf

with:

net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0

Then it runs:

sysctl --system

These settings prepare the nodes for:

routing
bridge traffic filtering
container networking
OpenStack Neutron traffic
overlay networking
future Kubernetes networking

The rp_filter = 0 settings are especially relevant for multi-interface, overlay, and routed cloud networking. Strict reverse-path filtering can break asymmetric routing and overlay traffic.
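
To confirm the running kernel matches the file, one option is to read the expected pairs back out of the sysctl.d file and compare them with sysctl -n. The sketch below uses hypothetical function names. Note that the net.bridge.* keys only exist once br_netfilter is loaded, so they show up as unset if the module is missing.

```shell
# Normalise "key = value" lines from a sysctl.d file to "key=value",
# dropping comments and blank lines.
sysctl_conf_pairs() {
    sed -E -e '/^[[:space:]]*([#;]|$)/d' \
           -e 's/[[:space:]]*=[[:space:]]*/=/' \
           -e 's/^[[:space:]]+//; s/[[:space:]]+$//' "$1"
}

# Compare each expected pair with the running kernel and report differences.
check_sysctl_conf() {
    sysctl_conf_pairs "$1" | while IFS='=' read -r key want; do
        have="$(sysctl -n "$key" 2>/dev/null)"
        [ "$have" = "$want" ] || echo "mismatch: $key want=$want have=${have:-unset}"
    done
}

# Example usage:
#   check_sysctl_conf /etc/sysctl.d/99-openstack-homelab.conf
```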

Phase 7.3 — Install Docker

Kolla-Ansible deploys OpenStack services as containers, so Docker must be installed and running on all OpenStack nodes.

This section also runs against:

ctrl
cmp
gpu

The Docker section installs Docker, configures daemon settings, enables the service, verifies it, and reboots if the OS upgrade requires it.

1. Installs Docker packages

The playbook installs:

docker.io
python3-docker

docker.io provides the Docker engine.

python3-docker allows Ansible and Python tooling to interact with Docker if needed.

2. Creates Docker systemd override directory

It creates:

/etc/systemd/system/docker.service.d

This is not yet populated with an override file, but it prepares the node for future Docker service overrides.

That is useful if later you need to set proxy settings, custom daemon startup flags, or systemd limits.

3. Configures Docker daemon

The playbook creates:

/etc/docker/daemon.json

with:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  },
  "storage-driver": "overlay2"
}

This does two important things.

First, it prevents Docker container logs from growing without bounds:

max-size = 100m
max-file = 5

So each container's JSON logs are capped at roughly 500 MB: Docker rotates a log file when it reaches 100 MB and keeps at most five files.

Second, it forces Docker to use:

overlay2

which is the standard storage driver for modern Linux container hosts.

The task notifies the Docker restart handler, so Docker restarts only if the config changes.
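
The 500 MB figure comes from multiplying max-size by max-file. A small sketch of that arithmetic, with hypothetical helper names, lets you see the worst case per container if you change either value:

```shell
# Convert a Docker size string such as 100m or 1g to whole megabytes.
size_to_mb() {
    num="${1%[kKmMgG]}"
    case "$1" in
        *[kK]) echo $(( num / 1024 )) ;;
        *[mM]) echo "$num" ;;
        *[gG]) echo $(( num * 1024 )) ;;
        *)     echo $(( num / 1024 / 1024 )) ;;   # plain bytes
    esac
}

# Worst-case json-file log usage per container: max-size multiplied by max-file.
log_cap_mb() {
    echo $(( $(size_to_mb "$1") * $2 ))
}

# Example usage:
#   log_cap_mb 100m 5
```

The cap applies per container, so total disk usage scales with the number of containers running on the node.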

4. Enables and starts Docker

The playbook ensures Docker is:

enabled at boot
currently running

using systemd.

This means the node is ready for Kolla-Ansible to deploy OpenStack service containers.

5. Adds Ansible user to the Docker group

The playbook adds the Ansible SSH user to:

docker

In your inventory this user is:

sont

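This allows sont to run Docker commands without sudo once the user logs in again, because group membership is applied at login. Membership of the docker group is effectively root-equivalent on the host. Kolla-Ansible itself will still commonly use privilege escalation for deployment tasks.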

6. Verifies Docker

It runs:

docker --version
docker ps

Then prints the Docker version.

This confirms:

Docker binary is installed
Docker daemon is running
Current Ansible privilege context can query Docker

7. Reboots if required

The playbook checks:

/var/run/reboot-required

If present, it reboots the node.

This is important because the earlier full OS upgrade may have installed:

new kernel
system libraries
security updates
container runtime dependencies

You want those active before Kolla-Ansible deploys OpenStack containers.
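
The same check can be done by hand. The sketch below (function name hypothetical) tests for the flag file and, if Ubuntu has also written the companion .pkgs file, lists the packages that asked for the reboot.

```shell
# Return 0 when the Ubuntu reboot flag exists, and list the packages that
# requested the reboot. FLAG defaults to the path checked by the playbook.
reboot_required() {
    flag="${1:-/var/run/reboot-required}"
    [ -f "$flag" ] || return 1
    [ -f "$flag.pkgs" ] && sort -u "$flag.pkgs"
    return 0
}

# Example usage:
#   reboot_required && echo "reboot needed before running Kolla-Ansible"
```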

8. Performs final verification

At the end, the playbook collects:

hostnamectl --static
ip -br addr
chronyc tracking

and prints a final summary containing:

Host
Role
Hostname
Docker version
IP summary
Chrony tracking

That gives you a clear post-run confirmation for each VM.

Phase 7.4 — Build the Kolla-Ansible Deployment Node

Phase 7.4 is handled by build_deployment_vm.yml.

This playbook runs primarily on:

hosts: control

In your inventory, that means:

ctrl

The purpose is to turn ctrl into the Kolla-Ansible deployment host. In your topology, ctrl becomes:

OpenStack control node
network node
monitoring node
Kolla-Ansible deployment node

The playbook explicitly asserts that it is running only on ctrl, then installs the tools and files needed to run Kolla-Ansible.

1. Defines Kolla paths

It sets these important paths:

Kolla virtualenv:     /opt/kolla-venv
Kolla config:         /etc/kolla
Kolla work dir:       /opt/kolla-ansible
Kolla inventory dir:  /opt/kolla-ansible/inventory
Kolla SSH key:        /home/sont/.ssh/id_ed25519_kolla

This gives you a clean separation:

/opt/kolla-venv       Python/Kolla tooling
/etc/kolla            Kolla configuration
/opt/kolla-ansible    working directory and inventories
/home/sont/.ssh       deployment SSH identity

That is a sensible layout for a homelab because it avoids scattering Kolla files across the user’s home directory.

2. Installs deployment-node packages

It installs build and Python dependencies such as:

build-essential
gcc
libffi-dev
libssl-dev
python3
python3-dev
python3-pip
python3-venv
python3-setuptools
python3-wheel
git
rsync
sshpass

These are needed because Kolla-Ansible and the OpenStack client are Python packages. Some dependencies may need compilation or development headers.

3. Creates a Python virtual environment

It creates:

/opt/kolla-venv

using:

python3 -m venv /opt/kolla-venv

This isolates Kolla-Ansible and OpenStack client dependencies from the operating system Python packages.

That is the right approach. It prevents system package conflicts and makes the deployment tooling easier to rebuild.

4. Upgrades pip tooling inside the virtualenv

It upgrades:

pip
setuptools
wheel

inside /opt/kolla-venv.

That reduces Python packaging problems when installing Kolla-Ansible, OpenStack client libraries, and their dependencies.

5. Installs Ansible, Kolla-Ansible and OpenStack client

The current uploaded build_deployment_vm.yml installs:

ansible-core
ansible
kolla-ansible
python-openstackclient

inside /opt/kolla-venv.

That gives ctrl the core tooling needed for later phases:

kolla-ansible
openstack
ansible
ansible-playbook
ansible-galaxy

One important note: as seen from your earlier error, this section needs version pinning. Each Kolla-Ansible release supports only a specific range of ansible-core versions, and the uploaded file still installs ansible-core and ansible unpinned, so pip can pull in an Ansible version that the installed Kolla-Ansible release does not support.
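
One way to catch a mismatch early is to compare the installed ansible-core version against the range your Kolla-Ansible release documents. The sketch below is only a range check using GNU sort -V. The function name is hypothetical, and MIN and MAX are placeholders that you must take from the release notes of the Kolla-Ansible version you install; no specific range is implied here.

```shell
# Succeed when VERSION lies in the inclusive range MIN..MAX,
# using GNU sort -V for version ordering.
version_in_range() {
    lowest="$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)"
    highest="$(printf '%s\n' "$1" "$3" | sort -V | tail -n 1)"
    [ "$lowest" = "$2" ] && [ "$highest" = "$3" ]
}

# Example usage (MIN and MAX are placeholders, not real values):
#   installed="$(/opt/kolla-venv/bin/pip show ansible-core | awk '/^Version:/ {print $2}')"
#   version_in_range "$installed" "$MIN" "$MAX" || echo "ansible-core $installed is outside the supported range"
```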

6. Creates `/etc/kolla`

The playbook creates:

/etc/kolla

owned by the deployment user.

This directory will hold:

globals.yml
passwords.yml
future service config overrides

This becomes the main Kolla-Ansible configuration directory.

7. Creates Kolla working directories

It creates:

/opt/kolla-ansible
/opt/kolla-ansible/inventory
/opt/kolla-ansible/logs

This gives you a consistent place to store:

Kolla inventory files
deployment helper scripts
logs
homelab starter inventory

8. Finds Kolla example configuration files

Instead of assuming a fixed install path, the playbook searches for:

globals.yml

under:

/opt/kolla-venv/share
/usr/local/share
/usr/share

If it cannot find globals.yml, it fails with a debugging command.

This is an improvement over the earlier cp ... * approach because Python virtualenv installs can place Kolla example files in slightly different locations depending on package/version.
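
The search logic can be approximated in shell as follows. This is a sketch of the idea rather than the playbook's actual task, and the function name is hypothetical: it walks the candidate roots in order and prints the directory containing the first globals.yml it finds.

```shell
# Print the directory holding the first globals.yml found under the given
# search roots, checked in the order supplied. Fails if none is found.
find_kolla_examples() {
    for root in "$@"; do
        [ -d "$root" ] || continue
        hit="$(find "$root" -type f -name globals.yml 2>/dev/null | head -n 1)"
        if [ -n "$hit" ]; then
            dirname "$hit"
            return 0
        fi
    done
    echo "globals.yml not found under: $*" >&2
    return 1
}

# Example usage:
#   find_kolla_examples /opt/kolla-venv/share /usr/local/share /usr/share
```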

9. Copies `globals.yml` and `passwords.yml`

Once it finds the example config directory, it copies:

globals.yml
passwords.yml

into:

/etc/kolla

using remote_src: true.

It uses:

force: false

That means it does not overwrite existing files.

That is good because once you start editing /etc/kolla/globals.yml, rerunning the playbook will not destroy your changes.

10. Finds Kolla inventory templates

The playbook searches for the official Kolla:

multinode

inventory file.

Then it sets the source inventory directory and copies:

all-in-one
multinode

to:

/opt/kolla-ansible/inventory

These are the official Kolla inventory templates.

You will later edit multinode for Phase 7.5.

11. Creates a homelab starter Kolla inventory

The playbook creates:

/opt/kolla-ansible/inventory/multinode-homelab

with this simplified topology:

[control]
ctrl

[network]
ctrl

[compute]
cmp
gpu

[monitoring]
ctrl

[storage]
ctrl
cmp
gpu

[deployment]
localhost

This file is not yet the final production Kolla inventory. It is a starter reference for your homelab layout.

The key mapping is:

ctrl = control + network + monitoring + storage
cmp  = compute + storage
gpu  = compute + storage, later GPU passthrough

This matches your intended OpenStack topology.
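
To sanity-check the starter file, you can list the hosts in any one group. The sketch below is a simple awk parser with a hypothetical function name; it only handles plain [group] sections like the ones shown above, not :children or :vars sections. For the authoritative view, ansible-inventory -i <file> --graph shows how Ansible itself parses the inventory.

```shell
# List the hosts in one group of an INI-style inventory file.
# Only plain [group] sections are handled, as in the starter inventory.
hosts_in_group() {
    awk -v group="[$2]" '
        /^\[/ { in_group = ($1 == group); next }      # entering a new section
        in_group && NF && $1 !~ /^[#;]/ { print $1 }  # host lines in the wanted section
    ' "$1"
}

# Example usage:
#   hosts_in_group /opt/kolla-ansible/inventory/multinode-homelab compute
```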

12. Creates a Kolla environment helper file

It creates:

/opt/kolla-ansible/kolla-env.sh

containing:

export PATH="/opt/kolla-venv/bin:$PATH"
export KOLLA_CONFIG_PATH="/etc/kolla"
export KOLLA_INVENTORY="/opt/kolla-ansible/inventory/multinode"

alias ka="/opt/kolla-venv/bin/kolla-ansible"
alias osc="/opt/kolla-venv/bin/openstack"

This is useful because after logging into ctrl, you can run:

source /opt/kolla-ansible/kolla-env.sh

Then use:

ka
osc

instead of typing the full paths.

13. Installs Kolla-Ansible Galaxy dependencies

The playbook runs:

/opt/kolla-venv/bin/kolla-ansible install-deps

This installs the Ansible roles and collections required by Kolla-Ansible.

This is the task you have been troubleshooting.

Conceptually, it prepares the deployment node so that later commands can run:

kolla-ansible bootstrap-servers
kolla-ansible prechecks
kolla-ansible deploy

However, your current version has hit a dependency issue around the stable/2024.2 collection branch. So as written, this stage is intended to complete 7.4, but in your actual run it needs the version/requirements patch we discussed earlier.

14. Generates a dedicated Kolla SSH key

The playbook creates:

/home/sont/.ssh/id_ed25519_kolla

with comment:

kolla-deploy@ctrl

This gives ctrl its own deployment SSH identity.

That is better than reusing your personal SSH key because this key is specifically for:

ctrl -> ctrl
ctrl -> cmp
ctrl -> gpu

Kolla-Ansible needs this because it will SSH from the deployment node to all OpenStack nodes.

15. Reads and stores the public key

It reads:

/home/sont/.ssh/id_ed25519_kolla.pub

using slurp, decodes it, and stores it as:

kolla_deploy_public_key_text

That lets the later play distribute the key to every OpenStack node.

16. Verifies tool versions

The playbook checks:

/opt/kolla-venv/bin/kolla-ansible --version
/opt/kolla-venv/bin/openstack --version

Then prints a deployment node summary showing:

Deployment node
Kolla venv path
Kolla config path
Kolla work dir
Kolla inventory dir
Kolla-Ansible version
OpenStack client version

Phase 7.4 — Authorise SSH from `ctrl` to all OpenStack nodes

The second play in build_deployment_vm.yml runs against:

hosts: openstack

So it applies to:

ctrl
cmp
gpu

It installs the public key generated on ctrl into each node’s authorized_keys.

That means the deployment user on ctrl can SSH to all OpenStack nodes using:

/home/sont/.ssh/id_ed25519_kolla

This is necessary because Kolla-Ansible runs from the deployment node and performs remote actions across all nodes.
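
For reference, the effect on each node is roughly equivalent to the following idempotent append. This is a hand-written sketch with a hypothetical function name, not the Ansible task itself: the key is added only if that exact line is not already present, so running it twice leaves a single entry.

```shell
# Append the public key in PUBFILE to AUTHFILE unless that exact line
# is already present. Creates AUTHFILE with mode 600 if needed.
authorize_key() {
    key="$(cat "$1")" || return 1
    mkdir -p "$(dirname "$2")" && touch "$2" && chmod 600 "$2" || return 1
    grep -qxF -- "$key" "$2" || printf '%s\n' "$key" >> "$2"
}

# Example usage on a target node, assuming the .pub file has been copied there:
#   authorize_key /tmp/id_ed25519_kolla.pub /home/sont/.ssh/authorized_keys
```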

Phase 7.4 — Verify SSH from `ctrl`

The final play runs from:

hosts: control

So it runs on ctrl.

It executes:

ssh -i /home/sont/.ssh/id_ed25519_kolla \
  -o BatchMode=yes \
  -o StrictHostKeyChecking=accept-new \
  sont@192.168.1.51 hostname

ssh -i /home/sont/.ssh/id_ed25519_kolla \
  -o BatchMode=yes \
  -o StrictHostKeyChecking=accept-new \
  sont@192.168.1.52 hostname

ssh -i /home/sont/.ssh/id_ed25519_kolla \
  -o BatchMode=yes \
  -o StrictHostKeyChecking=accept-new \
  sont@192.168.1.53 hostname

Expected output:

192.168.1.51 returned hostname ctrl
192.168.1.52 returned hostname cmp
192.168.1.53 returned hostname gpu

This confirms that key-based SSH from the deployment node to all three nodes works without prompts, which Kolla-Ansible depends on.
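
If you want to repeat this check by hand, the three commands can be folded into a loop that also compares each result with the expected hostname. The function name below is hypothetical. The -n flag stops ssh from reading the loop's input.

```shell
# For each "ip expected-hostname" pair on stdin, run hostname over SSH with
# the given key and user, and compare the result. Returns non-zero on any mismatch.
verify_ssh_hostnames() {
    key="$1"; user="$2"; rc=0
    while read -r ip expected; do
        got="$(ssh -n -i "$key" -o BatchMode=yes \
                   -o StrictHostKeyChecking=accept-new \
                   "$user@$ip" hostname 2>/dev/null)"
        if [ "$got" = "$expected" ]; then
            echo "$ip returned hostname $got"
        else
            echo "$ip FAILED: expected $expected, got ${got:-nothing}"
            rc=1
        fi
    done
    return "$rc"
}

# Example usage on ctrl:
#   printf '192.168.1.51 ctrl\n192.168.1.52 cmp\n192.168.1.53 gpu\n' |
#       verify_ssh_hostnames /home/sont/.ssh/id_ed25519_kolla sont
```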

What is complete after these playbooks?

Once both playbooks complete successfully, you should have:

Phase 7.2 complete:
  ctrl/cmp/gpu hostnames set
  /etc/hosts configured
  OS updated
  base packages installed
  chrony running
  open-iscsi enabled
  swap disabled
  kernel modules loaded
  sysctl networking prepared

Phase 7.3 complete:
  Docker installed
  Docker daemon configured
  Docker enabled and running
  Docker log rotation configured
  Docker overlay2 storage configured
  Ansible user added to docker group
  Docker verified

Phase 7.4 complete:
  ctrl prepared as Kolla deployment node
  /opt/kolla-venv created
  kolla-ansible installed
  python-openstackclient installed
  /etc/kolla created
  globals.yml and passwords.yml copied
  official Kolla inventories copied
  homelab inventory starter created
  kolla-env.sh helper created
  Galaxy dependencies installed
  dedicated Kolla SSH key generated
  ctrl can SSH to ctrl/cmp/gpu

What these playbooks do not do yet

They do not yet perform Phase 7.5 onwards.

Specifically, they do not yet:

Configure the final Kolla multinode inventory
Configure /etc/kolla/globals.yml
Generate or customise OpenStack passwords
Configure network_interface/api_interface
Configure Neutron provider/tenant networking
Configure Ceph integration
Run kolla-ansible bootstrap-servers
Run kolla-ansible prechecks
Run kolla-ansible deploy
Run kolla-ansible post-deploy
Create OpenStack images, networks, routers, flavors, or VMs

So the playbooks prepare the foundation, but they do not deploy OpenStack yet.

Important correction before you rely on Phase 7.4

Your uploaded build_deployment_vm.yml still contains the unpinned package install:

ansible-core
ansible
kolla-ansible
python-openstackclient

and still runs:

kolla-ansible install-deps

without the compatibility patch for the stable/2024.2 collection issue.

So the intent of the playbook is correct, but for your environment it needs the fixes we discussed:

Pin compatible Ansible versions
Run install-deps with the venv PATH
Patch requirements.yml if it still references stable/2024.2

After that, Phase 7.4 should complete cleanly and you will be ready for:

Phase 7.5 – Configure Kolla multinode inventory
Phase 7.6 – Configure globals.yml
Phase 7.7 – Bootstrap servers
Phase 7.8 – Run prechecks
Phase 7.9 – Deploy OpenStack
