Goal: turn the unused storage in each Dell T5500 into a shared storage cluster that Proxmox, OpenStack, Kubernetes, and Slurm can later consume.
Target design:
pve01 ─ Ceph MON + MGR + OSD
pve02 ─ Ceph MON + MGR + OSD
pve03 ─ Ceph MON + MGR + OSD
Use Ceph for:
VM disks → Ceph RBD
OpenStack Cinder → Ceph RBD
Glance images → Ceph RBD
Kubernetes PVs → Ceph CSI later
Shared files → CephFS later
Object storage → RGW later
1. Pre-check every node
Run on all three Proxmox nodes:
hostname
ip -br addr
lsblk
pveversion
timedatectl
Check that:
pve01 / pve02 / pve03 can resolve each other
time is synced
cluster quorum is healthy
the intended Ceph disks are unused
Check cluster:
pvecm status
You want:
Quorate: Yes
Nodes: 3
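The two values above can also be checked mechanically. The sketch below uses a hypothetical helper, check_quorum (not a Proxmox command), that reads pvecm status output on standard input and succeeds only when the cluster is quorate with the expected node count. It assumes the "Nodes:" and "Quorate:" labels appear one per line, as in current Proxmox releases.

```shell
# check_quorum EXPECTED_NODES
# Reads `pvecm status` text on stdin; exit status 0 means quorate
# and the node count matches. Hypothetical helper for illustration.
check_quorum() {
    awk -v want="$1" '
        $1 == "Nodes:"   { nodes = $2 }
        $1 == "Quorate:" { quorate = $2 }
        END { exit !(quorate == "Yes" && nodes == want) }
    '
}

# Usage on a cluster node:
#   pvecm status | check_quorum 3 && echo "cluster quorum looks healthy"
```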
2. Decide the Ceph network
Because you only have one NIC per node, keep it simple first.
Use your existing Proxmox management network initially.
Example:
pve01 192.168.1.10
pve02 192.168.1.11
pve03 192.168.1.12
Later, if you add VLANs or a second NIC, you can separate:
public_network = client / VM / OpenStack access
cluster_network = OSD replication / recovery traffic
For now:
Ceph public network = 192.168.1.0/24
Ceph cluster network = same network
Not ideal for production, but fine for learning.
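Every MON and OSD needs an address inside the Ceph public network, so it is worth confirming that each node's IP falls inside the CIDR before initialising Ceph. A minimal sketch in plain shell arithmetic follows; ip_to_int and in_cidr are illustrative helper names, and the addresses are the example values from above.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    (
        # The subshell keeps the IFS change local to this function.
        IFS=.
        set -- $1
        echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    )
}

# in_cidr IP NETWORK/PREFIX: succeed if IP lies inside the network.
in_cidr() {
    net=${2%/*}
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

for ip in 192.168.1.10 192.168.1.11 192.168.1.12; do
    if in_cidr "$ip" 192.168.1.0/24; then
        echo "$ip is inside 192.168.1.0/24"
    else
        echo "$ip is OUTSIDE 192.168.1.0/24"
    fi
done
```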
3. Install Ceph from Proxmox UI or CLI
Option A — Proxmox UI
On each node:
Datacenter
→ Node
→ Ceph
→ Install Ceph
Use the same version on all nodes.
Then initialise Ceph on the first node.
Option B — CLI
On each node:
pveceph install
Then initialise Ceph on pve01:
pveceph init --network 192.168.1.0/24
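Then create the first MON on pve01:
pveceph mon create
Run the same command on pve02 and pve03 so that each node hosts a MON.
Then create a manager on each node, starting with pve01:
pveceph mgr create
Only one MGR is active at a time; the other two run as standbys.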
Verify:
ceph -s
ceph mon stat
ceph mgr stat
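Once pve01 has its own MON and MGR, the same commands can be driven on the remaining nodes from one shell instead of logging in to each. This is a sketch only: run_on_nodes and the RUNNER variable are hypothetical names, and it assumes root SSH between cluster nodes works (Proxmox normally sets this up when nodes join the cluster).

```shell
# Command used to reach a node. Set RUNNER=echo for a dry run that
# only prints what would be executed.
RUNNER=${RUNNER:-ssh}

# run_on_nodes "NODE1 NODE2 ..." COMMAND [ARGS...]
# Runs the command on each node in turn and stops at the first failure.
run_on_nodes() {
    nodes=$1
    shift
    for node in $nodes; do
        echo "== $node: $*" >&2
        $RUNNER "root@$node" "$@" || return 1
    done
}

# Usage from pve01:
#   run_on_nodes "pve02 pve03" pveceph mon create
#   run_on_nodes "pve02 pve03" pveceph mgr create
```

Doing a dry run with RUNNER=echo first is a cheap way to confirm the node list and command before anything touches the cluster.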
4. Prepare disks for OSDs
List disks:
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
Example:
sda 150G Proxmox OS
sdb 2.5T empty disk for Ceph OSD
Important: do not use the Proxmox OS disk as an OSD unless you deliberately partitioned it for that.
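A small guard helps avoid wiping the wrong device. The sketch below uses a hypothetical helper, disk_looks_unused, that reads lsblk key="value" output and succeeds only when no partition on the disk reports a filesystem signature or a mountpoint. A previously used data disk will also be flagged, which is the cue to double-check the device name before wiping it.

```shell
# disk_looks_unused
# stdin: output of `lsblk -n -P -o NAME,FSTYPE,MOUNTPOINT <disk>`
# Exit status 0 only if every FSTYPE and MOUNTPOINT value is empty.
disk_looks_unused() {
    ! grep -Eq 'FSTYPE="[^"]+"|MOUNTPOINT="[^"]+"'
}

# Usage:
#   lsblk -n -P -o NAME,FSTYPE,MOUNTPOINT /dev/sdb | disk_looks_unused \
#       && echo "/dev/sdb has no filesystem or mountpoint"
```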
Wipe the intended Ceph disk:
wipefs -a /dev/sdb
sgdisk --zap-all /dev/sdb
Then create one OSD per node:
pveceph osd create /dev/sdb
Repeat on pve01, pve02, and pve03.
Verify:
ceph osd tree
ceph -s
You want to see:
3 osds: 3 up, 3 in
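The same condition can be tested in a script. The sketch below is a hypothetical helper, osds_all_up_in, that parses the one-line summary printed by ceph osd stat. It assumes the summary keeps the "N osds: N up ..., N in ..." shape; newer releases add "(since ...)" annotations, which the parser skips over.

```shell
# osds_all_up_in EXPECTED_COUNT
# stdin: text from `ceph osd stat`
# Exit status 0 only if total, up and in all equal EXPECTED_COUNT.
osds_all_up_in() {
    awk -v want="$1" '
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^osds:$/)    total    = $(i - 1)
                if ($i ~ /^up,?$/)     up_count = $(i - 1)
                if ($i ~ /^in[,;]?$/)  in_count = $(i - 1)
            }
        }
        END { exit !(total == want && up_count == want && in_count == want) }
    '
}

# Usage:
#   ceph osd stat | osds_all_up_in 3 && echo "all OSDs up and in"
```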
5. Create a Ceph pool for VM disks
Create an RBD pool:
ceph osd pool create vm-rbd 32
ceph osd pool application enable vm-rbd rbd
rbd pool init vm-rbd
For a small 3-node lab, 32 PGs is fine to start.
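To see why 32 is a comfortable starting point, a common rule of thumb from older Ceph sizing guidance is: total PGs across all pools ≈ (OSD count × 100) / replica size, rounded up to a power of two. The sketch below works that out; suggest_pgs is a hypothetical helper, and a replica size of 3 (the Ceph default) is assumed. The PG autoscaler may adjust the value later in any case.

```shell
# suggest_pgs OSD_COUNT REPLICA_SIZE
# Prints the rule-of-thumb PG budget for the whole cluster,
# rounded up to the next power of two.
suggest_pgs() {
    target=$(( $1 * 100 / $2 ))
    pgs=1
    while [ "$pgs" -lt "$target" ]; do
        pgs=$(( pgs * 2 ))
    done
    echo "$pgs"
}

# 3 OSDs, 3 replicas: 3 * 100 / 3 = 100, rounded up to 128.
suggest_pgs 3 3
```

With a cluster-wide budget of about 128, a 32-PG pool for VM disks leaves room for the CephFS data and metadata pools added later.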
Add it to Proxmox storage:
pvesm add rbd ceph-vm-rbd \
--pool vm-rbd \
--content images,rootdir \
--krbd 0
Check:
pvesm status
You should see ceph-vm-rbd.
6. Test VM storage on Ceph
Create or clone a small VM and place its disk on:
ceph-vm-rbd
Then test live migration:
qm migrate <vmid> pve02 --online
If the VM disk is on Ceph, migration should not require copying the disk.
That is one of the main reasons Ceph is valuable.
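Before migrating, it helps to confirm that every disk of the VM really lives on the Ceph storage; a single disk on local storage forces a copy. The sketch below is a hypothetical helper, vm_disks_on_storage, that reads qm config output and checks the storage prefix of each scsi, virtio, sata, and ide disk line, skipping CD-ROM drives. Other disk types such as EFI disks are not covered by this sketch.

```shell
# vm_disks_on_storage STORAGE_ID
# stdin: output of `qm config <vmid>`
# Exit status 0 only if every disk line uses STORAGE_ID.
vm_disks_on_storage() {
    awk -v want="$1" '
        /^(scsi|virtio|sata|ide)[0-9]+:/ && !/media=cdrom/ {
            # $2 looks like "ceph-vm-rbd:vm-100-disk-0,size=32G"
            split($2, parts, ":")
            if (parts[1] != want) {
                bad = 1
                print "not on " want ": " $0
            }
        }
        END { exit bad + 0 }
    '
}

# Usage:
#   qm config <vmid> | vm_disks_on_storage ceph-vm-rbd \
#       && echo "all disks on Ceph, migration will not copy them"
```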
7. Create CephFS later, not immediately
Do RBD first.
After RBD works, add CephFS.
Create metadata servers:
pveceph mds create
Create CephFS:
pveceph fs create --name cephfs
Add to Proxmox storage:
pvesm add cephfs cephfs \
--content iso,vztmpl,backup,snippets
Use CephFS for:
ISO images
container templates
backups
shared files
snippets
Use RBD for:
VM disks
OpenStack Cinder volumes
Glance images
8. Learn the important Ceph commands
Run these repeatedly until you understand them:
ceph -s
ceph health detail
ceph osd tree
ceph osd df
ceph df
ceph mon stat
ceph mgr stat
ceph pg stat
ceph pg dump_stuck
For OSD detail:
ceph osd metadata
ceph osd perf
For pools:
ceph osd pool ls detail
rbd ls vm-rbd
rbd du -p vm-rbd
9. Break it deliberately
This is where the real learning happens.
Safely test, on the node that hosts osd.0 (ceph osd tree shows which host that is):
systemctl stop ceph-osd@0
ceph -s
systemctl start ceph-osd@0
Observe:
HEALTH_WARN
OSD down
PG degraded
recovery starts
HEALTH_OK returns
Then test node-level failure:
Shut down pve03
Watch ceph -s
Restart pve03
Watch recovery
Do not panic when Ceph shows warnings. Learn what they mean.
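After either failure test, a small polling loop saves retyping ceph -s while the cluster recovers. This is a sketch, not a Ceph tool: wait_for_health_ok is a hypothetical helper that relies only on ceph health printing a line beginning with HEALTH_OK, HEALTH_WARN, or HEALTH_ERR.

```shell
# wait_for_health_ok [TRIES] [DELAY_SECONDS]
# Polls `ceph health` until it reports HEALTH_OK or the tries run out.
wait_for_health_ok() {
    tries=${1:-60}
    delay=${2:-5}
    while [ "$tries" -gt 0 ]; do
        health=$(ceph health 2>/dev/null)
        case "$health" in
            HEALTH_OK*)
                echo "HEALTH_OK"
                return 0
                ;;
        esac
        echo "waiting: ${health:-no response from ceph}"
        tries=$(( tries - 1 ))
        sleep "$delay"
    done
    return 1
}

# Usage after restarting the OSD or the node:
#   wait_for_health_ok 120 5 || ceph health detail
```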
10. What healthy looks like
A healthy 3-node lab should show roughly:
  cluster:
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum pve01,pve02,pve03
    mgr: pve01(active), standbys: pve02,pve03
    osd: 3 osds: 3 up, 3 in
  data:
    pools: 1 pools
    pgs: active+clean
11. Key concepts to understand
Focus on these:
MON = cluster map and quorum
MGR = metrics, dashboard, management modules
OSD = stores the actual data
Pool = logical storage namespace
PG = placement group
CRUSH = decides where data lives
RBD = block devices for VMs
CephFS = shared filesystem
RGW = S3-compatible object storage
The biggest one is CRUSH.
CRUSH determines where replicas are placed. In your 3-node lab, you want data spread across different hosts, not multiple copies on the same host.
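One way to confirm this is to look at the failure domain of the CRUSH rule your pool uses. The sketch below is a hypothetical helper, rule_failure_domain, that pulls the bucket type out of the JSON printed by ceph osd crush rule dump. It assumes the pretty-printed layout with one key per line, and that the pool uses the default rule named replicated_rule; on a default install the expected answer is host.

```shell
# rule_failure_domain
# stdin: JSON from `ceph osd crush rule dump <rule-name>`
# Prints the bucket type of the first step that names one (for example "host").
# The numeric rule-level "type" field is skipped because it is not quoted.
rule_failure_domain() {
    sed -n 's/.*"type": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   ceph osd crush rule dump replicated_rule | rule_failure_domain
```

If this prints osd instead of host, two replicas of the same object could land on one node, which defeats the purpose of a three-node layout.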
Recommended order
Do not try to configure everything at once.
Use this sequence:
1. Install Ceph packages
2. Create MONs
3. Create MGRs
4. Add one OSD per node
5. Confirm HEALTH_OK
6. Create RBD pool
7. Add RBD to Proxmox
8. Put VM disk on Ceph
9. Test live migration
10. Break and recover one OSD
11. Add CephFS
12. Later integrate with OpenStack
For your homelab, the first real success milestone is:
A VM running on pve01 with its disk on Ceph RBD,
live migrated to pve02 without copying the disk.
That proves the Proxmox + Ceph foundation is working.