Maintenance - Service disruption
Resolved
Feb 22, 2026 at 06:01am UTC
This is a major overhaul of the R730xd server hardware and a complete platform migration for all services:
Hardware Upgrades:
Installing an 8x 480GB SSD RAID 10 array on an additional physical RAID card for VM storage
Adding a second GPU (enables GPU workload high availability)
Installing 10GbE SFP+ networking (10x faster uplink)
Fixing CPU thermal issues and automatic fan control
Migrating all equipment into an enclosed rack (physical relocation)
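For a rough sense of what the storage and network figures above mean in practice, here is a minimal sketch of the arithmetic. It assumes the standard RAID 10 layout (striping across mirrored pairs, so half the raw capacity is usable) and, based on the "10x faster" figure, a previous 1 Gb/s uplink. The function names and the 100 GB example size are illustrative only, and real-world numbers will be lower because of formatting and protocol overhead.

```python
def raid10_usable_gb(drive_count: int, drive_size_gb: int) -> int:
    """Usable capacity of a RAID 10 array in GB.

    RAID 10 stripes across mirrored pairs, so half the raw capacity is usable.
    """
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drive_count * drive_size_gb // 2


def ideal_transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Best-case time to move size_gb (decimal gigabytes) over a link,
    ignoring protocol overhead and disk speed."""
    return size_gb * 8 / link_gbps


print(raid10_usable_gb(8, 480))          # 1920
print(ideal_transfer_seconds(100, 1))    # 800.0 (assumed old 1 Gb/s uplink)
print(ideal_transfer_seconds(100, 10))   # 80.0  (new 10GbE uplink)
```

So the eight 480GB SSDs give roughly 1.9TB of usable VM storage, and a 100 GB transfer drops from about 13 minutes to under a minute and a half at line rate.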
Platform Migration:
Moving all services from Docker VMs to a 3-node Kubernetes (k3s) cluster. This part may take months, not weeks, to complete.
Deploying Longhorn distributed block storage with high availability
GPU nodes will use dedicated NVMe storage for faster caching
Consolidating from 5 separate Docker host VMs to a unified k3s platform
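Why three nodes matters for availability can be sketched with simple quorum arithmetic. This assumes all three nodes run as k3s server nodes with embedded etcd, which this notice does not state. The helper names are hypothetical.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3):
    print(n, quorum(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
```

Under that assumption, a 3-node control plane keeps working with one node down, whereas a 2-node setup would tolerate no failures at all.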
Why the disruptions:
Initial downtime: Hardware installation (CPUs, RAID card, GPUs, thermal paste, cleaning)
Rolling disruptions: Migrating VMs to new storage (zero-downtime Storage vMotion, but services may be slower)
Service redeployment: Moving each service from Docker Compose to Kubernetes (brief outage per service)
Physical relocation: Moving server and network gear into the new rack
End result: Faster performance, better reliability, GPU failover for transcoding, and a much cleaner infrastructure setup.
Affected services