Managing a large number of nodes

Imagine the following setup. As has surely been mentioned somewhere on this blog, we run a cloud based on OpenStack. The team I work in is responsible for provisioning new nodes for the cluster and for managing the lifecycle of the existing ones. We also deploy OpenStack and related services to our fleet. One huge task is running OpenStack in a highly available (HA) configuration. That may not sound too difficult, but it also means we have to make every component OpenStack depends on HA as well. So we use Galera as the database, Quobyte as distributed storage, clustered RabbitMQ, clustered Cassandra, Zookeeper, and a few things I may have forgotten.

I will write about some aspects of our setup, how we deploy our nodes, and how we keep them up to date.