Nobody builds a system intending to make it unoperable. It happens gradually. A cluster grows. Another service gets added. Multi-tenancy gets introduced to cut costs.
Each decision is reasonable in isolation. Then one night something breaks, and the engineer on call is staring at dashboards they can’t interpret, running commands th...