I assume you have already read the AWS report[1] about the recent troubles. I think it makes a very good argument you could use at work against design complexity, and in favor of designing systems at a level of complexity where analyzing failure modes and preventing them is actually possible.

[1] https://aws.amazon.com/message/680342/