Salesforce outage: Global DNS downfall started by an engineer trying a quick fix

  • > "We're not blaming one employee," said Chief Availability Officer Darryn Dieken

    > "For whatever reason that we don't understand, the employee decided to do a global deployment," Dieken went on. The usual staggered approach was therefore bypassed.

    > And the engineer who sidestepped Salesforce's carefully crafted policies and took down the platform? "We have taken action with that particular employee," said Dieken.

    Holy contradiction, Batman!

  • This reflects pretty badly on Salesforce's engineering culture.

    1. They're still using manual processes where automation should be used.

    2. They're using insufficiently robust scripts (forgivable to a degree; bugs happen).

    3. They blame the individual rather than the process that allowed the individual to make this mistake.

    4. They host their status page on the same infrastructure that it reports on.
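
To make point 1 concrete, here is a minimal sketch of what an automated staggered rollout could look like. None of this is Salesforce's actual tooling: `staged_rollout`, `apply_change`, `is_healthy`, and `RolloutAborted` are hypothetical names, and the health check is assumed to be something the caller supplies. The idea is only that the tool itself refuses a one-shot global deployment and stops at the first unhealthy stage, so later stages are never touched.

```python
from typing import Callable, List, Sequence


class RolloutAborted(Exception):
    """Raised when a stage fails its health check (hypothetical name)."""


def staged_rollout(
    stages: Sequence[Sequence[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[List[str]]:
    """Apply a change stage by stage, halting at the first unhealthy stage.

    `stages` is an ordered list of target groups, e.g. a canary first and
    the remaining data centers afterwards. `apply_change` and `is_healthy`
    are caller-supplied and are assumptions of this sketch.
    """
    # Refuse a single all-at-once stage: that is a global deployment.
    if len(stages) < 2:
        raise ValueError("refusing to deploy everything in a single stage")

    completed: List[List[str]] = []
    for stage in stages:
        for target in stage:
            apply_change(target)

        # Check the whole stage before moving on to the next one.
        unhealthy = [t for t in stage if not is_healthy(t)]
        if unhealthy:
            raise RolloutAborted(f"unhealthy after change: {unhealthy}")

        completed.append(list(stage))
    return completed
```

With stages like `[["canary"], ["dc1", "dc2"], ["dc3"]]`, a change that breaks `dc2` raises `RolloutAborted` before `dc3` is ever touched. A real system would also need rollback and a bake time between stages, which this sketch leaves out.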

  • 10:1 odds that some manager or product/salesperson told the engineer to rush it.

  • Sure, blame it on one person. Don’t blame the company whose infra is so messed up that one person can accidentally bring it all down.