On this episode we talk with Evan Jones (www.evanjones.ca or Twitter: @epcjones), who works at Datadog. He previously worked at Bluecore, Twitter, Mitro (his own startup), Infix, and Google, and received a PhD from MIT. We discuss server reliability, learning from a cloud provider outage, the difficulties of scaling systems, how to address failures caused by overloading your systems, and much more!