Hey Everyone,
In this episode I talk to Mayank Shrivastava, who has vast experience in building and maintaining high-scale distributed systems. He was on the team that originally built Apache Pinot at LinkedIn and now works at StarTree as the Head of Core Data Engineering.
He has shared some amazing insights from his experience and there is a lot to learn from our discussion.
We discuss the following:
00:00 Introduction
04:20 Practices to follow while designing and developing Distributed Systems
05:47 What do we mean by Solid Scalable Design? How do we approach that?
09:00 Safety Nets for developing Distributed systems
10:21 When is the right time to do performance benchmarking?
17:00 What is release certification?
21:00 Deploying to Production
24:45 Example when Canary Deployment might not be a good strategy?
26:00 Example when Canary Deployment is a good strategy?
27:30 Post Deployment - how do we observe our system?
33:30 How do we avoid on-call (alerting) noise?
42:00 Maintaining a Large scale Distributed system
47:15 Scaling up/down for stateful systems
51:30 Handling Failures in Production (Disaster Recovery)
01:00:30 Runbooks - How do we keep them updated?
References:
The GeekNarrator LinkedIn page: https://www.linkedin.com/company/86276626
Kaivalya Apte: https://www.linkedin.com/in/kaivalya-apte-2217221a/
Geeknarrator website: www.geeknarrator.com
Mayank Shrivastava: https://www.linkedin.com/in/mayankshriv/
StarTree: https://www.startree.ai/
Apache Pinot: https://pinot.apache.org/
Hope you enjoy the discussion and learn from it. Please hit the like button if you liked my discussion with Mayank and please subscribe to the channel for more content like this.
Cheers, The GeekNarrator