Operational resiliency, as Dormain Drewitz of PagerDuty explains it, is the ability to bounce back from setbacks, both technically and organizationally. True resiliency means staying willing to take risks even after facing challenges. In a conversation with Heather Joslyn on the New Stack Makers podcast, Drewitz discussed the role of AI and automation in achieving operational resiliency, especially when teams are under pressure to be more productive.
Automation, including generative AI code completion tools, is increasingly used to boost developer productivity. However, this can shift bottlenecks from developers to operations, creating new challenges. Drewitz emphasized the importance of considering the entire value chain and identifying where AI and automation can help. For instance, automating repetitive incident response tasks, such as checking APIs, closing ports, or running database checks, can significantly reduce interruptions and productivity losses.
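To make the idea of automating repetitive incident checks concrete, here is a minimal Python sketch. It is not PagerDuty's implementation or API; the names `port_is_open` and `run_diagnostics` are hypothetical, and the only assumption is that each check can be written as a function returning true when the thing it inspects is healthy.

```python
import socket
from typing import Callable, Dict, List, Tuple

# A check is a (name, callable) pair; the callable returns True when healthy.
# These names are illustrative assumptions, not part of any vendor API.
Check = Tuple[str, Callable[[], bool]]


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_diagnostics(checks: List[Check]) -> Dict[str, str]:
    """Run every check and record 'ok', 'failed', or 'error: ...' per name."""
    report: Dict[str, str] = {}
    for name, check in checks:
        try:
            report[name] = "ok" if check() else "failed"
        except Exception as exc:  # one broken check should not abort the rest
            report[name] = f"error: {exc}"
    return report
```

A responder could pass in a list such as `[("db port", lambda: port_is_open("db.internal.example", 5432))]`, where the host name is a placeholder, and attach the resulting report to the incident instead of running each check by hand.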
PagerDuty's AI-powered platform uses generative AI to automate tasks and create runbooks for incident handling, allowing engineers to focus on resolving root causes and restoring services. It can also draft status updates and incident postmortem reports, streamlining incident response and saving time. An operations platform that generates draft reports at the push of a button lets engineers review and edit a draft instead of starting from scratch.