Building Resilient IT Teams and Solutions

Episode 12 | Published January 2, 2025 | 13-minute watch

Summary

Resilient IT starts with resilient people, Jeremy Maldonado says, and the creativity to recover from a setback matters more than knowing everything. He draws this lesson from a decade in IT and from earlier years waiting tables and managing restaurants, work he found more stressful than anything technical. From there he lays out a four-step approach to IT resilience: identify risk through team communication, research services and practices, test and monitor everything, and keep a documented disaster recovery plan. He names concrete tooling along the way, including Pacemaker for Linux high-availability clusters and Windows Server Failover Clustering on the Windows side. He also describes recoveries he handled himself, one from a broken DNS configuration file and one from a corrupted InnoDB database.