Summary
Resilient IT starts with resilient people, Jeremy Maldonado says, and the creativity to recover from a setback matters more than knowing everything. He draws the point from a decade in IT and earlier years waiting tables and managing restaurants, work he found more stressful than anything technical. From there he lays out a four-step approach to IT resilience: identify risk through team communication, research services and practices, test and monitor everything, and keep a documented disaster recovery plan. He names concrete tooling along the way, including Pacemaker for Linux high-availability clusters and Windows Server Failover Clustering for Windows. He also shares firsthand recoveries from a broken DNS configuration file and a corrupted InnoDB database.