
About the Guru

Hi, I'm Vaishak — Director of Infrastructure Automation & SRE at one of Canada's largest insurers. I've spent nearly two decades designing, breaking, and rebuilding enterprise infrastructure — from storage systems and CRM platforms to cloud-native SRE practices and full-stack observability.

OpenTelemetry is where my work and obsession converge. I've implemented OTel-based observability at enterprise scale, built SRE frameworks around SLIs and SLOs, and led teams that reduced incident resolution time by 60% through better telemetry. This blog is the knowledge I wish I'd had when I started that journey.

I'm also a CNCF contributor, community builder, and sponsor of an enterprise SRE & Automation Community of Practice. The goal is to close the gap between conference talks and production reality.

Why "Guru"?

In the original Sanskrit sense, a guru is not a master on a pedestal, but a remover of darkness (gu = darkness, ru = remover). Someone who shines a light so others don't stumble.

That is the mission here. OpenTelemetry is complex. The "darkness" of distributed systems — where requests vanish into microservice black holes — is real. My goal is to share what I've learned through years of trial, error, and scaling.

From Lab to Production

Most tutorials stop at docker-compose up. This blog is about what comes after.

If you're an SRE, DevOps engineer, or just someone tired of debugging in the dark — welcome. The content here is built on real production experience, not toy demos.

Resume — Vaishak Nair

Experience

Canada Life Assurance Company — London, ON
Director, Infrastructure Automation & SRE (Jan 2026 – Present)
  • Lead strategic direction and roadmap for automation platforms, cloud-native observability, and telemetry modernization across the enterprise.
  • Drive AIOps adoption including noise reduction, anomaly detection, and automated remediation; oversee SRE framework, SLIs/SLOs, and incident management practices.
  • Build and mentor teams across SRE, Automation Engineering, DevOps, and Operations Support; sponsor the enterprise SRE & Automation Community of Practice.
Site Reliability Engineer / Sr Technical Lead (Jul 2021 – Dec 2025)
  • Implemented observability solutions using Splunk Observability, AppDynamics & Dynatrace, achieving a 60% improvement in incident resolution by reducing MTTD/MTTR.
  • Grew the SRE practice from 4 to 10 engineers; developed an observability maturity model adopted org-wide.
  • Defined SLI/SLO/SLA frameworks and engineered toil reduction through CI/CD improvements and automated runbook execution.
Sr Technical Specialist – Enterprise Storage, Engineering & Operations (Jul 2013 – Jun 2021)
  • Automated storage provisioning using Ansible with Dell EMC PowerMax, Isilon & Brocade modules, eliminating manual toil and reducing provisioning lead time.
  • Designed snapshot-based recovery (EMC SnapVx), reducing disk costs by 50%+ and RTO to under 30 minutes.
IBM India Pvt Ltd — Bangalore, India
Technical Lead, Application Management Services (Jan 2011 – Apr 2013)

Led Level-3 support for property & asset management applications (Jones Lang LaSalle EMEA). Planned and executed an Oracle 9i→10g upgrade with zero business downtime; received the IBM Eminence & Excellence Award in 2012.

Team Lead – Siebel CRM Support (Jul 2007 – Dec 2010)

Certifications

  • Splunk Core Certified Power User
  • AWS Certified Solutions Architect – Associate
  • AWS Certified SysOps Administrator – Associate
  • ITIL V3 Foundation

Education

B.Tech, Information Technology — Anna University, Chennai
2003 – 2007
PG Diploma, Business Administration & IT — Symbiosis, Pune
2009 – 2011

Technologies

OpenTelemetry, Splunk Observability, Splunk, Dynatrace, AppDynamics, SignalFx, AWS, Kubernetes, Red Hat Ansible, Terraform, Dell EMC PowerMax, Dell EMC Isilon, Brocade