Senior DevOps, Platform & Site Reliability Engineer. Five years building cloud-native systems on AWS, Azure, and GCP — multi-region Kubernetes, GitOps delivery, and observability that actually answers questions.
Real numbers from real production work — not vanity metrics.
Senior Site Reliability Engineer on a multi-region production platform for an AI SaaS. Three AWS regions, EKS with Karpenter, distributed tracing through New Relic, SOC 2 & HIPAA enablement.
Scaled the platform from two regions to three — VPC and CIDR planning, EKS provisioning, RDS replication, and DNS delegation for global SaaS workloads.
EKS 1.29 → 1.34, one minor version at a time, with zero customer-visible downtime: Karpenter-driven rolling node replacements, PDB tuning per service, and addon validation in CI.
20–30% AWS cost reduction across regions via Compute Savings Plans, VPC endpoint routing, log-retention hygiene, and right-sizing, backed by a FinOps tagging policy so the savings don't quietly leak back.
Three AWS regions like this one. EKS in private subnets, IPsec site-to-site tunnels back to enterprise customers' datacenters, multi-AZ HA on everything stateful.
GitOps from commit to production. ArgoCD reconciles, Karpenter scales, PDBs hold the line. This is what a healthy delivery surface looks like in real time.
Each tied to actual production outcomes, not slideware.
Karpenter-driven rolling node replacements. PDBs tuned per service. Addon validation gated in CI. The boring kind of upgrade — which is the point.
Compute Savings Plans, VPC endpoint routing, log-retention sanity, right-sizing. Wrote the FinOps tagging policy so the savings wouldn't quietly leak back.
Pulumi + ArgoCD system where promoting a service to prod is a YAML change. Atomic traffic switching, instant rollback, no human holding their breath.
Independent verification of the cloud, container, and IaC stack I work with daily.
Open to senior platform / SRE / DevOps roles — remote, hybrid, or relocation for the right team. Have a platform problem worth solving? I'd like to hear about it.