Case study

Global Observability Platform | UK | Europe

Azure | Multi-region | Site Reliability Engineering | FedRAMP & SOC2 compliance

Azure Cloud Azure Cloud Adoption Framework Terraform GitOps using GitHub and GitHub Actions GitHub Actions Checkov IaC code analysis Flux ArgoCD Helm charts Kustomize Kafka Zookeeper ClickHouse Cortex IDP ELK / ELF stack eBPF-based observability tooling

Working at one of the Global Observability platforms (based on eBPF), I was part of an SRE team spread across Europe, APAC and USA in a follow the sun pattern to architect, build and deploy and maintain plethora of internal and external facing services for the customer. Some of the main challenges that the customer faced were:

  • Missing Enterprise Landing Zone Architecture design for Azure tenant.
  • Complex terraform estate across corporate and product lacking reusable modules.
  • Azure Kubernetes Services cluster deployment, maintenance and control plane upgrades.
  • Microservices deployments, observability and monitoring.
  • Missing FedRAMP for Azure Gov. Cloud & SOC2 implementation for product market fit.
  • Mentoring offshore and nearshore engineers to be part of the SRE team.
  • Missing guardrails allowing developers to build services outside guardrails.

Outcomes

  • Enterprise-Ready Azure Foundation: Implemented Azure Enterprise Landing Zone design, enabling clear segregation between corporate and product workloads with tailored guardrails for each business unit.

  • Standardised IaC & Reusable Modules: Published a library of Terraform modules—including a production-ready AKS module—adopted by multiple product teams to accelerate cluster provisioning and reduce configuration drift.

  • Zero-Downtime AKS Operations: Managed in-place control plane upgrades and observability deployments across AKS clusters with zero production-impacting incidents, ensuring platform reliability for customer-facing services.

  • Compliance Enablement (SOC2/FedRAMP): Partnered with auditors and security teams to implement policy-as-code guardrails aligned with SOC2 requirements and FedRAMP readiness, accelerating the platform’s path to regulated market segments.

  • Team Maturity & Knowledge Transfer: Mentored offshore and nearshore engineers into the SRE rotation, around the sun on-call model, standardised blameless post-mortems, established RCA practices that improved incident response and platform resilience.