ESMOS - Cloud Migration from SaaS to IaaS

Cloud
DevOps
Security
Azure
Kubernetes
PostgreSQL
Prometheus
Grafana
Odoo
Cloud Migration
Solution Architecture
DevSecOps
ESMOS - Cloud Migration from SaaS to IaaS

Project Grade Received: "A" (Overall), "A+" (Presentation)

Overview

Everyday Sustainable Meals Ordering System (ESMOS) is a subscription-based business committed to delivering fresh, gourmet-quality meals crafted from sustainable source ingredients. It is also a play on the module name ESM (Enterprise Solutions Management) and is a fictional company.

My team migrated this meal ordering web application from the Odoo SaaS platform to an Azure IaaS setup, following ITIL practices and implementing DevSecOps to improve scalability, security, and reduce costs.

  • Original Setup: Odoo SaaS
  • New Setup: Azure AKS with Odoo Community Image, Azure PostgreSQL, Azure Functions, Prometheus + Grafana
  • Main Use Case: Scaling up the system for 10,000+ users
  • Project Goals:
    • Improve scalability with Azure Kubernetes Service auto-scaling (HPA)
    • Enhance security with PDPA compliance and protection against cyber attacks
    • Implement data encryption at rest and in transit with Azure PostgreSQL
    • Reduce infrastructure costs
    • Support 10000+ users daily with up to 50 concurrent sessions

Project Phases

1. Change Proposal

We began by drafting a Change Proposal. We also had to take into account a previous incident whereby an attacker broke into a separate IaaS system of ours and modified data and took down our system. The change outline is as follows:

  • Key justifications: scalability, security, and resilience
  • Risk analysis: security vulnerabilities, data breaches
  • Cost comparison: S$0.20/user (SaaS) → S$0.12/user (Azure IaaS)
  • Deployment strategy using Azure CLI, load testing tools & integration tests

2. Mid-Process Change Management Check-In

Presented our updated progress and stakeholder communications, including:

  • Jira ticket lifecycle for tracking
Jira tickets overview
  • Communication matrix (internal & external)
  • Migration schedule & timeline
Change migration timeline

3. Final Presentation

Final results and lessons learned were presented, highlighting:

  • Infrastructure setup (AKS, PostgreSQL, Azure Function for GenAI surveys)
  • YAML-based deployment pipeline
  • DevSecOps measures: TLS, RBAC, NSGs, monitoring
  • Cost reduction of over 40%
  • Load testing success (50 concurrent users, 0% error rate)

Tech Stack

New solution architecture
ComponentTool/Service Used
Container OrchestrationAzure Kubernetes Service (AKS)
DatabaseAzure PostgreSQL Flexible Server
MonitoringPrometheus + Grafana, BetterStack, Azure Monitoring (Basic)
GenAI FeatureAzure Functions + Azure OpenAI
Infrastructure SetupK8s YAML Config Files, Azure CLI
Load TestingLocust

Security & Compliance

  • HTTPS Encryption: Cert Manager + Let’s Encrypt with nip.io
  • Access Control: RBAC, Strict NSG rules
  • Observability: Real-time alerts, metrics via Better Stack + Grafana
  • Compliance Ready: Aligns with Singapore’s PDPA requirements

Behind the Scenes – Technical Narrative

“Moving from SaaS to IaaS wasn’t plug-and-play…”

We took extra steps to ensure that we were prioritising efficiency, as well as a realistic workflow. Some highlights include:

  • Version-controlled experiments: Every manifest change pushed to Git, enabling quick bisects when deployments broke.
  • Kubernetes from scratch: Opted for Azure CLI + YAML over Helm/IaC frameworks for rapid iteration and learning AKS internals, and less focus on the additional tool of Helm, especially as we were uncertain of costs outcomes if done poorly (we had a budget of $100 Azure credits).
  • DNS & WAF headache: The free nip.io wildcard DNS conflicted with all the WAF solutions we tried as it had issues with the NGINX setup. Reversing our progress could have allowed us to experiment and find a working WAF solution but we stuck to the setup we had and could not get it working. Looking back, if we had delegated and called for more manpower, we could have experimented further. We still made sure it was sufficiently secured even without the WAF.
  • Session-affinity surprise: HPA (Horizontal Pod Autoscaling) scaled pods, but user carts dropped on refresh. We then added cookie-based sticky sessions. Additionally, we had to add a readiness delay for each pod when the system scaled, so that It would not trigger users to the Odoo setup page as it was still loading data.

Key Learnings

  • Adopting a business-focused approach to change design, rather than a simple lift-and-shift strategy, ensured we remained aligned with core migration objectives
  • Strategically postponing non-essential security components (WAF) enabled timely delivery while maintaining adequate security measures for the initial deployment
  • Integrating Jira and Slack created comprehensive change traceability, enhancing team accountability and coordination
  • Establishing clear role assignments and implementing version-controlled configurations resulted in a fully auditable and reproducible process

Extra Details on the Technical implementation

New solution architecture
New solution architecture
New solution architecture
New solution architecture
New solution architecture
New solution architecture
New solution architecture