Open to senior / staff roles

Backend engineer for quiet systems that scale.

I'm Chaitanya — a senior software engineer building multi-tenant SaaS platforms, real-time IoT pipelines, and zero-downtime database migrations at production scale.

What I build

Four shapes of the same engineer. The one you're hiring for.

I've built each of these in production, not in a side project. The titles change; the core job is the same: own the backend platform that makes a business work.

01
8 layers of isolation

Multi-Tenant SaaS Backend

Tenant isolation at every layer — API gateway, auth, middleware, Postgres, Cosmos partition, Redis namespacing, event routing, per-tenant feature flags.

02
12 of 15 services owned

Founding / Platform Engineer

Joined as an early engineer and owned the architecture end-to-end. Built two standalone systems from scratch and shipped the shared packages the rest of the team builds on.

03
AI as feature, not product

Applied AI Infrastructure

Shipping the platform that AI features run on — ingestion, eventing, permissions, observability. LLM orchestration integrated into a production SaaS, not a demo.

04
Race conditions, solved

Distributed Systems IC

Diagnosed production race conditions and built Redis-based coordination, eventual consistency, and self-healing on partial failures. Comfortable at the database / network / event-pipeline seams.

Featured work

Three systems, shipped. In depth.

Portfolio of case studies — not demos. Each lived in production, handled real tenants, and was graded by incident-free quarters.

01/Flagship architecture

Unified Facility Hierarchy — one source of truth for device location across 15 services.

Every microservice was resolving device → facility locations its own way. Duplicate logic, inconsistent results, no multi-root zones, and no way to let tenants override global facilities without branching the data model.

3-tier resolution chain, O(1) parent traversal
  • In-memory tree over lookup table

    In-memory parent lookups cost O(1) per hop, which beat recursive CTEs on every lookup. Redis pub/sub kept caches consistent across service replicas.

  • Overlay semantics

    Tenant-specific facilities shadow global ones without mutation. The base tree stays canonical; tenants layer on top.

  • 12-point validation suite

    Caught 3 data-integrity issues during the 954-line PostgreSQL migration that would have caused silently incorrect lookups in prod.
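This is a minimal sketch of the overlay idea. It assumes each facility stores its parent's id. A `Map` then gives an O(1) hop up the chain, and a tenant's overlay is checked before the global tree, so tenant facilities shadow global ones without mutating them. `FacilityHierarchy`, `resolveChain`, and the node shape are hypothetical. The sketch shows only two lookup tiers (tenant overlay, then global) and omits the third tier and the Redis pub/sub invalidation.

```typescript
// Hypothetical node shape: each facility stores its parent's id (null for a root).
interface FacilityNode {
  id: string;
  name: string;
  parentId: string | null;
}

class FacilityHierarchy {
  // Canonical global tree keyed by id, so each hop up the chain is one Map lookup.
  private readonly global = new Map<string, FacilityNode>();
  // Per-tenant overlays (tenant -> id -> node); never written into `global`.
  private readonly overlays = new Map<string, Map<string, FacilityNode>>();

  addGlobal(node: FacilityNode): void {
    this.global.set(node.id, node);
  }

  addOverlay(tenant: string, node: FacilityNode): void {
    let layer = this.overlays.get(tenant);
    if (!layer) {
      layer = new Map<string, FacilityNode>();
      this.overlays.set(tenant, layer);
    }
    layer.set(node.id, node);
  }

  // A tenant's overlay node shadows the global node with the same id.
  private lookup(tenant: string, id: string): FacilityNode | undefined {
    return this.overlays.get(tenant)?.get(id) ?? this.global.get(id);
  }

  // Walk from a facility to its root: O(1) per hop, O(depth) overall.
  resolveChain(tenant: string, facilityId: string): string[] {
    const chain: string[] = [];
    const seen = new Set<string>(); // stops on an accidental cycle instead of looping forever
    let current = this.lookup(tenant, facilityId);
    while (current && !seen.has(current.id)) {
      seen.add(current.id);
      chain.push(current.name);
      current = current.parentId ? this.lookup(tenant, current.parentId) : undefined;
    }
    return chain;
  }
}

const h = new FacilityHierarchy();
h.addGlobal({ id: "wh", name: "Warehouse", parentId: null });
h.addGlobal({ id: "zone-a", name: "Zone A", parentId: "wh" });
h.addGlobal({ id: "bay-3", name: "Bay 3", parentId: "zone-a" });
h.addOverlay("acme", { id: "bay-3", name: "Bay 3 (overlay)", parentId: "zone-a" });

const acmeChain = h.resolveChain("acme", "bay-3");     // ["Bay 3 (overlay)", "Zone A", "Warehouse"]
const globexChain = h.resolveChain("globex", "bay-3"); // ["Bay 3", "Zone A", "Warehouse"]
```

The overlay never touches the global map, so a second tenant resolving the same facility still sees the canonical tree.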

Outcome

Every downstream service resolves device locations through a single, consistent hierarchy.

Enabled multi-root trees for zones — a requirement the flat model couldn't handle.

Eliminated duplicate lookup logic scattered across 15 services.

Read the full case study
Diagram: facility tree (Factory, Warehouse, DC, Zone A, Bay 3, Dock 2) with a tenant overlay on Bay 3, resolved to tenant.overlay.bay-3.

02/Zero-downtime migration

MSSQL → PostgreSQL across 6+ services. Three months in parallel. Zero data loss.

Legacy monolith on MSSQL couldn't keep up with query flexibility needs, scaling cost, or the relational-join patterns the product roadmap required. Big-bang migration was off the table.

12+ type mismatches caught by Zod validation
  • Built a shared data-access package

    Built from scratch: a TypeScript PostgreSQL abstraction with Zod validation, dual ESM/CJS builds, the repository pattern, and golang-migrate integration. It became the team's shared foundation.

  • Branch-level isolation

    MSSQL+npm and PostgreSQL+pnpm branches ran in parallel for ~3 months. Service-by-service cutover, not big-bang. Bundled the npm → pnpm monorepo migration into the same breaking-change window.

  • Reference implementation first

    Migrated my own services first; teammates used them as the template. Zod schemas caught 12+ type mismatches that would have been silent bugs in prod.
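Zod's role at the data-access boundary can be shown without the library itself. The hand-rolled `parseDeviceRow` below is a stand-in for a Zod object schema, and the `DeviceRow` columns are hypothetical. One real source of this kind of mismatch is that node-postgres returns `bigint` (int8) and `numeric` columns as strings by default. A value the service code types as `number` therefore arrives as a string, and it should fail loudly here rather than flow onward.

```typescript
// Hypothetical row shape the service code expects after mapping.
interface DeviceRow {
  id: string;
  customerId: string;
  lastSeenMs: number;
  active: boolean;
}

type Primitive = "string" | "number" | "boolean";

// Stand-in for a Zod object schema: check runtime types at the data-access boundary
// and fail loudly, listing every mismatched column at once.
function parseDeviceRow(raw: Record<string, unknown>): DeviceRow {
  const problems: string[] = [];
  const check = (column: string, type: Primitive): void => {
    if (typeof raw[column] !== type) {
      problems.push(`${column}: expected ${type}, got ${typeof raw[column]}`);
    }
  };
  check("id", "string");
  check("customer_id", "string");
  check("last_seen_ms", "number");
  check("active", "boolean");
  if (problems.length > 0) throw new Error(problems.join("; "));
  return {
    id: raw.id as string,
    customerId: raw.customer_id as string,
    lastSeenMs: raw.last_seen_ms as number,
    active: raw.active as boolean,
  };
}

// An int8 column as node-postgres returns it by default: a string, not a number.
const fromPg = { id: "d1", customer_id: "acme", last_seen_ms: "1700000000000", active: true };

let mismatch = "";
try {
  parseDeviceRow(fromPg);
} catch (e) {
  mismatch = (e as Error).message; // "last_seen_ms: expected number, got string"
}
```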

Outcome

Reduced database hosting costs meaningfully.

The shared package became the foundation for every backend service.

Zero production data loss across the full cutover.

Read the full case study
Diagram: MSSQL (legacy) to PostgreSQL (target) through the shared data-store with Zod validation, catching 12+ type mismatches before they became silent bugs. Service-by-service cutover: 6 services, 3 months, zero data loss.

03/Distributed systems

Race conditions, diagnosed. Redis distributed locks. Clear data ownership.

Two services were writing to the same health records from different paths. Intermittent duplicate writes and inconsistent records in production. The obvious fix — ETags + retry loops — treated the symptom, not the cause.

275 lines of duplicate write logic removed
  • Ownership over concurrency control

    Root cause was shared ownership, not missing locks. Made one service the sole owner of its data; others went through it. Clearer than retry storms.

  • Redis lock over message queue

    `SET key value NX EX 5` gave sub-millisecond coordination. Operators watch live dashboards, so queue latency would have been felt. Graceful degradation: if Redis drops, writes proceed without facility enrichment and self-heal on the next heartbeat.

  • Decoupled middleware

    Prevented a regression 2 months later when a shared utility package shipped a breaking change. Services could be deployed and scaled independently.
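This is a sketch of the NX/EX lock with the degradation path described above. To keep it runnable without a Redis server, `LockStore` and `InMemoryLockStore` are hypothetical, synchronous stand-ins. Real clients such as node-redis or ioredis are async and express `NX`/`EX` in their own option syntax. The release step is an addition not described on this page: it deletes the key only if the key still holds this writer's token. That keeps a writer whose lock has already expired from freeing someone else's lock.

```typescript
// Hypothetical minimal lock API mirroring `SET key value NX EX <ttl>` plus a guarded delete.
// Synchronous for brevity; real Redis clients return promises.
interface LockStore {
  setNxEx(key: string, value: string, ttlSeconds: number): boolean; // true if the key was set
  deleteIfEquals(key: string, value: string): boolean;
}

// In-memory stand-in so the sketch runs without a Redis server.
class InMemoryLockStore implements LockStore {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();

  constructor(private readonly now: () => number = Date.now) {}

  setNxEx(key: string, value: string, ttlSeconds: number): boolean {
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > this.now()) return false; // NX: still held
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
    return true;
  }

  // Against real Redis this compare-and-delete must be atomic (e.g. a short Lua script).
  deleteIfEquals(key: string, value: string): boolean {
    const existing = this.entries.get(key);
    if (!existing || existing.value !== value) return false;
    this.entries.delete(key);
    return true;
  }
}

type WriteOutcome = "written" | "skipped-locked" | "written-degraded";

// Serialize writes to one health record. If the lock store is unreachable, write
// without facility enrichment and let the next heartbeat fill it in.
function writeHealthRecord(
  store: LockStore,
  tenant: string,
  recordId: string,
  write: (enrich: boolean) => void,
): WriteOutcome {
  const key = `lock:${tenant}:health:${recordId}`;
  const token = Math.random().toString(36).slice(2); // identifies this lock holder
  let acquired = false;
  try {
    acquired = store.setNxEx(key, token, 5);
  } catch {
    write(false); // degraded path: no enrichment
    return "written-degraded";
  }
  if (!acquired) return "skipped-locked"; // another writer holds it; caller retries or drops
  try {
    write(true);
    return "written";
  } finally {
    store.deleteIfEquals(key, token);
  }
}
```

A 5-second TTL bounds how long a crashed writer can block others. The token check matters only when a write outlives that TTL.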

Outcome

Eliminated duplicate writes and inconsistent records in production.

Services deploy and scale independently without cross-service coordination.

Middleware decoupling caught a regression before it hit prod.

Read the full case study
Diagram: Service A (Ingress) and Service B (Enricher) writing to the same health record row. Before: duplicate writes. After: Redis lock applied, writes serialized.

The differentiator

Multi-tenancy at eight layers, not one.

Most teams bolt tenant isolation onto the query layer and call it done. I shipped it through every layer a request touches — from the edge to the event bus to the cache key.

01
Layer 01 / 08

API Gateway

The request arrives with tenant context already bound. No downstream service is trusted to infer it.

Enforced as
x-customer-id: acme
x-authorized-groups: [region:west]

customer_id + authorized_groups headers stamped on every request

02
Layer 02 / 08

Auth

Auth0 issues the JWT, but the service verifies the org claim matches the header before a single query runs.

Enforced as
jwt.user_metadata.organization
  === req.customer_id  // or 401

JWT claim validation — user_metadata.organization must match
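Here is a framework-free sketch of that check. It assumes the JWT signature has already been verified against Auth0's keys and the payload decoded. It also assumes the gateway header is `x-customer-id`, as in Layer 01. `checkTenant` and `Claims` are hypothetical names.

```typescript
// Hypothetical shape of an already-verified JWT payload (only the claim used here).
interface Claims {
  user_metadata?: { organization?: string };
}

type TenantCheck =
  | { ok: true; tenant: string }
  | { ok: false; status: 401; reason: string };

// Runs before any query: the org claim must equal the gateway-stamped tenant header.
function checkTenant(claims: Claims, headers: Record<string, string | undefined>): TenantCheck {
  const headerTenant = headers["x-customer-id"];
  const claimTenant = claims.user_metadata?.organization;
  if (!headerTenant) return { ok: false, status: 401, reason: "missing x-customer-id header" };
  if (!claimTenant) return { ok: false, status: 401, reason: "missing organization claim" };
  if (claimTenant !== headerTenant) return { ok: false, status: 401, reason: "tenant mismatch" };
  return { ok: true, tenant: headerTenant };
}

const allowed = checkTenant({ user_metadata: { organization: "acme" } }, { "x-customer-id": "acme" });
// { ok: true, tenant: "acme" }
const rejected = checkTenant({ user_metadata: { organization: "acme" } }, { "x-customer-id": "globex" });
// { ok: false, status: 401, reason: "tenant mismatch" }
```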

03
Layer 03 / 08

Middleware

One middleware. Every route pulls tenant from res.locals. Zero places in application code reading headers directly.

Enforced as
res.locals.tenant = extractTenant(req)
  // single source of truth

Centralized tenant extraction into res.locals — never re-parsed
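This is a sketch of the single extraction point. It is typed against minimal request/response shapes so it runs without Express, whose middleware uses the same `(req, res, next)` signature. Treating `x-authorized-groups` as a comma-separated list is an assumption, because this page doesn't specify the header format. Node lowercases incoming header names, which is why the lookups use lowercase keys.

```typescript
// Minimal request/response shapes so the sketch runs without the express package.
interface Req { headers: Record<string, string | undefined> }
interface Res { locals: Record<string, unknown>; statusCode: number; end(): void }

interface Tenant { customerId: string; authorizedGroups: string[] }

// Assumption: x-authorized-groups is comma-separated, e.g. "region:west,region:east".
function extractTenant(req: Req): Tenant | null {
  const customerId = req.headers["x-customer-id"];
  if (!customerId) return null;
  const authorizedGroups = (req.headers["x-authorized-groups"] ?? "")
    .split(",")
    .map((g) => g.trim())
    .filter((g) => g.length > 0);
  return { customerId, authorizedGroups };
}

// The only place headers are read; routes use res.locals.tenant from here on.
function tenantMiddleware(req: Req, res: Res, next: () => void): void {
  const tenant = extractTenant(req);
  if (!tenant) {
    res.statusCode = 401;
    res.end();
    return;
  }
  res.locals.tenant = tenant;
  next();
}
```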

04
Layer 04 / 08

PostgreSQL

Repositories take tenant as a first-class parameter, and application code contains no raw SQL. The data-store package enforces the WHERE clause.

Enforced as
repo.devices.findBy({
  tenant, filter
}) // customer_id baked in

WHERE customer_id = ? enforced at repository layer
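This sketch shows one way a repository can make the tenant clause impossible to forget. The tenant is a required argument that always binds to `$1`. Caller filters become additional numbered parameters, with column names checked against an allow-list. `buildFindBy`, the `devices` table, and the allow-list are hypothetical. The returned `{ text, values }` object is the shape node-postgres' `query` accepts.

```typescript
// Hypothetical columns callers may filter on; customer_id is deliberately absent,
// so a caller can't override the tenant through the filter object.
const DEVICE_COLUMNS = new Set(["status", "facility_id", "model"]);

interface Query { text: string; values: unknown[] }

// Tenant is mandatory and always bound first, so no call site can drop it.
function buildFindBy(tenant: string, filter: Record<string, unknown> = {}): Query {
  if (!tenant) throw new Error("tenant is required");
  const clauses = ["customer_id = $1"];
  const values: unknown[] = [tenant];
  for (const [column, value] of Object.entries(filter)) {
    if (!DEVICE_COLUMNS.has(column)) throw new Error(`unknown column: ${column}`);
    values.push(value);
    clauses.push(`${column} = $${values.length}`); // values stay parameterized
  }
  return { text: `SELECT * FROM devices WHERE ${clauses.join(" AND ")}`, values };
}

const byStatus = buildFindBy("acme", { status: "online" });
// text:   "SELECT * FROM devices WHERE customer_id = $1 AND status = $2"
// values: ["acme", "online"]
```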

05
Layer 05 / 08

Cosmos DB

Each tenant is its own logical partition, so every partition-scoped read and query is confined to that tenant's documents, and Cosmos DB can spread tenants across physical partitions as load grows.

Enforced as
container.items.create(doc, {
  partitionKey: tenant
})

Partition key = customer_id for partition-level isolation

06
Layer 06 / 08

Redis

Cache keys, rate limits, locks: all tenant-prefixed. `SCAN` with a tenant-scoped `MATCH` pattern returns only that tenant's keys, so cache invalidation is scoped by construction.

Enforced as
`gateway:${tenant}:device:${id}`

Namespaced keys — gateway:${customerId}:*
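This small sketch builds every key from one prefix helper. The `gateway:${tenant}:` prefix comes from the key format above, and the tenant-id rule is a hypothetical addition. The rule matters because a tenant id containing a glob character such as `*` would widen a `MATCH` pattern. The trailing `:` in the prefix keeps `acme` from matching `acme-east`. Note that `MATCH` filters what `SCAN` returns; it does not limit which keys Redis iterates over.

```typescript
// Hypothetical rule: tenant ids may not contain ":" or Redis glob metacharacters (* ? [ ] \).
const TENANT_ID = /^[a-z0-9-]+$/;

function tenantPrefix(tenant: string): string {
  if (!TENANT_ID.test(tenant)) throw new Error(`invalid tenant id: ${tenant}`);
  return `gateway:${tenant}:`;
}

// Key for one cached device, e.g. gateway:acme:device:42
function deviceKey(tenant: string, deviceId: string): string {
  return `${tenantPrefix(tenant)}device:${deviceId}`;
}

// Pattern for SCAN ... MATCH; the trailing ":" stops "acme" from matching "acme-east".
function tenantScanPattern(tenant: string): string {
  return `${tenantPrefix(tenant)}*`;
}
```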

07
Layer 07 / 08

Event pipeline

Event envelopes carry tenant. Subscribers filter by it. No fan-out accidentally delivers cross-tenant data to a handler.

Enforced as
event: { tenant, type, payload }
subscriber.filter(e => e.tenant === self)

Tenant context propagated through the entire async pipeline
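This in-process sketch applies the same rule. Publishing requires a tenant on the envelope, and delivery compares it with each subscription's tenant before the handler runs. `TenantEventBus` is hypothetical. On Event Grid or Service Bus, the equivalent check could sit in a subscription filter or at the top of each handler.

```typescript
interface TenantEvent<T = unknown> {
  tenant: string;
  type: string;
  payload: T;
}

type Handler = (event: TenantEvent) => void;

// Hypothetical in-process bus; every subscription is bound to exactly one tenant.
class TenantEventBus {
  private readonly subs: { tenant: string; handler: Handler }[] = [];

  subscribe(tenant: string, handler: Handler): void {
    this.subs.push({ tenant, handler });
  }

  // Returns how many handlers received the event.
  publish(event: TenantEvent): number {
    if (!event.tenant) throw new Error("event envelope is missing tenant");
    let delivered = 0;
    for (const sub of this.subs) {
      if (sub.tenant !== event.tenant) continue; // never fan out across tenants
      sub.handler(event);
      delivered++;
    }
    return delivered;
  }
}
```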

08
Layer 08 / 08

Feature flags

useUnifiedFacilityTree was enabled customer by customer, so a risky feature reached a few tenants before the rest of the tenant base.

Enforced as
flags.isEnabled('useUnifiedFacilityTree', tenant)

Per-tenant flags for controlled rollouts
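This is a minimal sketch of per-tenant gating, assuming flags default to off. `TenantFlags` is a hypothetical stand-in for whatever flag service backs `isEnabled`. The tenant is an input to every evaluation, so enabling `useUnifiedFacilityTree` for one customer leaves every other tenant on the old path.

```typescript
// Hypothetical in-memory flag store: a flag is off unless enabled globally
// or explicitly for the requesting tenant.
class TenantFlags {
  private readonly global = new Set<string>();
  private readonly perTenant = new Map<string, Set<string>>();

  enableFor(flag: string, tenant: string): void {
    let tenants = this.perTenant.get(flag);
    if (!tenants) {
      tenants = new Set<string>();
      this.perTenant.set(flag, tenants);
    }
    tenants.add(tenant);
  }

  enableGlobally(flag: string): void {
    this.global.add(flag);
  }

  isEnabled(flag: string, tenant: string): boolean {
    return this.global.has(flag) || (this.perTenant.get(flag)?.has(tenant) ?? false);
  }
}

const flags = new TenantFlags();
flags.enableFor("useUnifiedFacilityTree", "acme");
const acmeOn = flags.isEnabled("useUnifiedFacilityTree", "acme");     // true
const globexOn = flags.isEnabled("useUnifiedFacilityTree", "globex"); // false
```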

Experience

Six years. One startup. Full platform ownership.

I've optimized for depth, not logos. Joining early, staying long, and shipping every layer of a multi-tenant IoT SaaS is the story.

  1. Senior Software Engineer

    · Trackonomy Systems Inc · Jun 2023 – Present

    San Jose, CA

    • Architected and led the migration from a legacy monolith to an event-driven microservices platform, improving end-to-end latency by 40% and supporting 5M+ events/day across multi-tenant production environments.
    • Owned end-to-end architecture and six production microservices as a founding engineer — effectively half the platform's backend, with 99.9%+ availability.
    • Designed real-time data processing pipelines using Apache Flink and MQTT, lifting data accuracy from ~70% to >99% and cutting downstream discrepancies by >90%.
    • Led database modernization from MSSQL to PostgreSQL, redesigning schemas and indexing for a 35% latency reduction under high-concurrency workloads.
    • Led SOC 2 certification end-to-end — access controls, data security policies, audit-ready processes. Cloud cost optimization delivered 25% sustained monthly savings.
    TypeScript · PostgreSQL · AKS · Apache Flink · Azure · SOC 2
  2. Software Design Engineer III

    · Trackonomy Systems Inc · Apr 2021 – Jun 2023

    San Jose, CA

    • Designed and implemented Python and TypeScript backend services and RESTful APIs for real-time device communication, supporting ingestion and processing of 5M+ events/day.
    • Built asynchronous, fault-tolerant ingestion pipelines using event grids, service buses, and Redis streams — achieving 99.99%+ data reliability and near-zero event loss.
    • Optimized relational and NoSQL schemas, indexes, and partition strategies — cutting query latency 17% and lowering Cosmos DB RU costs at startup scale.
    • Built React-based frontend interfaces; partnered with product, QA, frontend, and mobile to translate ambiguous requirements into specs.
    Python · TypeScript · React · Redis · Cosmos DB
Education
  • University of North Carolina at Charlotte

    MS in Computer Science

    Aug 2018 – May 2020 · GPA 3.8 / 4.0

  • Pune University

    BS in Computer Science

    Aug 2013 – May 2017 · GPA 3.5 / 4.0

Publications
  • IEEE

    A Scalable and Adaptive Hybrid Geolocation Framework for Heterogeneous IoT Environments

  • IERJ

    Music Player Design using Google Materials

Technical range

Grouped the way I use them — not the way résumés list them.

Everything here I've shipped in production, not just read about. Heaviest weight on the top two rows; the others are support surfaces I pick up as the work demands.

Backend
  • TypeScript / Node.js
  • Python
  • Go
  • REST / GraphQL
  • Zod
  • Express
  • gRPC
Databases shipped to prod
  • PostgreSQL
  • MSSQL
  • Cosmos DB
  • MySQL
  • Redis
  • Firebase
Events & Streaming
  • Azure Event Grid
  • Event Hubs
  • Service Bus
  • Kafka
  • Apache Flink
  • BullMQ
  • RabbitMQ
  • MQTT
Infra & deploy
  • Azure (AKS, APIM, Functions)
  • Kubernetes
  • Docker
  • Terraform
  • Helm
  • GitHub Actions
  • Azure DevOps
Observability
  • Datadog
  • Grafana
  • ELK
  • OpenTelemetry
  • SLOs / alerting
AI platform work
  • LLM orchestration
  • OpenAI API
  • Prompt engineering
  • Agent patterns
  • Model serving (REST / gRPC)
Security & compliance
  • SOC 2 (end-to-end)
  • IAM / RBAC
  • OAuth 2.0 / JWT
  • Secure ingress
About

The person behind the platform.

Most of the site is proof points. This is who's behind them.

portrait.jpg
Chaitanya Deshpande at a holiday light installation
Chaitanya Deshpande · Livermore, CA

I like the parts of systems where correctness depends on knowing what actually happens to a packet — race conditions, tenant boundaries, eventual consistency. The quiet, unglamorous stuff that decides whether a platform holds up under real traffic.

Joined an early-stage IoT platform as one of its first backend engineers in 2021 and stayed long — owning the work from the first migration to multi-tenant scale teaches you more than chasing logos does. Originally from Pune, now based in the Bay Area. Most at home near a whiteboard or a terminal watching a long migration finish.

Based
Livermore, CA
Timezone
Pacific (UTC−8, UTC−7 in summer)
Speaks
English · Marathi · Hindi
Currently
Senior backend engineer
Open to
Senior / Staff
Reply time
Within 24 hrs
Get in touch

If your JD says multi-tenant, platform, or founding backend, let's talk.

Fastest path to me is email. I reply to every serious outreach within 24 hours — based in Livermore, CA, open to Bay Area hybrid and US remote.