CASE STUDIES · 87 active customers

How teams are shipping AI on Glixy.

Real workloads. Real numbers. Real customers.

More stories

Production deployments across healthcare, fintech & mobility

Helix Health

Digital health · Kolkata

On-premise · HIPAA

HIPAA-compliant on-prem LLM — deployed in 18 days.

Problem

India's largest digital health company needed a private Llama-3 70B for clinical decision support. Compliance ruled out every public cloud: patient data could not leave Helix's network.

Solution

An air-gapped 4× A100 cluster on Helix's own hardware, with end-to-end encryption, customer-managed keys, a HIPAA-ready audit trail, and signed runbooks.

Timeline
  1. Day 0–5 — Hardware procurement & rack
  2. Day 6–12 — Air-gap install + LLM deploy
  3. Day 13–18 — Compliance review + handoff
Result

Metric | Result
Deploy + onboarding | 18 days
Compliance findings | 0
Production throughput | 142 QPS
"Compliance signed off on the first review. That's never happened before." — Dr. Anita Kapoor, VP Engineering, Helix Health

Nexora Fintech

Payments & risk · Bangalore

Real-time · Mixtral 8x7B

Real-time fraud detection at 95ms p95 — catches ₹3.4Cr/month.

Problem

14M transactions/day. Old fraud model missed 23% of card-not-present fraud. Latency budget: 100ms p95 — anything slower breaks the payment flow.

Solution

A Mixtral 8x7B classifier on a 4× A100 cluster with vLLM batching, fallback rules, and an eval harness that replays the last week of traffic on every release.
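To make the replay idea concrete, here is a minimal sketch of what such a release gate could look like. It is not Nexora's or Glixy's actual harness: the `Txn` record, the `fallback_rule` threshold, and the `min_recall` parameter are all hypothetical, and only the 100ms p95 budget comes from the case study.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List

LATENCY_BUDGET_MS = 100.0  # p95 budget quoted in the case study


@dataclass
class Txn:
    """Hypothetical replayed transaction with a known fraud label."""
    features: dict
    is_fraud: bool


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def fallback_rule(txn: Txn) -> bool:
    """Hypothetical static rule used when the model call fails."""
    return txn.features.get("amount", 0) > 100_000


def replay(txns: Iterable[Txn], classify: Callable[[Txn], bool]) -> dict:
    """Replay recorded traffic through a candidate classifier and score it."""
    latencies_ms = []
    caught = total_fraud = 0
    for txn in txns:
        start = time.perf_counter()
        try:
            flagged = classify(txn)
        except Exception:
            flagged = fallback_rule(txn)  # degrade to rules instead of failing
        latencies_ms.append((time.perf_counter() - start) * 1000)
        if txn.is_fraud:
            total_fraud += 1
            caught += int(flagged)
    return {
        "p95_ms": p95(latencies_ms),
        "recall": caught / total_fraud if total_fraud else 0.0,
    }


def release_gate(report: dict, min_recall: float) -> bool:
    """Pass only if the release meets both the latency budget and the recall floor."""
    return report["p95_ms"] <= LATENCY_BUDGET_MS and report["recall"] >= min_recall
```

In a setup like this, a release pipeline would call `replay` on the stored week of traffic and block the deploy whenever `release_gate` returns `False`.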

Timeline
  1. Week 1 — Eval harness + data labeling
  2. Week 2–3 — Model training + tuning
  3. Week 4 — Production cutover
Result

Metric | Result
Fraud detection rate | +68%
p95 latency | 95ms
Fraud caught / month | ₹3.4Cr
"Glixy's ML team feels like an extension of ours. We just ship features." — Rohit Mehta, Head of Risk, Nexora

Polaris Studios

Design copilot · Kolkata

Inference · Hybrid LLM

−78% inference cost — moved off OpenAI in 3 weeks.

Problem

A design copilot with 80,000 daily active users ran on OpenAI APIs at ₹18L/month, with rising error rates and latency that degraded during peak hours.

Solution

A hybrid of Qwen 14B (router) and Llama-3 70B (deep) on dedicated 4× 4090s, with vLLM batching, KV-cache reuse, and traffic-based autoscaling.
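The case study does not say how the router decides when to escalate, so the following is only one plausible shape for a router-plus-deep setup. It assumes the smaller model returns a draft answer with a confidence score and that low-confidence requests are handed to the larger model. The `RouterFn` and `DeepFn` signatures and the `min_confidence` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Reply:
    text: str
    served_by: str  # "router" or "deep", useful for cost accounting


# Hypothetical signatures: the router returns (draft, confidence in [0, 1]).
RouterFn = Callable[[str], Tuple[str, float]]
DeepFn = Callable[[str], str]


def make_hybrid(router: RouterFn, deep: DeepFn,
                min_confidence: float = 0.7) -> Callable[[str], Reply]:
    """Build a handler that answers with the small model when it is confident
    and escalates to the large model otherwise."""
    def handle(prompt: str) -> Reply:
        draft, confidence = router(prompt)
        if confidence >= min_confidence:
            return Reply(draft, "router")
        return Reply(deep(prompt), "deep")
    return handle
```

The cost saving in a design like this depends on how often the cheaper path is taken, so tracking `served_by` per request is a natural first metric.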

Timeline
  1. Week 1 — Model selection + offline eval
  2. Week 2 — Cluster live, A/B canary
  3. Week 3 — Full cutover, OpenAI dropped
Result

Metric | Result
Inference cost / token | −78%
Steady-state QPS | 312
p95 first-token | 41ms
"Half the cost, double the throughput, and our team finally owns the stack." — Maya Iyer, Head of Engineering, Polaris

Want to be next?

We'll publish a case study with you (with permission) — most customers love the recruiting boost.