Case Studies — Glixy Labs

Quanta Research

YC-backed AI lab · Bangalore · Series A

LLM training · 16× A100 · 6-week run

Cut LLM training cost by 64% — shipped 2 weeks early.

Problem

Fine-tune Llama-3 70B on 2 TB of proprietary research data. AWS quoted ₹14L/month for the compute alone, so a 6-week run meant a ~₹21L bill before any results.
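The ~₹21L figure follows from prorating the monthly quote over the length of the run. A minimal sketch of that arithmetic, assuming a 4-week billing month; the function name and the 4-week convention are illustrative and do not come from the AWS quote itself:

```python
def projected_cost_lakh(monthly_quote_lakh: float,
                        run_weeks: float,
                        weeks_per_month: float = 4.0) -> float:
    """Prorate a monthly compute quote (in lakh rupees) over a run measured in weeks.

    weeks_per_month=4.0 is an assumption made for this sketch.
    """
    return monthly_quote_lakh * (run_weeks / weeks_per_month)


# Figures from the case study: ₹14L/month quote, 6-week training run.
aws_bill = projected_cost_lakh(14.0, 6)
print(f"Projected compute bill: ~₹{aws_bill:.0f}L")
```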

Solution

A 16× A100 cluster on Glixy with NVLink, DeepSpeed ZeRO-3 distributed training, and checkpoint replication to an S3-compatible object store. Our DevOps team handled the infrastructure, so Quanta's ML team could focus on the model.
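For readers unfamiliar with ZeRO-3, the sketch below shows roughly what a DeepSpeed configuration for such a run could look like. Only the ZeRO stage (3) comes from this case study; the batch size, accumulation steps, and bf16 setting are assumed values for illustration, not the settings Quanta used.

```python
import json

# Hypothetical DeepSpeed config. Only "stage": 3 is taken from the case study;
# every other value is an assumption chosen for illustration.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # assumed
    "gradient_accumulation_steps": 16,     # assumed
    "bf16": {"enabled": True},             # assumed; A100s support bfloat16
    "zero_optimization": {
        # Stage 3 partitions optimizer state, gradients, and parameters across GPUs.
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        # Gather full 16-bit weights when saving, so checkpoints are usable off-cluster.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}


def write_config(path: str) -> None:
    """Write the config as JSON, the format the DeepSpeed launcher reads."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)


print(json.dumps(ds_config["zero_optimization"], indent=2))
```

The resulting file would typically be passed to the training script through DeepSpeed's config argument; the exact launch command depends on the training code.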

Timeline

Day 0 — Contract signed
Day 2 — Cluster live, first training step
Day 4 — Distributed run validated, baseline eval
Week 6 — Final model checkpoint shipped

Result

Metric	Result
Deployment Time	48 Hours
Cost Saving	60%
Uptime	99.9%
Time to ship	−2 weeks vs plan

"Glixy is the only reason we shipped on time. The cluster ran 99.97% over the 6-week training run — we lost 12 minutes total." — David Chen, CTO, Quanta Research

Helix Health

Digital health · Kolkata

On-premise · HIPAA

HIPAA-compliant on-prem LLM — deployed in 18 days.

Problem

India's largest digital health company needed a private Llama-3 70B deployment for clinical decision support. Compliance ruled out every public cloud: patient data could not leave Helix's network.

Solution

An air-gapped 4× A100 cluster on Helix's hardware with end-to-end encryption, customer-managed keys, a HIPAA-ready audit trail, and signed runbooks.
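One common way to make an audit trail tamper-evident is to chain entries by hash, so that editing any past record invalidates everything after it. The sketch below illustrates that idea only; the case study does not say how Helix's audit trail is implemented, and the field names (`actor`, `action`, `timestamp`) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry's predecessor
FIELDS = ("actor", "action", "timestamp", "prev_hash")


def _digest(body: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, actor: str, action: str, timestamp: str) -> dict:
    """Append an entry whose hash covers its content and the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action,
            "timestamp": timestamp, "prev_hash": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Return True only if every entry's hash and back-link are intact."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in FIELDS}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, "dr.example", "query:clinical-support", "2024-01-01T09:00:00Z")
append_entry(audit_log, "admin.example", "rotate-key", "2024-01-01T10:00:00Z")
print("chain intact:", verify(audit_log))
```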

Timeline

Day 0–5 — Hardware procurement & rack
Day 6–12 — Air-gap install + LLM deploy
Day 13–18 — Compliance review + handoff

Result

Metric	Result
Deploy + onboarding	18 days
Compliance findings	0
Production throughput	142 QPS

"Compliance signed off on the first review. That's never happened before." — Dr. Anita Kapoor, VP Engineering, Helix Health

Nexora Fintech

Payments & risk · Bangalore

Real-time · Mixtral 8x7B

Real-time fraud detection at 95ms p95 — catches ₹3.4Cr/month.

Problem

Nexora processes 14M transactions/day. The old fraud model missed 23% of card-not-present fraud. Latency budget: 100ms p95, because anything slower breaks the payment flow.

Solution

A Mixtral 8x7B classifier on a 4× A100 cluster with vLLM batching, fallback rules, and an eval harness that replays the last week of traffic on every release.
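A replay-style eval harness can be sketched as follows: run recorded, labeled transactions through the candidate classifier, measure p95 latency and fraud recall, and block the release if either misses its target. This is an illustrative sketch, not Nexora's harness. The 100ms budget comes from the case study; the function names, the data shape, and the 0.77 recall floor (the old model's implied detection rate, since it missed 23%) are assumptions.

```python
import time
from typing import Callable, Iterable, Tuple


def p95(values: list) -> float:
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def replay(transactions: Iterable[Tuple[dict, bool]],
           classify: Callable[[dict], bool]) -> dict:
    """Replay (transaction, is_fraud) pairs and report p95 latency and recall."""
    latencies_ms, caught, fraud_total = [], 0, 0
    for txn, is_fraud in transactions:
        start = time.perf_counter()
        flagged = classify(txn)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        if is_fraud:
            fraud_total += 1
            caught += int(flagged)
    recall = caught / fraud_total if fraud_total else 0.0
    return {"p95_ms": p95(latencies_ms), "recall": recall}


def release_gate(report: dict, max_p95_ms: float = 100.0,
                 min_recall: float = 0.77) -> bool:
    """Pass only if the candidate is within the latency budget and beats the recall floor."""
    return report["p95_ms"] <= max_p95_ms and report["recall"] > min_recall


# Toy replay with a stand-in rule; a real run would call the deployed model.
recorded = [({"amount": 10}, False), ({"amount": 9000}, True), ({"amount": 8000}, True)]
report = replay(recorded, lambda txn: txn["amount"] > 1000)
print(report, "release ok:", release_gate(report))
```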

Timeline

Week 1 — Eval harness + data labeling
Week 2–3 — Model training + tuning
Week 4 — Production cutover

Result

Metric	Result
Fraud detection rate	+68%
p95 latency	95ms
Fraud caught / month	₹3.4Cr

"Glixy's ML team feels like an extension of ours. We just ship features." — Rohit Mehta, Head of Risk, Nexora

Polaris Studios

Design copilot · Kolkata

Inference · Hybrid LLM

−78% inference cost — moved off OpenAI in 3 weeks.

Problem

A design copilot with 80,000 DAU was running on OpenAI APIs at ₹18L/month, with rising error rates and degrading latency during peak hours.

Solution

A Qwen 14B (router) + Llama-3 70B (deep) hybrid on dedicated 4× 4090s, with vLLM batching, KV-cache reuse, and traffic-based autoscaling.
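The router-plus-deep-model pattern can be sketched as a function that scores each request and escalates only the hard ones to the larger model. This is a minimal illustration under stated assumptions: the case study names Qwen 14B as the router and Llama-3 70B as the deep model, but the scoring rule, the 0.5 threshold, and the idea that the smaller model also answers the easy requests are hypothetical. The three callables are stubs standing in for real model calls.

```python
from typing import Callable, Tuple


def make_hybrid(route: Callable[[str], float],
                small: Callable[[str], str],
                deep: Callable[[str], str],
                threshold: float = 0.5) -> Callable[[str], Tuple[str, str]]:
    """Build an answer function that escalates to the deep model above a difficulty threshold."""
    def answer(prompt: str) -> Tuple[str, str]:
        difficulty = route(prompt)  # score in [0, 1] from the router model
        if difficulty >= threshold:
            return "deep", deep(prompt)
        return "small", small(prompt)
    return answer


# Stubs for illustration only; real code would call the served models.
def stub_route(prompt: str) -> float:
    return min(len(prompt.split()) / 50, 1.0)  # longer prompt = harder (toy heuristic)


answer = make_hybrid(stub_route,
                     small=lambda p: f"[small model] {p[:20]}",
                     deep=lambda p: f"[deep model] {p[:20]}")

tier, _ = answer("make the header blue")
print("routed to:", tier)
```

Keeping the routing decision in one function makes the threshold easy to tune against offline evals before a canary rollout.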

Timeline

Week 1 — Model selection + offline eval
Week 2 — Cluster live, A/B canary
Week 3 — Full cutover, OpenAI dropped

Result

Metric	Result
Inference cost / token	−78%
Steady-state QPS	312
p95 first-token	41ms

"Half the cost, double the throughput, and our team finally owns the stack." — Maya Iyer, Head of Engineering, Polaris
