Quanta Research
Cut LLM training cost by 64% — shipped 2 weeks early.
Fine-tune Llama-3 70B on 2 TB of proprietary research data. AWS quoted ₹14L/month for the compute alone — 6-week runway meant a ~₹21L bill before any results.
16× A100 cluster on Glixy with NVLink, DeepSpeed ZeRO-3 distributed training, checkpoint replication to S3-compatible object store. Our DevOps team handled the infra; their ML team focused on the model.
- Day 0 — Contract signed
- Day 2 — Cluster live, first training step
- Day 4 — Distributed run validated, baseline eval
- Week 6 — Final model checkpoint shipped
| Metric | Result |
|---|---|
| Deployment Time | 48 Hours |
| Cost Saving | 60% |
| Uptime | 99.9% |
| Time to ship | −2 weeks vs plan |
"Glixy is the only reason we shipped on time. The cluster ran 99.97% over the 6-week training run — we lost 12 minutes total." — David Chen, CTO, Quanta Research