
In a landscape typically dominated by Silicon Valley and Chinese tech hubs, a Bengaluru-based startup is making headlines by outperforming models from established leaders like Google, OpenAI, and Anthropic on specialized benchmarks. Sarvam AI recently unveiled two new models, Sarvam Vision and Bulbul V3, marking a significant milestone for India’s homegrown “sovereign AI” movement.
Here is a breakdown of why Sarvam AI is being hailed as a game-changer and how it stacks up against the world’s most famous AI models.
Beating the Giants: The Benchmarks
The most striking news is Sarvam’s performance in Optical Character Recognition (OCR) and document intelligence. While models like ChatGPT and Gemini are general-purpose, Sarvam Vision was built with a specific focus on the complexities of Indian documents.
- The Scorecard: Sarvam Vision achieved an accuracy of 84.3% on the olmOCR-Bench, surpassing Gemini 1.5 Pro and the specialized DeepSeek OCR v2.
- Complex Tasks: On OmniDocBench v1.5, which tests real-world document understanding, Sarvam Vision scored 93.28%. It performed especially well in areas where other models struggle: complex layouts, technical tables, and dense mathematical formulas.
Key Innovations: Vision and Bulbul V3
Sarvam AI’s strategy isn’t just about text; it’s about a multimodal approach tailored for the Indian context.
- Sarvam Vision: A 3-billion-parameter vision-language model. Unlike standard OCR, which only “reads” text, Sarvam Vision “interprets” visual elements, understanding images and text together. This makes it well suited to digitizing historical manuscripts, financial archives, and newspapers across 22 official Indian languages.
- Bulbul V3: The company’s flagship text-to-speech (TTS) model. It supports over 35 voices across 11 languages, with plans to expand to 22. It is designed to be more “expressive” and “stable” for Indian accents and dialects than global competitors like ElevenLabs, which can be costly for regional-language use.
Why This Matters: The Concept of “Sovereign AI”
Co-founders Pratyush Kumar and Vivek Raghavan are championing the idea of “Sovereign AI”: technology that is built, controlled, and optimized within India.
- Language-First: While global models treat Indian languages as secondary to English, Sarvam treats them as primary.
- Practicality over Hype: The models are designed for real-world utility—helping Indian businesses and government agencies digitize vast amounts of physical data that global models often misinterpret due to script nuances.
Global Skepticism Turns to Praise
The tech community has taken notice. Notable tech commentator Deedy Das, who previously expressed skepticism about the need for small Indic-language models, recently admitted on X (formerly Twitter) that he had underestimated the company. He noted that Sarvam’s focus on speech and OCR fills a massive gap that global AI labs have largely ignored.
Quick Comparison: Sarvam vs. The Competition
| Feature | Sarvam AI (Vision/Bulbul) | ChatGPT / Google Gemini |
|---|---|---|
| Primary Focus | Indic languages & complex document layouts | General-purpose English & global tasks |
| OCR Accuracy | 84.3% (olmOCR-Bench) | Gemini 1.5 Pro scored lower on olmOCR-Bench |
| Voice Tech | Specialized for 11 Indian languages (22 planned) | Broad, often lacks regional nuance |
| Training Data | Scanned Indian archives, legal/financial docs | Massive web-crawled global data |
| Cost Efficiency | Optimized for Indian scale and pricing | Often expensive for high-volume Indic tasks |
How to Access Sarvam AI
For developers and businesses looking to test these capabilities, Sarvam AI has made its Document Intelligence API free to use throughout February 2026. This lets users experiment with Sarvam Vision at scale and see firsthand how it handles complex Indian-language documents.
Conclusion
Sarvam AI’s success signals a shift in the AI race. By focusing on a “niche” that includes over a billion people and 22 official languages, the company hasn’t just built a “local version” of ChatGPT. It has built a specialized tool that, judging by its reported benchmark results, currently leads the world in its specific domain.