Cheaper, faster, and culturally aware, Avataar’s video AI is built for India’s scale
Avataar AI has launched Varya, a new video model optimized for the Indian market. By refining an existing model rather than building one from scratch, the company offers faster, cheaper, and culturally aware video generation. The lower price targets the high cost of current AI video models, which has limited their use across sectors in India. The model is part of a broader effort to expand AI accessibility and development in India.
Avataar AI, one of twelve startups selected for the India AI Mission, has launched Varya, a new video model designed for the Indian market. The model is built to understand local context, such as festivals, food, and clothing, which addresses a key challenge for AI in culturally diverse regions. The India AI Mission is a government initiative worth approximately $1.2 billion that aims to boost AI development by giving startups subsidized GPU access.
Varya was developed by refining Alibaba's publicly available Wan 2.2 model through a technique called distillation, which compresses a model's capabilities into a leaner, faster version. As a result, Varya completes generation in four steps compared to Wan 2.2's fifty, producing video more than ten times faster and at a fraction of the cost. For instance, on an NVIDIA H200 GPU, Varya can produce a 5-second 720p clip in 45 seconds, compared with the 1,230 seconds required by Wan 2.2, a speedup of roughly 27 times in this example.
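The figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers reported for a 5-second 720p clip on an NVIDIA H200; the variable and function names are illustrative assumptions and do not correspond to any Avataar or Wan API.

```python
# Figures as quoted in the article for a 5-second 720p clip on an NVIDIA H200.
# All names here are illustrative; none come from Avataar's or Alibaba's code.
WAN_STEPS, VARYA_STEPS = 50, 4
WAN_SECONDS, VARYA_SECONDS = 1230, 45
CLIP_SECONDS = 5


def speedup(baseline: float, improved: float) -> float:
    """Return how many times smaller `improved` is than `baseline`."""
    return baseline / improved


# Reduction in the number of generation steps.
step_reduction = speedup(WAN_STEPS, VARYA_STEPS)
# Reduction in wall-clock time for the quoted clip.
wall_clock_speedup = speedup(WAN_SECONDS, VARYA_SECONDS)
# Seconds of GPU time spent per second of generated video.
varya_gpu_per_video_second = VARYA_SECONDS / CLIP_SECONDS
wan_gpu_per_video_second = WAN_SECONDS / CLIP_SECONDS

print(f"Step reduction:      {step_reduction:.1f}x")
print(f"Wall-clock speedup:  {wall_clock_speedup:.1f}x")
print(f"Varya GPU s/video s: {varya_gpu_per_video_second:.0f}")
print(f"Wan GPU s/video s:   {wan_gpu_per_video_second:.0f}")
```

On these numbers the step count falls by 12.5 times while wall-clock time falls by about 27 times, so the reduction in steps alone does not account for the whole speedup in this example.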
The model is also far cheaper to use: Avataar AI plans to charge just ₹0.48 ($0.005) per second of video on its hosted service. Other prominent models, such as Veo, Kling, Luma, and Runway, typically charge $0.10 or more per second, at least 20 times as much. The price difference matters for AI adoption in India, a "video-first market" where high costs have limited the reach of AI video technology.
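To make the price gap concrete, here is a small sketch that compares the cost of one clip at the quoted per-second rates. The 30-second clip length is a hypothetical value chosen for illustration, and $0.10 is used as a lower bound for the other models, since the article says they charge that amount "or more".

```python
# Per-second prices as quoted in the article.
VARYA_USD_PER_SECOND = 0.005
VARYA_INR_PER_SECOND = 0.48
OTHERS_USD_PER_SECOND = 0.10  # lower bound: "$0.10 or more per second"

# Hypothetical clip length, chosen only for illustration.
CLIP_SECONDS = 30


def clip_cost(price_per_second: float, clip_seconds: float) -> float:
    """Cost of generating a clip at a flat per-second price."""
    return price_per_second * clip_seconds


varya_usd = clip_cost(VARYA_USD_PER_SECOND, CLIP_SECONDS)
varya_inr = clip_cost(VARYA_INR_PER_SECOND, CLIP_SECONDS)
others_usd = clip_cost(OTHERS_USD_PER_SECOND, CLIP_SECONDS)
# Minimum price ratio implied by the quoted rates.
price_ratio = OTHERS_USD_PER_SECOND / VARYA_USD_PER_SECOND

print(f"Varya:  ${varya_usd:.2f} (₹{varya_inr:.2f}) for {CLIP_SECONDS} s")
print(f"Others: at least ${others_usd:.2f} for {CLIP_SECONDS} s")
print(f"Price ratio: at least {price_ratio:.0f}x")
```

Because the comparison uses the low end of the other models' pricing, the ratio it prints is a floor rather than a typical figure.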
Beyond cost, Varya addresses the cultural insensitivity often found in image and video generation models. Avataar AI trained Varya on curated data so that it recognizes and incorporates cultural nuances such as food, clothing, architecture, and festivals, producing more relevant and less stereotypical output.
Varya will be released as an open-weight model on India's AI Kosh portal, along with its training data, allowing developers to self-host or modify it. Avataar AI also plans to make the model available to enterprise customers and is open to collaborations with video tools like Higgsfield and Adobe Firefly.
This launch highlights India's pragmatic approach to AI development, which focuses on applications and a robust developer ecosystem rather than competing directly on foundational models. This strategy is vital given the country's constraints on computing power and data availability. The Indian government is working to ease those constraints, with plans to attract $200 billion in AI investment by 2028 and significantly increase GPU capacity.
