OpenAI Launches IndQA Benchmark Based on Indian Languages and Culture
OpenAI has introduced IndQA, a pioneering benchmark designed to test how effectively artificial intelligence (AI) systems comprehend India’s diverse languages, cultural nuances, and contexts. The initiative marks a major step in enhancing AI’s multilingual and multicultural understanding, beginning with one of the world’s most linguistically rich regions.
Purpose and Development
Developed in collaboration with 261 domain experts across India, IndQA comprises 2,278 questions spanning 12 Indian languages and 10 cultural domains, including literature, food, history, spirituality, and daily life. Unlike conventional benchmarks such as MMMLU and MGSM, IndQA’s content is “natively written” — not translated — ensuring authenticity in phrasing, intent, and cultural context. OpenAI stated that this project aligns with its goal of creating AI systems that understand people the way they naturally speak and think, not merely through literal translation.
Structure and Evaluation Method
IndQA adopts a rubric-based evaluation system. Each question includes a culturally contextual prompt in an Indian language, an English translation for verification, a grading rubric, and an ideal expert-level answer. Responses are assessed against specific criteria designed by domain experts, each with weighted scores. The final grade reflects how accurately a model meets the expert expectations on nuance, reasoning, and cultural correctness.
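To make the weighted-rubric idea concrete, here is a minimal Python sketch of how such a grade could be computed. It is an illustration only, not OpenAI's grading code. The names `Criterion` and `rubric_score` are hypothetical, and the aggregation rule (points earned for met criteria divided by total available points) is an assumption, since the article does not state the exact formula. How each criterion is judged as met or unmet is also outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One expert-written rubric item (hypothetical structure)."""
    description: str  # what a good answer is expected to contain
    weight: float     # points awarded when the criterion is met


def rubric_score(criteria, met):
    """Return the share of available rubric points earned, from 0.0 to 1.0.

    `met` holds one boolean per criterion, in the same order.
    Assumed rule: earned points divided by total possible points.
    """
    if len(criteria) != len(met):
        raise ValueError("need exactly one met/unmet judgment per criterion")
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("rubric must carry a positive total weight")
    earned = sum(c.weight for c, ok in zip(criteria, met) if ok)
    return earned / total


# Illustrative rubric with made-up criteria and weights.
example_rubric = [
    Criterion("Names the correct historical period", 3.0),
    Criterion("Explains the regional context", 2.0),
    Criterion("Uses the appropriate local terminology", 1.0),
]

# The first and third criteria are met: (3 + 1) / (3 + 2 + 1) points earned.
example_grade = rubric_score(example_rubric, [True, False, True])
```

Under this assumed rule, a response that misses a heavily weighted criterion loses more of its grade than one that misses a minor criterion, which is the practical effect of attaching weights to each item.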
Languages and Cultural Scope
The benchmark covers 12 languages — Bengali, English, Hindi, Hinglish, Kannada, Marathi, Odia, Telugu, Gujarati, Malayalam, Punjabi, and Tamil. The dataset spans multiple cultural and intellectual areas such as Architecture & Design, Arts & Culture, Everyday Life, Law & Ethics, Media & Entertainment, Religion & Spirituality, and Sports & Recreation. According to OpenAI, India was chosen as the starting point due to its vast linguistic diversity and because nearly a billion Indians do not use English as their primary language.
Exam-Oriented Facts
- IndQA includes 2,278 questions in 12 Indian languages across 10 cultural domains.
- Developed with contributions from 261 domain experts across India.
- Uses a rubric-based grading method instead of multiple-choice testing.
- Benchmarked using GPT-4o, OpenAI o3, GPT-4.5, and GPT-5 models.
Significance and Future Plans
Srinivas Narayanan, CTO of B2B Applications at OpenAI, said the aim was to ensure models grasp “the nuances every culture cares about.” The company plans to replicate this framework in other regions to improve AI inclusivity and performance beyond English-speaking contexts. With India as ChatGPT’s second-largest market, IndQA reinforces OpenAI’s commitment to making its technology more accessible, reliable, and culturally attuned for non-English users worldwide.