Datasets & Evaluation

Explore our comprehensive collection of datasets and evaluation tools designed to advance Thai language AI research. From benchmarking datasets to leaderboards, discover the resources that power Thai NLP development.

Available Datasets

Access our curated collection of datasets and evaluation tools that support Thai language model development and research.

ThaiOCRBench

ThaiOCRBench is the first comprehensive benchmark for evaluating vision-language models (VLMs) on Thai text-rich visual understanding tasks. The benchmark enables standardized zero-shot evaluation for both proprietary and open-source models, revealing significant performance gaps and paving the way for document understanding in low-resource languages.

Typhoon Isan Speech Corpus

This dataset contains audio recordings of Isan (Northeastern Thai) speech, paired with rich transcriptions and demographic metadata. It is designed to support Automatic Speech Recognition (ASR), dialect study, and text normalization tasks for the Isan language.
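ASR output on a corpus like this is commonly scored with character error rate (CER), which avoids depending on a word segmenter for Thai-script text. The following is a minimal sketch of that metric, not the corpus's official evaluation script; the function names are illustrative, and any text normalization applied before scoring is left to the reader.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A CER of 0.0 means the hypothesis matches the reference transcription exactly; values can exceed 1.0 when the hypothesis is much longer than the reference.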

Typhoon Isan Phonetic Dictionary

This dataset is a phonetic dictionary focused on Isan (Northeastern Thai) pronunciations. It is structured to handle linguistic complexities such as phonetic variations and homographs.

ThaiExam

A Thai knowledge benchmarking dataset comprising multiple-choice questions from five key Thai examinations (ONET, IC, TGAT, TPAT-1, and A-Level), designed to evaluate Thai language models.
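Because the questions are multiple-choice, a model's results can be summarized as accuracy per examination. The sketch below assumes each scored record is a dictionary with the hypothetical keys "exam", "answer", and "prediction"; these names are illustrative and are not taken from the dataset's actual schema.

```python
from collections import defaultdict


def accuracy_by_exam(records):
    """Return {exam name: fraction of questions answered correctly}.

    Each record is assumed to be a dict with the hypothetical keys
    'exam', 'answer' (gold choice label), and 'prediction' (model choice).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        exam = record["exam"]
        total[exam] += 1
        # Compare choice labels case-insensitively, ignoring stray whitespace.
        gold = record["answer"].strip().lower()
        predicted = record["prediction"].strip().lower()
        if gold == predicted:
            correct[exam] += 1
    return {exam: correct[exam] / total[exam] for exam in total}
```

Reporting one score per examination rather than a single pooled number keeps a large exam from masking weak performance on a smaller one.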

Typhoon Vision Preview Data

A dataset designed for visual question-answering and image-to-text tasks, supporting both Thai (th) and English (en) languages.

Typhoon Audio Preview Data

A dataset designed for audio question-answering and audio-to-text tasks, supporting both Thai (th) and English (en) languages.

ThaiLLM Leaderboard

The ThaiLLM Leaderboard is specifically designed to evaluate and compare LLMs with Thai language capabilities. The leaderboard tracks the performance of various LLMs across a range of benchmarks and tasks, providing a standard environment where models are assessed under the same conditions. This ensures that results are reproducible and comparable, allowing developers and researchers to gauge how their models perform relative to others in the community, and ultimately fostering growth in Thai NLP research and development.

ThaiExam Leaderboard on Stanford HELM

The ThaiExam Leaderboard is designed to assess language models in real-world Thai scenarios, derived from standardized high school and financial professional exams such as ONET, TGAT, A-Level, and the Investment Consultant (IC) exam. The leaderboard evaluates a range of leading models, including Typhoon powered by SCB 10X and SCBX, offering full transparency at the prompt level. It also provides reproducible results using the HELM framework. This initiative represents a new publicly available leaderboard specifically designed for Thai language evaluation. It is aimed at driving innovation in Thai language model development and evaluation.

IFEval-Thai

A Thai version of the IFEval dataset. The original English instructions were translated into Thai, followed by a manual verification and correction process to ensure accuracy and content consistency. Rows with poor translation quality or irrelevant context in Thai were removed from the dataset.