Explore our comprehensive collection of datasets and evaluation tools designed to advance Thai language AI research. From benchmarking datasets to leaderboards, discover the resources that power Thai NLP development.
Access our curated collection of datasets and evaluation tools that support Thai language model development and research.
A Thai knowledge benchmark dataset comprising multiple-choice questions from five key Thai examinations (ONET, IC, TGAT, TPAT-1, and A-Level), designed to evaluate Thai language models.
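As a rough illustration of how a multiple-choice benchmark of this kind is typically scored, the sketch below computes per-exam accuracy from gold answers and model predictions. The `Item` fields and the `accuracy_by_exam` helper are hypothetical names chosen for this example, not the dataset's actual schema or an official scoring script.

```python
from dataclasses import dataclass


@dataclass
class Item:
    exam: str    # hypothetical field: exam name, e.g. "ONET" or "IC"
    answer: str  # hypothetical field: gold choice label, e.g. "a"


def accuracy_by_exam(items, predictions):
    """Return {exam: fraction of correctly answered questions}."""
    totals, correct = {}, {}
    for item, pred in zip(items, predictions):
        totals[item.exam] = totals.get(item.exam, 0) + 1
        # Compare choice labels case-insensitively, ignoring stray whitespace.
        if pred.strip().lower() == item.answer.strip().lower():
            correct[item.exam] = correct.get(item.exam, 0) + 1
    return {exam: correct.get(exam, 0) / n for exam, n in totals.items()}


# Toy data for illustration only; these are not real exam questions.
items = [Item("ONET", "a"), Item("ONET", "b"), Item("IC", "c")]
predictions = ["a", "c", "C "]
scores = accuracy_by_exam(items, predictions)
```

Reporting accuracy per exam rather than a single pooled number keeps a large exam from masking weak performance on a smaller one.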
A dataset designed for visual question-answering and image-to-text tasks, supporting both Thai (th) and English (en) languages.
A dataset designed for audio question-answering and audio-to-text tasks, supporting both Thai (th) and English (en) languages.
The ThaiLLM Leaderboard is designed to evaluate and compare LLMs with Thai language capabilities. It tracks the performance of various LLMs across a range of benchmarks and tasks, providing a standard environment in which all models are assessed under the same conditions. This makes results reproducible and comparable, allowing developers and researchers to gauge how their models perform relative to others in the community, and ultimately fostering growth in Thai NLP research and development.
The ThaiExam Leaderboard is designed to assess language models in real-world Thai scenarios, using questions derived from standardized high school and financial professional exams such as ONET, TGAT, A-Level, and the Investment Consultant (IC) exam. The leaderboard evaluates a range of leading models, including TYPHOON powered by SCB 10X and SCBX, offering full transparency at the prompt level and reproducible results through the HELM framework. It is a new, publicly available leaderboard built specifically for Thai language evaluation, aimed at driving innovation in Thai language model development and evaluation.
A Thai version of the IFEval dataset. The original English instructions were translated into Thai, followed by a manual verification and correction process to ensure accuracy and content consistency. Rows with poor translation quality or irrelevant context in Thai were removed from the dataset.
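IFEval-style evaluation relies on instructions whose fulfilment can be verified programmatically. The sketch below shows the general idea with two simple checks that also work on Thai text, since they use substring matching rather than whitespace tokenization. The function names and the `strict_accuracy` helper are assumptions made for this example and do not reproduce the dataset's actual instruction set or checker code.

```python
def contains_keywords(response, keywords):
    """True if every required keyword appears in the response."""
    return all(keyword in response for keyword in keywords)


def ends_with_phrase(response, phrase):
    """True if the response ends with the required phrase."""
    return response.strip().endswith(phrase)


def strict_accuracy(results):
    """Fraction of prompts for which every instruction check passed.

    `results` holds one list of booleans per prompt (hypothetical layout).
    """
    if not results:
        return 0.0
    return sum(all(checks) for checks in results) / len(results)


# Toy responses for illustration only.
response_1 = "กรุงเทพมหานครเป็นเมืองหลวงของประเทศไทย ขอบคุณครับ"
response_2 = "เชียงใหม่อยู่ทางภาคเหนือ"

results = [
    [contains_keywords(response_1, ["กรุงเทพ"]),
     ends_with_phrase(response_1, "ขอบคุณครับ")],
    [contains_keywords(response_2, ["กรุงเทพ"])],
]
score = strict_accuracy(results)
```

Counting a prompt as correct only when all of its checks pass is the stricter of the usual aggregation choices; a per-instruction average would be more lenient.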