Our latest release builds on Typhoon 1.5 and 1.5X. It includes models ranging from compact, edge-capable options (1B and 3B) to a 70B flagship, all specifically optimized for Thai applications. Typhoon 2 delivers superior benchmark performance, supports extended context lengths of up to 128,000 tokens, and enables advanced AI tasks with robust safety classifiers and comprehensive function-calling capabilities.
Technical Report
This paper presents Typhoon 2, Thai-optimized models for text, vision, and audio. It outlines methods like continual pre-training and post-training to enhance Thai performance, with evaluation across tasks. The series includes models from 1 to 70 billion parameters, safety tools, and advances in document understanding and speech processing.
Technical Report
The Typhoon series introduces Thai LLMs optimized for low-resource challenges, using continual training and ThaiExam for evaluation. Fine-tuned for Thai tasks, Typhoon outperforms open-source models and rivals GPT-3.5 in Thai, with greater efficiency.
Research Paper @ NeurIPS RBFM Workshop 2024
CrossCheckGPT introduces a reference-free method for ranking hallucinations in multimodal foundation models, leveraging cross-system consistency as a measure of robustness. Applicable across domains and tasks, it uses explicit and implicit consistency metrics to assess hallucination levels. The method demonstrates high correlation with human judgments and supports new benchmarks, including the first audio-visual hallucination benchmark, AVHalluBench.
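The core intuition behind explicit cross-system consistency is that an output which disagrees with other systems' outputs for the same prompt is more likely to be hallucinated. The sketch below illustrates this with a simple token-overlap similarity; this toy metric, the model names, and the example outputs are illustrative assumptions, not the paper's actual metrics or data.

```python
import string


def jaccard(a: str, b: str) -> float:
    """Order-insensitive token overlap between two answers (toy similarity)."""
    strip = str.maketrans("", "", string.punctuation)
    ta = set(a.lower().translate(strip).split())
    tb = set(b.lower().translate(strip).split())
    return len(ta & tb) / len(ta | tb)


def cross_consistency(target: str, others: list[str]) -> float:
    """Average similarity of one system's output to all other systems'
    outputs for the same prompt; a low score flags likely hallucination."""
    return sum(jaccard(target, o) for o in others) / len(others)


# Hypothetical outputs from three systems answering the same question.
outputs = {
    "model_a": "Bangkok is the capital of Thailand.",
    "model_b": "The capital of Thailand is Bangkok.",
    "model_c": "Chiang Mai is the capital of Thailand.",
}

# Reference-free ranking: each system is scored against its peers only.
scores = {
    name: cross_consistency(out, [o for n, o in outputs.items() if n != name])
    for name, out in outputs.items()
}
```

Under this heuristic, the system least consistent with its peers (here, the one giving a different capital) ranks as most hallucination-prone, with no gold reference required.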
In collaboration with
University of Cambridge, Tsinghua University
Research Paper
This paper evaluates audio language models in low-resource languages, using Thai as an example, revealing their limitations despite multilingual pretraining. It explores data mixtures to optimize models for both a target language and English, integrating audio comprehension and speech instruction-following into a unified framework. The proposed model, Typhoon-Audio, significantly outperforms open-source models and rivals state-of-the-art systems like Gemini-1.5-Pro in both English and Thai.
Research Paper
This work introduces SkillAggregation, a novel method for combining judgments from multiple LLMs in NLP tasks without relying on reference labels. Extending the Crowdlayer approach from image classification, SkillAggregation leverages judge estimates during inference. Experiments show that SkillAggregation consistently outperforms existing aggregation methods, achieving state-of-the-art results across most tasks.
In collaboration with
University of Cambridge, Stanford University
Research Paper @ EMNLP 2024 (main)
This paper explores multilingual reasoning distillation in LLMs, proposing d-CoT-nR, a novel approach that incorporates incorrect rationales alongside positive ones to enhance learning. Experiments on multilingual high-school exams show that d-CoT-nR improves accuracy in unseen languages and step-by-step reasoning, outperforming existing methods focused primarily on English.
In collaboration with
VISTEC
Research Paper @ EMNLP 2024 (main)
This work addresses the challenge of overshadowed entities in entity disambiguation (ED) by proposing a debiasing technique to prevent shortcut learning during training. Unlike knowledge-based methods, this approach avoids added computational overhead at inference. Experiments show state-of-the-art performance on ED datasets, offering a fast and effective solution for improving ED.
In collaboration with
VISTEC
Research Paper @ EMNLP 2024 (Findings)
McCrolin is a multi-consistency cross-lingual training framework designed to enhance consistency, ranking stability, and robustness in cross-lingual QA systems. Using multi-task learning, McCrolin achieves state-of-the-art results on standard QA datasets and excels with varying input sizes. It demonstrates strong generalizability across different encoder architectures and sizes.
In collaboration with
VISTEC
A Thai knowledge benchmarking dataset comprising multiple-choice questions from five key Thai examinations (ONET, IC, TGAT, TPAT-1, and A-Level), designed to evaluate Thai language models.
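A multiple-choice benchmark like this is typically scored as plain accuracy over exam items. The sketch below shows that scoring loop under assumed record fields ("question", "choices", "answer") and a made-up sample; it is not ThaiExam's official schema or data.

```python
def exam_accuracy(records, answer_fn):
    """Fraction of multiple-choice questions answered correctly.

    `records` is a list of dicts with illustrative keys "question",
    "choices", and "answer"; `answer_fn` maps (question, choices) to
    the chosen option string.
    """
    correct = sum(
        1 for rec in records
        if answer_fn(rec["question"], rec["choices"]) == rec["answer"]
    )
    return correct / len(records)


# Tiny made-up sample in the spirit of an exam item (not real ThaiExam data).
sample = [
    {"question": "2 + 2 = ?", "choices": ["3", "4", "5"], "answer": "4"},
    {"question": "Capital of Thailand?", "choices": ["Bangkok", "Hanoi"], "answer": "Bangkok"},
]

# A trivial baseline that always picks the first option.
pick_first = lambda question, choices: choices[0]
score = exam_accuracy(sample, pick_first)
```

Reporting a naive baseline alongside model scores makes it easier to see how far a model's accuracy exceeds chance on each exam section.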
A dataset designed for visual question-answering and image-to-text tasks, supporting both Thai (th) and English (en) languages.
A dataset designed for audio question-answering and audio-to-text tasks, supporting both Thai (th) and English (en) languages.
The ThaiLLM Leaderboard is specifically designed to evaluate and compare LLMs with Thai language capabilities. The leaderboard tracks the performance of various LLMs across a range of benchmarks and tasks, providing a standard environment where models are assessed under the same conditions. This ensures that results are reproducible and comparable, allowing developers and researchers to gauge their models' performance relative to others in the community, and ultimately fostering growth in Thai NLP research and development.
The ThaiExam Leaderboard is designed to assess language models in real-world Thai scenarios, derived from standardized high school and financial professional exams such as ONET, TGAT, A-Level, and the Investment Consultant (IC) exam. The leaderboard evaluates a range of leading models, including Typhoon powered by SCB 10X and SCBX, offering full transparency at the prompt level. It also provides reproducible results using the HELM framework. This initiative represents a new publicly available leaderboard specifically designed for Thai language evaluation, aimed at driving innovation in Thai language model development and evaluation.