
Our Latest Models

Typhoon Release

Typhoon 2

Latest Released Model

Our latest release builds on Typhoon 1.5 and 1.5X. It spans compact, edge-capable models (1B and 3B) up to a 70B flagship, all optimized specifically for Thai applications. Typhoon 2 delivers superior benchmark performance, supports extended context lengths of up to 128,000 tokens, and enables advanced AI tasks with robust safety classifiers and comprehensive function-calling capabilities.
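For developers, a minimal quickstart sketch is shown below, assuming the instruct checkpoints are served through Hugging Face transformers; the repository ID is a hypothetical placeholder to replace with the exact name from the official model cards.

```python
# Minimal sketch: chatting with a Typhoon 2 instruct model via transformers.
# The repo id below is a hypothetical placeholder -- check the official
# Typhoon model cards for the exact checkpoint names and licenses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "scb10x/typhoon2-8b-instruct"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "สวัสดีครับ ช่วยแนะนำตัวหน่อย"}]  # "Hello, please introduce yourself."
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```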

Typhoon T1 (Research Preview)

The first open reasoning model in Southeast Asia is here! Typhoon T1 3B, the debut model in our "Typhoon T" series, is setting a new benchmark for structured, thoughtful AI reasoning—excelling in math, coding, and other complex tasks.

Typhoon2-Audio (Research Preview)

An end-to-end model that processes and generates both text and audio. It performs well on speech-centric tasks like transcription, audio captioning, and speech-to-speech translation, offering robust multi-turn dialogue support and text-to-speech capabilities.

Typhoon2-Vision (Research Preview)

A model optimized for visual data, featuring advanced OCR capabilities for Thai documents, Chart VQA, and more. It enables highly accurate text extraction and context-aware reasoning for tasks like document understanding and visual question answering.

Publications

Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models

Technical Report

This paper presents Typhoon 2, a family of Thai-optimized models for text, vision, and audio. It describes methods such as continual pre-training and post-training used to improve Thai performance, with evaluations across a range of tasks. The series includes models from 1B to 70B parameters, safety tools, and advances in document understanding and speech processing.

December 2024
Read

Typhoon: Thai Large Language Models

Technical Report

The Typhoon series introduces Thai LLMs that tackle low-resource language challenges through continual training, with ThaiExam developed for evaluation. Fine-tuned for Thai tasks, Typhoon outperforms open-source models and rivals GPT-3.5 on Thai while being more efficient.

December 2023
Read

CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models

Research Paper @ NeurIPS RBFM Workshop 2024

CrossCheckGPT introduces a reference-free method for ranking hallucinations in multimodal foundation models, leveraging cross-system consistency as a measure of robustness. Applicable across domains and tasks, it uses explicit and implicit consistency metrics to assess hallucination levels. The method demonstrates high correlation with human judgments and supports new benchmarks, including the first audio-visual hallucination benchmark, AVHalluBench. A simplified illustration of the cross-system idea appears after this entry.

In collaboration with
University of Cambridge, Tsinghua University

May 2024
Read
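As a toy illustration of the cross-system idea (not the paper's actual metrics), the sketch below scores each system's output by its average similarity to the other systems' outputs on the same prompt, and ranks the least consistent system as the most hallucination-prone.

```python
# Toy illustration of cross-system consistency scoring: an output that disagrees
# with what other systems produce for the same prompt is treated as more likely
# hallucinated. Plain string similarity keeps the sketch self-contained; the
# paper's explicit/implicit consistency metrics are considerably stronger.
from difflib import SequenceMatcher

def consistency_score(target: str, others: list[str]) -> float:
    """Average similarity of one system's output to every other system's output."""
    return sum(SequenceMatcher(None, target, o).ratio() for o in others) / len(others)

def rank_systems(outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Most consistent (least hallucination-prone) systems first."""
    scores = {
        name: consistency_score(text, [t for n, t in outputs.items() if n != name])
        for name, text in outputs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

outputs = {
    "system_a": "The Eiffel Tower is in Paris and was completed in 1889.",
    "system_b": "The Eiffel Tower, completed in 1889, stands in Paris.",
    "system_c": "The Eiffel Tower is in London and opened in 1920.",
}
print(rank_systems(outputs))  # system_c should rank last
```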

Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models

Research Paper

This paper evaluates audio language models in low-resource languages, using Thai as an example, revealing their limitations despite multilingual pretraining. It explores data mixtures to optimize models for both a target language and English, integrating audio comprehension and speech instruction-following into a unified framework. The proposed model, Typhoon-Audio, significantly outperforms open-source models and rivals state-of-the-art systems like Gemini-1.5-Pro in both English and Thai.

September 2024
Read

SkillAggregation: Reference-free LLM-Dependent Aggregation

Research Paper

This work introduces SkillAggregation, a novel method for combining judgments from multiple LLMs in NLP tasks without relying on reference labels. Extending the Crowdlayer approach from image classification, SkillAggregation leverages judge estimates during inference. Experiments show that SkillAggregation consistently outperforms existing aggregation methods, achieving state-of-the-art results across most tasks. A toy sketch of reference-free judge aggregation follows below.

In collaboration with
University of Cambridge, Stanford University

October 2024
Read
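The sketch below is a deliberately simplified, reference-free aggregator in the same spirit: it alternates between forming a reliability-weighted consensus over items and re-estimating each judge's reliability from agreement with that consensus. It is a plain Dawid-Skene-style scheme, not the paper's Crowdlayer-based method.

```python
# Simplified reference-free aggregation of binary LLM-judge votes: alternate
# between a reliability-weighted consensus over items and re-estimating each
# judge's reliability from agreement with that consensus. This is a plain
# Dawid-Skene-style scheme, NOT the paper's Crowdlayer-based SkillAggregation.
def aggregate(votes: list[list[int]], iters: int = 10):
    """votes[i][j] = judge j's 0/1 vote on item i -> (item posteriors, judge weights)."""
    n_items, n_judges = len(votes), len(votes[0])
    weights = [1.0] * n_judges
    for _ in range(iters):
        # E-step: soft consensus label per item from reliability-weighted votes.
        labels = [
            sum(w * v for w, v in zip(weights, row)) / sum(weights) for row in votes
        ]
        # M-step: a judge's weight is its average agreement with the consensus.
        weights = [
            sum(1 - abs(votes[i][j] - labels[i]) for i in range(n_items)) / n_items
            for j in range(n_judges)
        ]
    return labels, weights

votes = [[1, 1, 0], [1, 1, 1], [0, 1, 0], [0, 0, 0]]  # 4 items x 3 judges
labels, weights = aggregate(votes)
print([round(l, 2) for l in labels], [round(w, 2) for w in weights])
```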

An Empirical Study of Multilingual Reasoning Distillation for Question Answering

Research Paper @ EMNLP 2024 (main)

This paper explores multilingual reasoning distillation in LLMs, proposing d-CoT-nR, a novel approach that incorporates incorrect rationales alongside positive ones to enhance learning. Experiments on multilingual high-school exams show that d-CoT-nR improves accuracy in unseen languages and step-by-step reasoning, outperforming existing methods focused primarily on English. A generic sketch of this style of objective appears below.

In collaboration with
VISTEC

November 2024
Read
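One generic way to realize "learn from correct rationales, push away from incorrect ones" is standard cross-entropy on the positive rationale plus an unlikelihood-style penalty on the negative one. The actual d-CoT-nR objective may differ; treat the sketch below purely as an illustration, with alpha as a hypothetical mixing weight.

```python
# Illustrative objective in the spirit of learning from positive and negative
# rationales: cross-entropy on a correct rationale plus a generic unlikelihood
# penalty on an incorrect one. Not necessarily the paper's exact formulation.
import torch
import torch.nn.functional as F

def distill_loss(model, pos_ids: torch.Tensor, neg_ids: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """pos_ids / neg_ids: (1, seq_len) token ids of a correct / incorrect rationale."""
    # Likelihood term: fit the correct rationale with next-token cross-entropy.
    pos_logits = model(pos_ids).logits[:, :-1]  # predict token t+1 from token t
    pos_loss = F.cross_entropy(pos_logits.transpose(1, 2), pos_ids[:, 1:])

    # Unlikelihood term: lower the probability assigned to the incorrect rationale.
    neg_logits = model(neg_ids).logits[:, :-1]
    tok_logp = F.log_softmax(neg_logits, dim=-1).gather(-1, neg_ids[:, 1:, None]).squeeze(-1)
    neg_loss = -torch.log1p(-tok_logp.exp().clamp(max=1 - 1e-6)).mean()

    return pos_loss + alpha * neg_loss  # alpha is a hypothetical mixing weight
```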

Efficient Overshadowed Entity Disambiguation by Mitigating Shortcut Learning

Research Paper @ EMNLP 2024 (main)

This work addresses the challenge of overshadowed entities in entity disambiguation (ED) by proposing a debiasing technique to prevent shortcut learning during training. Unlike knowledge-based methods, this approach avoids added computational overhead at inference. Experiments show state-of-the-art performance on ED datasets, offering a fast and effective solution for improving ED.

In collaboration with
VISTEC

November 2024
Read

McCrolin: Multi-consistency Cross-lingual Training for Retrieval Question Answering

Research Paper @ EMNLP 2024 (Findings)

McCrolin is a multi-consistency cross-lingual training framework designed to enhance consistency, ranking stability, and robustness in cross-lingual QA systems. Using multi-task learning, McCrolin achieves state-of-the-art results on standard QA datasets and excels with varying input sizes. It demonstrates strong generalizability across different encoder architectures and sizes. The consistency idea is sketched below.

In collaboration with
VISTEC

November 2024
Read
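The core ingredient, consistency training across languages, can be illustrated with a generic regularizer: penalize divergence between the model's predictions on parallel English and Thai inputs. This sketch is not the paper's full multi-task recipe, only the underlying idea.

```python
# Generic cross-lingual consistency regularizer: encourage a QA model to make
# similar predictions on parallel English and Thai inputs. Illustrates the core
# idea only; McCrolin's full multi-task training framework goes further.
import torch
import torch.nn.functional as F

def consistency_loss(logits_en: torch.Tensor, logits_th: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between predictive distributions on parallel inputs."""
    p = F.log_softmax(logits_en, dim=-1)
    q = F.log_softmax(logits_th, dim=-1)
    kl_pq = F.kl_div(q, p, log_target=True, reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(p, q, log_target=True, reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)
```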

Datasets & Evaluation

ThaiExam

Link

A Thai knowledge benchmark dataset comprising multiple-choice questions from five key Thai examinations (ONET, IC, TGAT, TPAT-1, and A-Level), designed to evaluate Thai language models.
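To make the evaluation setup concrete, a minimal sketch of multiple-choice scoring is shown below; the field names (question, options, answer) are assumptions for illustration, not the dataset's published schema.

```python
# Minimal sketch of how a multiple-choice benchmark like ThaiExam is typically
# scored: prompt the model with the question and options, then compare its
# chosen letter against the gold answer. Field names are assumptions.
def evaluate(model_choose, questions: list[dict]) -> float:
    """model_choose(question, options) -> e.g. "a"; returns accuracy."""
    correct = 0
    for q in questions:
        prediction = model_choose(q["question"], q["options"])
        correct += prediction.strip().lower() == q["answer"].lower()
    return correct / len(questions)

sample = [{
    "question": "เมืองหลวงของประเทศไทยคือข้อใด",  # "Which is the capital of Thailand?"
    "options": {"a": "กรุงเทพมหานคร", "b": "เชียงใหม่", "c": "ภูเก็ต", "d": "ขอนแก่น"},
    "answer": "a",
}]
print(evaluate(lambda q, opts: "a", sample))  # 1.0 with a stub model
```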

Typhoon Vision Preview Data

Link

A dataset designed for visual question-answering and image-to-text tasks, supporting both Thai (th) and English (en) languages.

Typhoon Audio Preview Data

Link

A dataset designed for audio question-answering and audio-to-text tasks, supporting both Thai (th) and English (en) languages.

ThaiLLM Leaderboard

Link

The ThaiLLM Leaderboard is specifically designed to evaluate and compare LLMs with Thai language capabilities. It tracks the performance of various LLMs across a range of benchmarks and tasks in a standard environment where models are assessed under the same conditions. This ensures that results are reproducible and comparable, allowing developers and researchers to gauge how their models perform relative to others in the community, ultimately fostering growth in Thai NLP research and development.

ThaiExam Leaderboard on Stanford HELM

Link

The ThaiExam Leaderboard assesses language models in real-world Thai scenarios derived from standardized high school and financial professional exams such as ONET, TGAT, A-Level, and the Investment Consultant (IC) exam. The leaderboard evaluates a range of leading models, including Typhoon powered by SCB 10X and SCBX, offering full transparency at the prompt level and reproducible results through the HELM framework. It is a new publicly available leaderboard specifically designed for Thai language evaluation, aimed at driving innovation in Thai language model development and evaluation.

IFEval-Thai

Link

A Thai version of the IFEval dataset. The original English instructions (https://huggingface.co/datasets/google/IFEval) were translated into Thai and then manually verified and corrected to ensure accuracy and content consistency. Rows with poor translation quality or context irrelevant to Thai were removed from the dataset.
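What makes IFEval-style data automatically scorable is that each instruction pairs with a programmatic check, so no reference answer is needed. The two checkers below are simplified illustrations of that pattern, not the benchmark's actual implementation.

```python
# Sketch of the verifiable-constraint checking behind IFEval-style evaluation:
# each instruction carries a programmatic check. Simplified illustrations only,
# not the benchmark's actual checker implementations.
import re

def check_num_bullets(response: str, n: int) -> bool:
    """Instruction: 'answer in exactly n bullet points'."""
    bullets = [line for line in response.splitlines() if line.lstrip().startswith(("-", "*"))]
    return len(bullets) == n

def check_keyword(response: str, keyword: str) -> bool:
    """Instruction: 'the response must mention <keyword>'."""
    return re.search(re.escape(keyword), response) is not None

resp = "- กรุงเทพมหานคร\n- เชียงใหม่\n- ภูเก็ต"
print(check_num_bullets(resp, 3), check_keyword(resp, "เชียงใหม่"))  # True True
```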

Typhoon
Open-Source Language Technologies for Thai Language, Knowledge, and Culture
© 2024 SCB 10X Co., Ltd. - All rights reserved.