
Typhoon’s Papers Acceptances at ICLR 2025: Advancing Open Science for Low-Resource Language AI

Paper · Conference · Typhoon 2
Oravee (Orn) Smithiphol
April 04, 2025

Table of Contents

  • Breaking the Resource Barrier Through Open Science
  • Typhoon T1: The First Open Thai Reasoning Model with Structured Thinking Format
  • Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging – An Open Recipe
  • Impact on Global AI Development
  • Looking Forward
  • About ICLR and SCI-FM Workshop
  • Meet Our Team at ICLR 2025
  • Join our virtual community

We're proud to announce the acceptance of two groundbreaking papers at the Open Science for Foundation Models (SCI-FM) Workshop at the International Conference on Learning Representations (ICLR) 2025 🎉

ICLR is regarded as one of the premier machine learning conferences globally. These acceptances mark a significant milestone in our mission to democratize AI technology for low-resource languages, particularly Thai, and establish a replicable framework for advancing AI capabilities in other low-resource languages.

Breaking the Resource Barrier Through Open Science

AI development has largely focused on resource-rich languages, creating a technological divide. Our research demonstrates how open science can bridge this gap, using Thai as a proving ground for methodologies applicable to any low-resource language.

Typhoon T1: The First Open Thai Reasoning Model with Structured Thinking Format

Figure 1: Structured Long-thinking Data Transformation-And-Refinement Pipeline

Typhoon T1: An Open Thai Reasoning Model represents a fundamental shift in how we approach reasoning capabilities in low-resource languages. Key innovations include:

  • Novel Supervised Fine-tuning Pipeline: Unlike traditional reinforcement learning approaches, our supervised fine-tuning methodology ensures stable and transparent development of reasoning capabilities.
  • Structured Thinking Format: Implementation of XML-based thinking traces that enhance the model's ability to break down complex problems into manageable steps.
  • Open-Source Implementation: Complete transparency in datasets, methodology, and model weights, fostering collaborative development in the Thai AI community.
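
To make the structured thinking format concrete, here is a minimal sketch of how an XML-based thinking trace can separate intermediate steps from the final answer. The tag names (`thoughts`, `plan`, `step`, `answer`) and the trace content are our own illustrative invention; the schema Typhoon T1 actually uses may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-style thinking trace; tag names are illustrative only.
trace = """<response>
  <thoughts>
    <plan>1. Convert kilometers to meters. 2. Divide distance by time.</plan>
    <step>5 km = 5000 m</step>
    <step>5000 m / 1000 s = 5 m/s</step>
  </thoughts>
  <answer>5 m/s</answer>
</response>"""

root = ET.fromstring(trace)
steps = [s.text for s in root.iter("step")]  # intermediate reasoning steps
answer = root.findtext("answer")             # final answer, shown to the user
print(steps)   # ['5 km = 5000 m', '5000 m / 1000 s = 5 m/s']
print(answer)  # 5 m/s
```

Keeping the reasoning inside explicit tags lets training pipelines and downstream applications extract, inspect, or hide the thinking trace without fragile string heuristics.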

Fun fact: Typhoon T1 is not only the first Thai reasoning model but also the first reasoning model in Southeast Asia.

Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging – An Open Recipe

Figure 2: Overview of our Typhoon 2 R1 70B recipe

This paper showcases how we can incorporate advanced reasoning capabilities, such as those of DeepSeek R1, into language-specific large language models (LLMs), in this case Typhoon 2.

We present a scalable approach to enhancing low-resource language models through model merging techniques:

  • Representation Alignment: A sophisticated approach to aligning Thai language understanding with reasoning capabilities through bilingual dataset training.
  • Ability-Aware Layer Weighting: Strategic assignment of model weights that preserves language capabilities while significantly enhancing reasoning abilities.
  • Resource-Efficient Implementation: Achieving state-of-the-art results without the need for massive computational resources.
  • Cross-Lingual Knowledge Transfer: A systematic approach to adapting reasoning capabilities across languages.
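
The ability-aware layer-weighting idea can be sketched as a per-layer interpolation between two checkpoints: layers that carry language knowledge lean toward the language-specific model, while reasoning-heavy layers lean toward the reasoning model. This is a minimal, stdlib-only sketch; the parameter names and interpolation ratios below are illustrative, not the values used in the paper.

```python
def merge_state_dicts(base, donor, layer_ratios, default_ratio=0.5):
    """Interpolate two models parameter-by-parameter.

    `layer_ratios` maps a substring of a parameter name to the fraction of
    the donor (reasoning) model used for matching parameters; unmatched
    parameters fall back to `default_ratio`.
    """
    merged = {}
    for name, w_base in base.items():
        w_donor = donor[name]
        ratio = next((r for key, r in layer_ratios.items() if key in name),
                     default_ratio)
        merged[name] = [(1 - ratio) * b + ratio * d
                        for b, d in zip(w_base, w_donor)]
    return merged

# Toy "models": the base is language-specialized, the donor reasoning-specialized.
base = {"embed.weight": [0.0, 0.0], "layers.0.mlp": [0.0, 0.0]}
donor = {"embed.weight": [1.0, 1.0], "layers.0.mlp": [1.0, 1.0]}

# Keep language-bearing embeddings mostly from the base model; take
# reasoning-heavy MLP layers mostly from the donor.
merged = merge_state_dicts(base, donor, {"embed": 0.1, "mlp": 0.9})
print(merged["embed.weight"])   # [0.1, 0.1]
print(merged["layers.0.mlp"])   # [0.9, 0.9]
```

Because merging is a pure post-hoc operation on checkpoints, it needs no gradient updates, which is what makes a one-day, resource-efficient adaptation feasible.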

Impact on Global AI Development

Our research demonstrates that sophisticated AI capabilities aren't exclusive to high-resource languages. These papers represent significant breakthroughs in:

  • Reasoning Capability: Comparable performance to English language models in complex reasoning tasks
  • Accessibility: Full open-source implementation with comprehensive documentation
  • Resource Efficiency: Makes advanced AI development accessible to smaller research communities
  • Cross-Lingual Methodology: Provides a blueprint for adaptation to any other low-resource language

Looking Forward

These acceptances at the SCI-FM Workshop, ICLR 2025, highlight the importance of our approach to open science and our focus on improving low-resource languages, not just the dominant ones. We're committed to:

  1. Open Collaboration: Releasing our models and datasets to the research community
  2. Continued Innovation: Building on these foundations toward even more sophisticated low-resource language AI capabilities, particularly for Thai
  3. Community Engagement: Working with Thai developers and researchers to expand the applications of these technologies

About ICLR and SCI-FM Workshop

We're particularly excited to be presenting at the Open Science for Foundation Models (SCI-FM) Workshop at ICLR 2025, which aligns perfectly with our mission of promoting open science in AI development. Our contributions exemplify the workshop's core goal: making advanced AI research accessible and reproducible for the global research community.

Meet Our Team at ICLR 2025

Join us at ICLR 2025's SCI-FM workshop to:

  • 🤝 Discuss potential collaborations for adapting our framework to your language
  • 💡 Learn about our open-science methodologies firsthand
  • 🔍 Deep dive into our technical implementations
  • 🌐 Connect with our research team and join our growing community

Workshop Details:

  • 📍 ICLR 2025 SCI-FM (Workshop Hall 4 #5)
  • 📅 April 28, 2025, 15:00–16:00 (GMT+8)
  • 💬 Meet us at our booth from 24 to 28 April 2025

Join our virtual community

Beyond ICLR 2025:

📚 Read our papers:

  • Typhoon T1: https://arxiv.org/abs/2502.09042
  • Typhoon2 R1: https://arxiv.org/abs/2502.09056

🔬 Explore our open-source implementations

  • Open-weight models: https://huggingface.co/scb10x
  • More of our open-source initiatives can be found at opentyphoon.ai

📱 Join our always-on Discord community


© 2025 SCB 10X Co., Ltd. All rights reserved.