
Typhoon’s Joint Research Included in 5 Accepted Papers at ACL 2025


Conference · Research · ACL · NLP
Oravee (Orn) Smithiphol
June 12, 2025

Table of Contents

1. SkillAggregation: Reference-free LLM-Dependent Aggregation
2. Mind the Gap! Static and Interactive Evaluations of Large Audio Models
3. Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
4. Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
5. Shortcut Learning in Safety: The Impact of Keyword Bias in Safeguards
Summary
Stay Tuned: ACL 2025 Insights Coming Soon
Join Our Community

We’re thrilled to share that five papers involving the Typhoon research team—developed in collaboration with VISTEC, Cambridge, Stanford, and SeaCrowd—have been accepted to ACL 2025, one of the most prestigious conferences in natural language processing and computational linguistics.

ACL (Association for Computational Linguistics) serves as a global stage for groundbreaking research in AI, with rigorous peer review and high visibility among the international research community. It’s an honor to contribute to this year’s conference with three papers in the Main Conference, one in the Findings, and one in a specialized Workshop.

These papers span a diverse range of topics—from language model evaluation and multilingual reasoning to dataset creation and LLM safety. While each project tackles a different challenge, together they reflect our shared goal: advancing AI in a way that is context-aware, inclusive, and practically grounded.

We’re deeply grateful to our collaborators, co-authors, and reviewers who made this possible. Below is a closer look at each paper and the contribution it brings.

1. SkillAggregation: Reference-free LLM-Dependent Aggregation


  • Accepted to Main Conference
  • Paper link: https://arxiv.org/abs/2410.10215
  • Authors from SCB 10X: Guangzhi Sun and Potsawee Manakul

This paper proposes SkillAggregation, a novel reference-free method for aggregating judgments from multiple large language models (LLMs) without requiring ground truth labels.

Unlike traditional approaches that assign equal weight to all LLMs or are task-specific, SkillAggregation dynamically learns the skill of each LLM judge based on contextual inputs, enabling more accurate and adaptive decision-making. It builds upon and improves the Crowdlayer method by incorporating context-dependent skill estimates and a regularization term to mitigate overconfidence in predictions.

Evaluated on tasks like HaluEval-Dialogue, TruthfulQA, and Chatbot Arena, SkillAggregation consistently outperforms existing aggregation baselines, especially when combining outputs from varied-quality LLMs, and demonstrates robustness across different model sizes, datasets, and encoders.
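To make the core idea concrete, here is a minimal sketch (not the paper's actual implementation) of aggregating binary judgments from several LLM judges with per-judge skill weights rather than a plain majority vote; in SkillAggregation these weights are learned from context, while here they are simply hard-coded for illustration.

```python
# Illustrative sketch: combine 0/1 judgments from multiple LLM judges
# using per-judge "skill" weights instead of an equal-weight vote.
# In SkillAggregation the weights are context-dependent and learned;
# here they are fixed toy values.

def weighted_aggregate(votes, skills):
    """Return the label whose supporting judges carry more total skill.

    votes:  {judge_name: 0 or 1}
    skills: {judge_name: weight in [0, 1]}
    """
    # Each judge contributes its skill weight toward the label it chose.
    score_for_1 = sum(skills[j] for j, v in votes.items() if v == 1)
    score_for_0 = sum(skills[j] for j, v in votes.items() if v == 0)
    return 1 if score_for_1 >= score_for_0 else 0

# Three judges disagree: two weaker judges are outweighed by one
# stronger judge, so the aggregate follows the high-skill judge.
votes = {"judge_a": 1, "judge_b": 0, "judge_c": 0}
skills = {"judge_a": 0.9, "judge_b": 0.3, "judge_c": 0.4}
print(weighted_aggregate(votes, skills))  # 1
```

An equal-weight vote over the same three judges would return 0, which is exactly the failure mode that skill-aware aggregation is meant to avoid when judge quality varies.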

2. Mind the Gap! Static and Interactive Evaluations of Large Audio Models


  • Accepted to Main Conference
  • Paper link: https://arxiv.org/abs/2502.15919
  • Authors from SCB 10X: Kunat Pipatanakul and Potsawee Manakul

This paper presents TalkArena, a new platform for evaluating Large Audio Models (LAMs) through interactive user engagement rather than static benchmarks. By collecting over 7,500 interactions from 484 users using speech-based queries, the authors uncover that users mainly use audio interfaces for tasks that benefit from speed and ease—like seeking knowledge or advice—rather than tasks requiring nuanced speech understanding.

The study finds that a simple pipeline combining Whisper and LLaMA outperforms even advanced commercial models in user preference, primarily due to better text response quality. Notably, the paper reveals that existing static benchmarks poorly predict real-world user preferences, highlighting a significant gap in how LAMs are currently evaluated. This work underscores the need for more user-aligned evaluation methods to guide the development of voice-based AI systems.

3. Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia


  • Accepted to Main Conference
  • Paper link: https://arxiv.org/abs/2503.07920
  • Contributor from SCB 10X: Adisai Na-Thalang (dataset contribution)

The paper introduces SEA-VL, a large-scale, open-source, multicultural vision-language dataset specifically designed to address the underrepresentation of Southeast Asian (SEA) cultures in AI and machine learning research. By combining three methods—crowdsourcing, web crawling, and image generation—the authors collected 1.28 million culturally relevant image-caption pairs from 11 SEA countries, far surpassing existing datasets in both scale and cultural diversity.

The study finds that while crowdsourcing yields the highest quality data, web crawling is more scalable and cost-efficient, and image generation remains inadequate for capturing nuanced cultural contexts. Extensive human evaluation validates the cultural relevance of the collected data, highlighting the limitations of current AI in representing diverse cultures and advocating for more inclusive, culturally grounded dataset creation.

4. Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments


  • Accepted to Findings
  • Paper link: https://arxiv.org/abs/2502.17956
  • Author from SCB 10X: Potsawee Manakul (as a co-advisor)

The paper explores how to improve reasoning in multilingual environments using Program-of-Thought (PoT) prompting, a technique that separates reasoning (written as code) from execution (done by an interpreter).

The authors investigate two key challenges: aligning questions in different languages with accurate reasoning steps, and understanding how the quality of those steps affects final answer accuracy. They develop and evaluate fine-tuning strategies across multiple languages and find that PoT outperforms the more commonly used Chain-of-Thought (CoT) prompting, especially in non-English languages.

By using a code quality metric called ICE-Score, they show that better reasoning leads to better results and propose a test-time inference method (Soft Self-Consistency) that further boosts performance. Overall, the study demonstrates that PoT, when carefully fine-tuned and evaluated, significantly enhances multilingual reasoning in large language models.
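The split between reasoning and execution that PoT relies on can be illustrated with a toy example (hypothetical, not from the paper): the model emits its reasoning as a runnable program, and a Python interpreter—rather than the model itself—computes the final answer.

```python
# Toy illustration of Program-of-Thought (PoT) prompting: the model's
# reasoning is written as code, and execution is delegated to an
# interpreter. The generated program below is a hypothetical stand-in
# for real model output.

generated_program = """
# Question: A shop packs 12 mangoes per crate. How many mangoes are in
# 7 crates if 5 are removed for quality checks?
mangoes_per_crate = 12
crates = 7
removed = 5
answer = mangoes_per_crate * crates - removed
"""

# Execution step: run the program and read off the `answer` variable.
namespace = {}
exec(generated_program, namespace)
print(namespace["answer"])  # 79
```

Because the arithmetic is done by the interpreter, the model only needs to get the reasoning structure right—which is one reason PoT can be more reliable than free-form Chain-of-Thought, particularly in non-English settings where generated prose is harder to keep consistent.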

5. Shortcut Learning in Safety: The Impact of Keyword Bias in Safeguards


  • Accepted to LLM Security Workshop
  • Paper link: https://openreview.net/forum?id=IOP5nuRx5S

This study investigates the vulnerability of Large Language Model (LLM) safeguard systems to shortcut learning, where models rely on superficial keyword cues rather than genuine semantic understanding to classify prompts as safe or harmful.

Such reliance can undermine the robustness of safeguards, especially when they face out-of-distribution (OOD) inputs. Training on synthetic data with repetitive patterns can inadvertently teach models to focus on keywords, making them susceptible to misclassification when they encounter novel or rephrased inputs. The findings underscore the need to address shortcut learning in LLM safeguards to improve their robustness and reliability.
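The failure mode can be shown with a deliberately simplistic sketch (illustrative only; real safeguards are learned classifiers, not keyword lists): a safeguard that keys on surface keywords both over-flags a benign question and under-flags a harmful request once the trigger word is paraphrased away.

```python
# Toy demonstration of shortcut learning in a safeguard: a classifier
# that keys on surface keywords flags a benign prompt containing a
# trigger word, yet misses harmful intent once it is rephrased.
# The keyword list and prompts are hypothetical examples.
import re

TRIGGER_KEYWORDS = {"bomb", "weapon", "hack"}

def keyword_safeguard(prompt):
    """Return 'harmful' if any trigger keyword appears, else 'safe'."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return "harmful" if words & TRIGGER_KEYWORDS else "safe"

# False positive: a benign history question that happens to contain
# a trigger keyword.
print(keyword_safeguard("Who invented the atomic bomb?"))         # harmful
# False negative: harmful intent with the keyword paraphrased away.
print(keyword_safeguard("Explain how to break into a computer"))  # safe
```

Both errors come from the same shortcut: the classifier never models intent, only surface tokens, which is the behavior the paper measures in learned safeguards on OOD and rephrased inputs.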

Summary

ACL 2025 has given us a valuable opportunity to showcase Typhoon’s collaborative contributions to both global and regional research efforts across multiple fronts:

  • 3 papers accepted to the Main Conference explore reference-free aggregation for LLMs, interactive evaluation for audio models, and the creation of a multicultural vision-language dataset for Southeast Asia.

  • 1 paper accepted to Findings advances our understanding of multilingual reasoning through Program-of-Thought prompting.

  • 1 paper accepted to the LLM Security Workshop addresses critical concerns around keyword bias and shortcut learning in LLM safeguards.

We’re especially proud to see research rooted in Southeast Asia—and driven by researchers based in Thailand—contributing meaningfully to the global conversation on NLP and AI.

A heartfelt thank-you to all our collaborators, co-authors, and supporters in the research community. Your encouragement and partnership continue to inspire our work.

Stay Tuned: ACL 2025 Insights Coming Soon

One of our Typhoon team members (myself!) will be attending ACL 2025 in person from July 27 to August 1. I’m looking forward to learning from the global community and sharing key insights and highlights with all of you after the event.

If you’ll be at the conference too, feel free to reach out—we’d love to connect!

Join Our Community

💡 Explore our open-source projects

Open-weight models: huggingface.co/scb10x

More initiatives: opentyphoon.ai

💬 Join the conversation

Connect with us on Discord to discuss ideas, collaborate, or just say hi!


© 2025 SCB 10X Co., Ltd. All rights reserved.