We’re excited to share that two of Typhoon’s research papers have been accepted to the main conference of EMNLP 2025, an A*-ranked conference and one of the most prestigious venues in natural language processing.
This marks a major milestone for the team, reflecting our ongoing commitment to advancing open, inclusive, and practical AI research for Thailand and the broader region.
Our accepted papers are:
ThaiInstruct: An Instruction-Following Dataset for Culturally-Aware, Multitask, and Multi-domain Evaluation in Thai

Large language models excel at instruction-following in English, but their performance in low-resource languages like Thai remains underexplored. Existing benchmarks often rely on translations, which miss cultural and domain-specific nuances critical for real-world Thai applications.
Key idea:
ThaiInstruct introduces the first large-scale, human-authored Thai dataset designed for both evaluation and instruction tuning.
Dataset design:
- **Domains:** Legal, Medical, Finance, Retail
- **Task types:** Classification, Summarization, Open QA, Closed QA, MCQ, Brainstorming, Creative Writing
- **Coverage:** Both general-purpose and culturally specific instructions
- **Quality control:** Built with annotators, domain experts, and AI researchers through a multi-stage process
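To make the design concrete, here is a minimal, hypothetical sketch of what a single ThaiInstruct-style record could look like. The field names and values are illustrative assumptions made for this post, not the dataset's released schema.

```python
# Hypothetical ThaiInstruct-style record; field names and values are
# illustrative assumptions, not the dataset's actual schema.
example_record = {
    "domain": "Legal",               # one of: Legal, Medical, Finance, Retail
    "task_type": "Summarization",    # e.g., Classification, Open QA, Closed QA, MCQ, ...
    "culturally_specific": True,     # general-purpose vs. culturally specific instruction
    "instruction": "สรุปสาระสำคัญของมาตราที่ยกมา",  # "Summarize the key points of the quoted section"
    "context": "<excerpt of the relevant Thai statute>",
    "output": "<expert-reviewed answer in Thai>",
}
```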
Experiments & findings:
- Zero-shot evaluation reveals significant performance gaps in Thai, especially on cultural and professional tasks.
- Instruction tuning on ThaiInstruct outperforms translated-data baselines in both in-domain and out-of-domain benchmarks.
- Results confirm that native, culturally grounded supervision is crucial for aligning LLMs in diverse linguistic settings.
Prior Prompt Engineering for Reinforcement Fine-Tuning

This paper introduces Prior Prompt Engineering (pPE) as a new dimension in reinforcement fine-tuning (RFT) of language models. Instead of focusing only on algorithms, reward design, or data selection (as most RFT work does), the authors ask: What if the training prompts themselves could systematically guide models toward specific behaviors?
Key idea:
- Inference-time prompt engineering (iPE) uses instructions (e.g., “think step by step”) to guide model behavior at inference.
- This paper adapts iPE into training-time prior prompts (pPE), so that models internalize these behaviors during RFT, not just at inference.
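As a rough illustration of the idea (not the paper's implementation), the sketch below shows a prior prompt being prepended to every training query before rollouts are sampled during RFT. The prompt wording and the `policy.sample` / `policy.update` / `reward_fn` interfaces are assumptions made for this post.

```python
# Minimal sketch of prior prompt engineering (pPE) during RFT.
# Prompt wording and the policy/reward interfaces are hypothetical.

PRIOR_PROMPT = "Think step by step, then give the final answer."  # assumed phrasing

def build_rollout_prompt(query: str) -> str:
    """Prepend the prior prompt to every training query before sampling rollouts."""
    return f"{PRIOR_PROMPT}\n\n{query}"

def rft_step(policy, reward_fn, queries):
    """One schematic RFT update: sample with the prior prompt, score, update."""
    prompts = [build_rollout_prompt(q) for q in queries]
    completions = [policy.sample(p) for p in prompts]         # hypothetical API
    rewards = [reward_fn(q, c) for q, c in zip(queries, completions)]
    policy.update(prompts, completions, rewards)               # e.g., a PPO/GRPO-style step

# At evaluation time the trained model can be queried without the prior prompt,
# since the targeted behavior has been internalized during training.
```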
Approach:
- Translate five inference-time prompt engineering strategies into prior prompts for training (illustrative placeholders follow after this list):
  - Reasoning (Chain-of-Thought)
  - Planning (Plan-and-Solve)
  - Code-based reasoning (Program-of-Thought)
  - Knowledge recall (Generated Knowledge)
  - Null-example utilization (Null-Shot)
- Evaluate on in-domain and out-of-domain benchmarks (AIME2024, HumanEval+, GPQA-Diamond).
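For illustration only, the five strategies might translate into prior prompts along the following lines; these strings are placeholder wordings assumed for this post, not the prompts used in the paper.

```python
# Placeholder prior prompts for the five strategies; wording is assumed,
# not taken from the paper.
PRIOR_PROMPTS = {
    "reasoning":        "Think through the problem step by step before answering.",   # Chain-of-Thought
    "planning":         "First write a plan, then follow it to solve the problem.",   # Plan-and-Solve
    "code_reasoning":   "Write a short program whose output is the final answer.",    # Program-of-Thought
    "knowledge_recall": "First recall relevant facts, then use them to answer.",      # Generated Knowledge
    "null_example":     "Refer to the examples above when answering.",                # Null-Shot (no examples are given)
}

# Each strategy yields its own pPE-trained model: during RFT, the chosen prior
# prompt is prepended to every training query (as in the sketch above).
```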
Findings:
- All pPE-trained models outperform their inference-time (iPE) baselines.
- Null-example pPE yields the largest overall gain, even surpassing reasoning prompts on AIME2024 and GPQA-Diamond.
- Using a behavior-classification framework, the authors show that **different pPE strategies leave distinct behavioral “signatures”** in the trained models.
Looking Ahead
While these two papers are heading to the EMNLP main conference, we’re also awaiting decisions on our workshop submissions, so please stay tuned!
This recognition is a proud moment for our team, and we’re grateful to our collaborators and the NLP community for pushing the boundaries of open AI research together.
💡 If you’re attending EMNLP 2025, we’d love to connect; come find us at our sessions!