Typhoon in Action: Empowering Big Data Research at TDRI to Transform Thailand’s Labor Market Insights

Challenges in Labor Market Research

In Thailand’s changing economy, having an accurate picture of Labor market demand is critical for preparing the workforce and shaping policies that can keep pace with technological and economic shifts.

Thailand Development Research Institute (TDRI), with the support from the Program Management Unit for Human Resources & Institutional Development, Research and Innovation (PMU-B), has been running a project to collect and analyse job postings from across multiple websites. The goals are to:

Give the education sector data-driven insights to update curricula and teaching.
Equip policymakers with evidence to design targeted, relevant Labor policies.
Help individuals entering the workforce prepare for evolving skill requirements.

These goals were never going to be easy to achieve. In the past, Labor market research in Thailand relied heavily on surveys which often lacked detailed information like specific skills, was expensive and time-consuming to run, and relied on small samples that overlooked niche or emerging trends.

To bridge these gaps, in 2024 TDRI shifted its focus to real-world job postings, offering a clearer view of employers’ actual needs. Yet this approach brought its own hurdles: massive volumes of unstructured data, scattered across numerous platforms.

Narin Tananitaporn, researcher on TDRI’s Big Data team, joined us in a discussion to explain how his team tackled these challenges and to share the behind-the-scenes process that powers their work, as you’ll discover in this article.

How Typhoon LLM Transformed the Research

As job postings are unstructured data, they are very difficult to extract information and analyse using traditional methods like keyword matching or RegEx. On the other hand, Large Language Models (LLMs) are very good at classification and Named Entity Recognition (NER) and have no issues dealing with out-of-sample observations. Speed was also critical as the team processes roughly 200,000 job posts each quarter, so any solution had to deliver results fast without excessive computing costs.

By integrating Typhoon into their workflow, the team gained three major advantages:

Lower costs compared to many commercial LLMs, with no need for a massive training dataset.
High adaptability, performing well even with new, unseen data.
Contextual awareness, enabling accurate interpretation of ambiguous terms — for example, recognising that “ai” in a graphic design skill listing refers to Adobe Illustrator (.ai), not artificial intelligence.

Behind the Scenes

The Big Data team’s workflow can be summarized as follows:

Scraping job posts from around 20 websites every day.
Parsing each post into structured fields (e.g., job_title, description) and storing them in a NoSQL database.
Cross-referencing company names with the Department of Business Development (DBD) database to assign the correct TSIC code.
Using Typhoon to extract and standardise the most relevant details.

For every job posting processed, Typhoon:

Classifies the occupation group according to the 23 official O*NET categories.
Extracts key skills from the job description.
Identifies and extracts any required experience for the job.
Determines whether the role falls within STEM (Science, Technology, Engineering, Mathematics).

For example, the following job post

has been parsed as the following object:

Field	Value
job_title	Full Stack Engineer - Typhoon Team (Contract - End of 1 Jan 2026)
province	Bangkok
degree	Bachelor
experience_year	Not specified
occupation_group	Computer and Mathematical
tsic_code	64201
skills_required	Full Stack Development, Web Application Development, API Development, Python, JavaScript, React, Next.js, Node.js, SQL, NoSQL, Cloud Platforms, Google Cloud, Cloud-Native Technologies, Open Source Development, AI Research, AI Application Development, Problem Solving, Collaboration, Communication, English, Thai
is_STEM	STEM occupations

Why Typhoon?

When choosing a large language model, cost and accuracy were the two factors that mattered most to the TDRI team.

On the cost side, Narin explained that they compared Typhoon 2 and Meta Llama 3.3, both available at just** $0.88 per 1M tokens** — an affordable rate for continuous, large-scale processing. By contrast, other commercial models such as GPT-4 or Claude offered strong performance but at significantly higher prices, making them impractical for a project processing hundreds of thousands of job posts each quarter.

Cost per 1M tokens (as of July 2025):

Model	Cost per 1M tokens
Typhoon 2 70B Instruct (via together.ai)	$0.88
Meta Llama 3.3 70B Instruct Turbo (via together.ai)	$0.88
GPT-4.1	$2.00 Input / $8.00 Output
GPT-4.1 mini	$0.40 Input / $1.60 Output
Claude Opus 4	$15.00 Input / $75.00 Output
Claude Sonnet 4	$3.00 Input / $15.00 Output
Claude Haiku 3.5	$0.80 Input / $4.00 Output

Editor’s Note: TDRI originally used Typhoon 2 70B for this research project. By the time we published this case study, TDRI had begun transitioning to the newer Typhoon 2.1 Gemma, available on Together.ai at an even lower cost of just $0.20 per 1M tokens.

In terms of accuracy, Typhoon outperformed Llama in STEM classification accuracy, a vital component for education and policy analysis. To compare these two, TDRI built a test set of job postings where they labelled the O*NET occupation group and tested the predictions between Typhoon and Llama. The team found that both LLMs performed roughly the same overall but Typhoon performed significantly better in classifying STEM occupations which was why TDRI chose Typhoon.

Impact and Future Plan

The results of TDRI’s Typhoon-powered Labor market analysis have already made a difference. The data has been shared with policymakers and educational institutions, helping them design policies and curricula that better match the real demands of the job market. The findings have also reached a wider public through appearances on their video, The Standard’s Key Message, and the team’s dedicated online platform jobdata.tdri.or.th. More of their work can be explored at TDRI blog.

Narin shared that TDRI has already adopted Typhoon to several of their projects showcasing its capabilities far beyond this research project alone. For the Embassy of the Republic of Korea in Thailand, the team used Typhoon to cluster and analyse social media posts mentioning South Korea, uncovering key discussion topics and sentiment patterns. These insights informed the embassy’s public diplomacy strategies. A similar approach was taken with social media data mentioning Thai PBS, forming the basis for an evaluation of the broadcaster’s performance for the year of 2023.

Looking ahead, TDRI plans to refine its job classification even further. Instead of categorising roles at the broad occupation group level, the system will identify specific job titles — for example, distinguishing “Data Analyst” from the broader “Computer and Mathematical” category. This finer granularity promises even richer insights for policymakers, educators, and job seekers alike.

Final Thoughts and Advice

For those in research and data analytics exploring LLMs, Narin’s advice is simple but essential: the quality of the output will always depend on the quality of the input. The “garbage in, garbage out” principle applies directly — if your data is messy, incomplete, or inaccurate, your results will suffer.

He emphasises the importance of clean, well-prepared input and warns against blindly trusting model outputs. While LLMs can produce fluent and persuasive text, they are not infallible. They can still generate factual errors or “hallucinations,” especially in specialised domains. “Think of them as powerful assistants, not unquestionable authorities,” he said. Always validate results through human review, trusted data sources, or additional tools to ensure accuracy and reliability.

On Typhoon’s broader role, Narin sees its open-source nature as a major asset to Thailand’s AI ecosystem. It has fostered transparency, encouraged knowledge sharing, and enabled meaningful collaboration across sectors.

Typhoon has strengthened Thailand’s AI community, and I’m excited to see how it will continue to be applied to solve real-world challenges in the years to come,

he concluded.

We hope this story from TDRI sparks ideas and directions for how you, too, can benefit from Typhoon, whether to transform your workflows, unlock new insights, or drive meaningful impact in your own field.

For those interested in exploring the broader impact of local LLMs like Typhoon and their implications for business innovation and national competitiveness, you can read our blog post summary of the panel discussion where Narin also joined as a panelist.

Last but not least, if you have a use case of how you use Typhoon and would like to be featured, we’d love to hear from you! Leave your message in this form or drop us a note on Discord.