
Case Study: How VISAI Leverages Typhoon to Make Thai Legal AI Assistants More Accessible
Use CaseCommunity StoriesLegalChatbot

Table of Contents
From consultancy to venture building, VISAI has always believed in the transformative power of AI. Now, the company is channeling that belief into a socially impactful mission: making legal knowledge more accessible to everyone in Thailand.
“In Thailand, accessing legal expertise—especially in finance or taxation—can be daunting for individuals and small organizations,” explains Pawitsapak Akarajaradwong, Senior Data Scientist at VISAI. “We saw an opportunity to bridge that gap using AI, and specifically, Thai language models.”
Today, VISAI is developing in-house legal AI tools that are not only highly capable, but finely tuned to the complexities of Thai language and law. Alongside product development, VISAI continues to invest in its NLP research team to strengthen Thailand’s position in the AI ecosystem.
Meet Sommai and Somsi: Thai Legal Assistants
To bring its vision to life, VISAI has launched two legal AI assistants: Sommai and Somsi. Each is built to solve a different challenge in the Thai legal landscape—and both are powered by Typhoon, the leading Thai large language model (LLM).
Sommai: An Open-Source Legal Assistant for Thai Financial Law
Sommai is VISAI’s open-source legal chatbot project designed to help users navigate the often complex world of Thai financial law. Built on a Retrieval-Augmented Generation (RAG) framework, Sommai combines advanced information retrieval with generative AI to provide accurate, contextual responses. At its core, it leverages Typhoon as the default LLM for generating natural-language answers—fine-tuned specifically for the Thai legal domain.
But Sommai is more than just a chatbot—it’s a full-stack, open platform. VISAI has released everything to the public, from the frontend and backend components to the legal datasets, making it a powerful base for developers, researchers, and institutions aiming to build their own Thai legal AI tools.
Data Collection and Curation
Sommai is powered by a legal knowledge base that includes 35 financial laws, sourced directly from the Office of the Council of State (สำนักงานคณะกรรมการกฤษฎีกา). These laws are structured into a machine-readable JSON format with fields such as law_name
, section_num
, section_content
, and reference
.
VISAI has documented the entire data preparation process in detail in this technical blog post (in Thai), and the dataset is publicly available via Hugging Face.
Retrieval System & Architecture
Sommai uses a two-stage retrieval system based on WangchanX-Legal-ThaiCCL-Retriever, fine-tuned to prioritize relevant Thai legal documents:
-
BGE M3-Embedding: for semantic similarity and fast vector search
-
BGE-Reranker-V2-M3: to rank retrieved passages by legal relevance
Figure 1. Diagram of Sommai’s retrieval model, by Thitiwat via Medium.
Behind the scenes, the system runs on a Kubernetes-based modular architecture, with separate components for:
-
Frontend interface
-
Embedding and retrieval services
-
RAG controller
-
LLM engine
-
Web gateway
Deployment uses vLLM to serve Typhoon efficiently, and LlamaIndex for ingesting legal content. Since the corpus size is manageable, an in-memory vector DB ensures speed without extra complexity. For an in-depth explanation, see the technical blog post and the GitHub repository.
Why Typhoon?
We tested several LLMs, including open models like LLaMA 3 and Qwen,” Pawitsapak shares, “but Typhoon consistently gave us the best generation quality—especially when responding to legal queries in Thai. Typhoon understands how Thai legal texts are structured. That makes a big difference.
One reason is Typhoon’s continued pretraining (CPT) on high-quality Thai-language corpora, which gives it a deep understanding of both formal and technical Thai even in legal contexts.
Developers and curious users can try Sommai at sommai.wangchan.ai. It’s free and open to the public.
Figure 2. Playground view with
llama3-typhoon-instruct-70b
set as the default generation model
Figure 3. Sommai answers: “Can a company avoid withholding tax?”
Somsi: A Specialized Tax Law Chatbot
Somsi is VISAI’s latest product—a chatbot built specifically to answer questions about Thai tax law. At its core, Somsi also uses RAG framework and is designed from the ground up for real-world deployment. It can be run locally or on-premise, making it ideal for environments where speed, reliability, and data privacy are critical.
Building a legal chatbot that performs well in resource-constrained settings isn’t easy. “One of the biggest challenges,” says Pawitsapak, Senior Data Scientist at VISAI, “was making sure the answers were both accurate and fast—even when compute resources were limited.”
That’s where Typhoon stood out. Among all the Thai LLMs VISAI evaluated, Typhoon received the most positive feedback from users and consistently outperformed other models in both accuracy and contextual understanding. Its strong command of Thai legal language and domain-specific nuance made it the clear choice.
Even in limited infrastructure settings, Somsi delivers high-quality, reliable responses. And VISAI is already looking ahead—with plans to post-train Typhoon on domain-specific tax law corpora to further enhance the chatbot’s accuracy and domain alignment.
You can try Somsi for yourself at somsi.visai.ai
Figure 4. Somsi answers a question about stamp duty
Figure 5. Sample tax law queries users can ask
VISAI’s Research That Strengthens Thai NLP
VISAI’s work with Typhoon doesn’t stop at chatbots. Behind the scenes, the team is deeply involved in research that pushes the boundaries of Thai natural language processing. One such project is NitiBench, a benchmark suite VISAI developed to evaluate AI performance on Thai legal tasks.
In their research workflow, VISAI uses Typhoon in two main ways:
-
First, as a baseline model to compare other language models against.
-
Second, as a platform for post-training, allowing the team to explore how well large language models can adapt to specialized legal content in Thai.
“Typhoon not only trains faster than other models—it consistently gives us stronger downstream results after post-training,” Pawitsapak explains.
Domain adaptation—especially in Thai legal language—isn’t simple. It requires careful dataset design, iterative tuning, and models that can learn nuanced, formal Thai. Through this work, VISAI has gained deeper insight into how Thai LLMs behave in complex tasks, what their limits are, and how they can be improved.
Looking Ahead: A Message to Thai Developers and AI Builders
As the global AI landscape continues to evolve, with powerful proprietary models emerging at a rapid pace, it's easy to feel like the future belongs to the giants. But VISAI sees things differently.
Even with all the advanced proprietary LLMs out there, there are still so many use cases—like on-premise deployments or domain-specific tasks such as Thai legal language—where having a model that’s built by Thais, for Thai, makes a real difference. Typhoon understands Thai context, isn’t overly large, and being open source means we can use it freely to build better solutions.
Pawitsapak Akarajaradwong, Senior Data Scientist at VISAI
This isn’t just about technology—it’s about ownership and relevance. Typhoon stands out not just because it performs well, but because it’s designed with local needs in mind. It hits the sweet spot: powerful enough for production use, small enough for flexible deployment, and culturally aligned enough to understand the nuances of Thai language and law.
For developers, researchers, and product owners working in Thailand, this opens new doors. You’re not just plugging into someone else’s ecosystem—you’re helping build your own.