
7 Things to Know About Typhoon-TTS and FAQs on Contributing Your Voice for Research

Typhoon-TTS · Research Preview
Oravee (Orn) Smithiphol
April 09, 2025

Table of Contents

  1. Naturally expressive, not voice cloning

  2. New potential with voice merging technology

  3. Voice generation from text (Text-to-Speech)

  4. The challenge of Thai TTS

  5. We need much more Thai speech data

  6. Our goal: High-quality, open-source TTS for Thai

  7. Open-source Thai TTS needs collective effort

Many of you may already know that beyond Typhoon’s large language models for text, we also have a roadmap to significantly improve how we work with voice, which is another vital form of everyday communication.

Since the launch of Typhoon-Audio (Research Preview) in January, we’ve been working on furthering the research — and now we’re excited to share the latest progress through demo videos. This article aims to answer frequently asked questions, offer insights into our roadmap, and invite you to join our journey to build better Thai Voice AI together.

If you haven’t seen the demos yet, head over to voice.opentyphoon.ai or check out the two demo videos below.

1. Naturally expressive, not voice cloning

*Demo 1*

Q: How was the voice in the demo created? Is this voice cloning?

A: Some might assume we're doing voice cloning, which is indeed possible today, but our research team deliberately chose not to take that approach. Instead, we focus on learning natural speaking styles. You'll notice the generated voice resembles the original without being an exact replica: training the model on a single target voice yields a high degree of similarity, whereas blending voices from multiple people would produce a less distinct result.

2. New potential with voice merging technology

*Demo 2*

Q: How does this demo work?

A: This research explores voice merging technology. In the demo, we blended the voice of an American English speaker with a Thai speaker to create a hybrid voice that reflects both identities.

Q: The demo allows for sliding adjustments — does that mean this tech will allow for flexible voice customization?

A: Exactly! Researchers can tune the balance between voices. Emphasizing the first (American) voice makes the result sound more like that original speaker. Shifting the weight to the Thai voice makes it sound more Thai. Setting the balance in the middle produces a more blended, hybrid voice.
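As a purely illustrative sketch (not Typhoon's actual model): many neural TTS systems condition generation on a fixed-size speaker embedding vector, and a blend slider like the one in the demo can be pictured as a linear interpolation between two speakers' embeddings. The function and toy vectors below are hypothetical.

```python
# Hypothetical sketch: blending two speakers by linearly
# interpolating their speaker-embedding vectors.

def merge_voices(emb_american, emb_thai, weight_thai):
    """Blend two speaker embeddings elementwise.

    weight_thai ranges from 0.0 (all American) to 1.0 (all Thai).
    """
    assert 0.0 <= weight_thai <= 1.0
    return [(1.0 - weight_thai) * a + weight_thai * t
            for a, t in zip(emb_american, emb_thai)]

# Toy 4-dimensional embeddings standing in for real ones.
american = [0.9, -0.2, 0.4, 0.1]
thai = [0.1, 0.6, -0.3, 0.5]

hybrid = merge_voices(american, thai, 0.5)       # balanced blend
mostly_thai = merge_voices(american, thai, 0.9)  # leans toward the Thai voice
```

Moving the slider then just changes `weight_thai`: 0.0 reproduces the first speaker's embedding, 1.0 reproduces the second's, and values in between yield the hybrid voices heard in the demo.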

Q: What’s next from this?

A: This shows the potential to combine and fine-tune different vocal traits — pitch, tone, accent — to create entirely new voices. Repeating this with different sources could lead to deployable, user-ready TTS voices for Thai in the future.

3. Voice generation from text (Text-to-Speech)

The voices in both demo videos were created using Text-to-Speech (TTS) technology — transforming plain text into audio.

Q: What is Text-to-Speech and why does it matter?

A: TTS lets computers “speak” written text out loud. You may have heard it in screen readers for the visually impaired or in automated voice responses in apps. TTS makes tech more accessible and versatile — useful in education, business, and everyday life.

4. The challenge of Thai TTS

Q: Why don’t Thai TTS systems sound as natural as English ones yet?

A: Thai is complex — with tones, unmarked word boundaries, and context-dependent pronunciations. This makes it harder to model well without a large and high-quality dataset, which we currently lack compared to other languages.
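To make the word-boundary problem concrete, here is a toy dictionary-based segmenter (purely illustrative, unrelated to Typhoon's pipeline). The unspaced string ตากลม can be read as ตา|กลม ("round eyes") or ตาก|ลม ("to catch the breeze"); a greedy longest-match rule can only ever pick one reading, which is exactly the kind of ambiguity a Thai TTS front end must resolve before it can pronounce anything.

```python
# Toy dictionary covering both readings of the ambiguous string.
TOY_DICT = {"ตา", "กลม", "ตาก", "ลม"}

def longest_match_segment(text, dictionary):
    """Greedy longest-match word segmentation over a known dictionary."""
    words, i = [], 0
    while i < len(text):
        # Try the longest dictionary word starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it alone and move on.
            words.append(text[i])
            i += 1
    return words

print(longest_match_segment("ตากลม", TOY_DICT))  # picks ['ตาก', 'ลม'], though ['ตา', 'กลม'] is equally valid
```

A greedy rule silently commits to one segmentation; real systems need context (and lots of data) to choose the reading a human would, and tones and context-dependent pronunciation compound the problem further.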

Q: How far along is the research, and how do you measure success?

A: These demos were trained with relatively few hours of voice data, yet still yielded impressive results. We also tested samples with varying amounts of training data and clearly saw that more data led to better results.

We measure progress using two main criteria:

  1. Correctness: Does the model pronounce everything accurately and as written?

  2. Naturalness: Does the voice sound fluid and human-like in rhythm and tone?

The first demo shows promising levels of both. But when we try out diverse text inputs that weren’t in the original training set, some errors still emerge.
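Correctness of this kind is commonly scored by transcribing the generated audio with a speech recognizer and comparing the transcript to the input text using character error rate (CER). The exact evaluation used in our research isn't detailed here; the following is a generic, self-contained sketch of that metric via edit distance.

```python
def char_error_rate(reference, hypothesis):
    """Character error rate: Levenshtein edit distance between the
    reference text and the ASR transcript, normalized by reference length."""
    m, n = len(reference), len(hypothesis)
    dp = list(range(n + 1))  # dp[j] = distance between prefixes
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (reference[i - 1] != hypothesis[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, 1)

print(char_error_rate("abc", "axc"))  # one substitution out of three characters
```

A CER of 0.0 means the model pronounced everything exactly as written; naturalness, by contrast, is usually judged by human listeners (e.g., mean opinion scores) rather than by an automatic metric.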

5. We need much more Thai speech data

Q: How much data is needed?

A: As much as we can get! Our initial goal is 1,000 hours of Thai speech recordings.

That might sound like a lot, but it's really modest. For comparison:

  • Good-quality English TTS is trained with 40,000+ hours of data. Modern TTS models often train with hundreds of thousands of hours.

In Asia:

  • Indic Parler-TTS (India): ~8,000 hours

  • Malaysian-Emilia (Malaysia): ~15,000 hours

6. Our goal: High-quality, open-source TTS for Thai

Q: Will the final results of this research be kept internal?

A: No — our end goal is to make this open-source. That means anyone can use, build on, and benefit from it without license fees. This encourages innovation, especially for education, startups, and communities with limited resources.

The Indian and Malaysian projects mentioned earlier are both open source as well.

7. Open-source Thai TTS needs collective effort

Our biggest challenge right now is resources, especially data. Thai still lags far behind English, as well as behind the open-source efforts of countries like India and Malaysia.

But this technology has the power to support developers and social initiatives — for instance, donating your voice to help create audio books for the visually impaired.

That’s why we’re inviting individuals to contribute their voice to help build a quality, open-source Thai TTS model. Our first target is 1,000 hours. You don’t need to be a researcher — this is a way for everyday people to directly contribute to Thailand’s AI future. Your voice will be one of many that help shape something new.

Spread the word! If you or someone you know has 10+ hours of recorded speech content, whether as a content creator, a tutor, or simply someone with voice files of your own, we'd love to partner with you.

It takes just 2–3 minutes to fill out this form. We’ll review your submission and get in touch.

Interested organizations wanting to explore TTS tech or join our research project can contact us at: contact[at]opentyphoon.ai

Let's work together to advance open-source Thai TTS and collaborative AI development!


© 2025 SCB 10X Co., Ltd. All rights reserved.