
Tutorial: Running Typhoon Locally with Ollama and Open WebUI

Typhoon models are now officially available on Ollama, making it easier than ever to run powerful language models on your local machine. This guide will walk you through the process of setting up and running Typhoon models locally, without relying on cloud services.
Why Run LLMs Locally?
Key Benefits:
- Cost-Effective: Your laptop alone is likely sufficient for basic use cases and testing
- Complete Control: Full flexibility over model configuration and deployment
- Enhanced Privacy: All processing happens on your local machine
- Reliable Access: No dependency on internet connectivity or external services
Ideal Use Cases:
- Privacy-Focused Developers building sensitive applications
- Researchers requiring unlimited, consistent model access
- Organizations maintaining strict data sovereignty
- Students exploring LLM applications hands-on
Meet Typhoon's Local-Friendly Models
Typhoon offers several Thai-English bilingual models optimized for local deployment:
- Typhoon2-1b-instruct: Lightweight model with 1 billion parameters
- Typhoon2-3b-instruct: Small-sized model offering balanced performance
- Typhoon2-8b-instruct: Larger model with 8 billion parameters and greater capabilities
- Typhoon2-t1-3b-research-preview: Specialized 3-billion parameter reasoning model
All these models are readily accessible through Ollama, which will be the focus platform for this tutorial. For those interested in exploring additional options, including multimodal capabilities, our complete model collection is also available on Hugging Face.
Recommended System Requirements - Choosing the Right Model
Which model size should you choose? How well a model runs locally depends on its size and your system's hardware. Here's a simple guide to help you decide:
- Typhoon2-1b-instruct: Runs smoothly on systems with 8GB of RAM.
- Typhoon2-3b-instruct and Typhoon2-t1-3b-research-preview: Run well on systems with 8GB-16GB of RAM.
- Typhoon2-8b-instruct: Requires 16GB+ RAM.
General Recommendations:
CPU:
- Newer processors like Intel 11th Gen or AMD Zen4 are ideal.
- Apple Silicon (M1, M2, M3, M4 series) for macOS users.
GPU:
- Optional for smaller models
- Recommended for 3b+ models
Storage:
- Minimum 50GB free space
A Quickstart Guide to Running Typhoon Locally with Ollama
What is Ollama?
Ollama is an open-source tool that simplifies running LLMs locally. It handles model management, optimization, and provides an easy-to-use interface for developers.
Installation
- Visit Ollama's download page at ollama.com/download
- Follow the installation instructions for your operating system
- Verify the installation by opening a terminal and running:
ollama --version
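After installing, Ollama runs a local background service. A quick sanity check (assuming it is listening on its default port, 11434) is:
curl http://localhost:11434
If the service is up, it replies with "Ollama is running".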
Running Your First Model
1. Browse the available Typhoon models at ollama.com/scb10x
2. Choose your preferred model and run it using the command given on the page. For example, for the 8b-instruct model:
ollama run scb10x/llama3.1-typhoon2-8b-instruct
The model will be downloaded (pulled) to your device on first use. Once complete, you can start interacting with it directly through your terminal.
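Tip: if you'd rather download a model ahead of time without opening a chat session, you can pull it explicitly and then check what's installed:
ollama pull scb10x/llama3.1-typhoon2-8b-instruct
ollama list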
In this example, I asked for a beginner-friendly explanation of quantum computing in Thai:
อธิบายเรื่อง quantum computing ให้เข้าใจง่ายๆ ("Explain quantum computing in an easy-to-understand way")
The model responds in Thai with a plain-language explanation, right in the terminal.
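The terminal isn't the only way in. Ollama also exposes a local REST API (by default at http://localhost:11434), so you can call the same model from your own scripts. A minimal sketch using curl, assuming the 8b model above has already been pulled:
curl http://localhost:11434/api/chat -d '{
  "model": "scb10x/llama3.1-typhoon2-8b-instruct",
  "messages": [
    { "role": "user", "content": "Explain quantum computing in simple terms" }
  ],
  "stream": false
}'
Setting "stream": false returns the whole reply as a single JSON object; omit it to stream tokens as they are generated.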
Enhanced User Experience with Open WebUI
While the terminal interface works well for developers, you might prefer a more user-friendly chat interface of the kind most of us are already familiar with.
Open WebUI is an open-source, self-hosted web interface for local LLMs that adds features such as:
- Modern chat interface
- File upload capabilities
- Code interpretation
- Multi-model management
- User authentication
Setting Up Open WebUI
1. Install Open WebUI
Open your terminal and enter the following command:
pip install open-webui
Note: Make sure you have Python 3.11 installed before running this command.
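If Python 3.11 isn't your system default, a dedicated virtual environment keeps things clean. A minimal sketch, assuming python3.11 is on your PATH (on Windows, activate with open-webui-env\Scripts\activate instead):
python3.11 -m venv open-webui-env
source open-webui-env/bin/activate
pip install open-webui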
2. Start the Server
Once installed, run the command below to start Open WebUI:
open-webui serve
You'll need to run this command each time you open a new terminal session and want to use Open WebUI.
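If you'd like the server to keep running after the terminal closes, one simple option on macOS/Linux is to launch it in the background and capture its logs (a process manager such as systemd works just as well):
nohup open-webui serve > open-webui.log 2>&1 &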
3. Access the Interface
Open your browser and go to: http://localhost:8080
The interface will prompt you to create an account the first time you run it.
4. Select a Model
Choose any Ollama model you’ve previously downloaded and start chatting! You’re now ready to use Typhoon models or other downloaded models.
5. Customize Your Settings (Optional)
To switch between dark and light mode or adjust other preferences, click your account name in the top-right corner and select Settings.
Sample Use Cases
Typhoon’s small local models are versatile and excel in both English and Thai. Below are some sample use cases:
1. Question-Answering
Ask Typhoon a question, and the model will answer in either Thai or English, matching the language of your question. It can also handle basic math to assist with numerical queries.
2. Translation
Typhoon can help translate text seamlessly between Thai and English. You can also specify your preferred tone for the translation.
3. Content Generation
Need help writing? Ask the model to generate anything from a social media caption to a professional email.
4. Text Summarization
Provide Typhoon with long text or upload a document, and it will summarize the content for you in a concise and easy-to-read format.
These are just a few examples of how Typhoon can be used by everyday users.
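These use cases also work non-interactively: pass the prompt straight to ollama run and the model prints its answer and exits, which is handy for quick scripting. For example, a one-shot translation with the 8b model (any pulled model works):
ollama run scb10x/llama3.1-typhoon2-8b-instruct "Translate into Thai with a polite, formal tone: The meeting has been rescheduled to Friday morning."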
Tips: How to Effectively Use Small Models
To achieve optimal results from Typhoon’s small models, it’s important to craft well-structured prompts and adjust the sampling parameters depending on your task and preferences. These parameters can be found in the controls section at the top right corner of the Open WebUI interface.
Key Parameters Explained and Tips:
1. Temperature
Temperature controls the randomness of the model's response. Lower values make the output more focused and deterministic, while higher values increase diversity—this may introduce randomness but helps in generating more creative responses.
Recommended settings:
- 3b and 8b models: temperature < 0.7
- 1b model: temperature < 0.3
Task-specific advice:
- Creative content generation: Use a higher temperature (around 0.7) to encourage diversity in the output.
- Reasoning and fact-based tasks: Use a lower temperature (0.3-0.5) to ensure accuracy and minimize unnecessary variations.
2. Top-p (Nucleus Sampling)
Top-p also controls the randomness and diversity of the generated text. It sets a probability threshold, limiting the model to selecting from the most probable tokens. Lower values focus on high-probability words for coherence, while higher values allow the inclusion of less likely words to increase creativity.
Recommended setting:
top_p = 0.9
Quick comparison with temperature:
- Temperature focuses on randomness by "scaling" probabilities of possible outcomes.
- Top-p sets a limit on how many possible tokens will be considered based on probability, controlling "where" the model samples from.
3. Max Tokens
Max tokens determines the maximum length of the model’s response. Use a higher value for tasks that require detailed or lengthy outputs, such as summarization or reasoning tasks.
Suggested settings:
- 512 for general use
- 1024 or higher for complex reasoning or detailed summarization tasks
Task-specific tip: For reasoning-focused models (e.g., Typhoon2-T1), set max tokens higher so the model has room to generate its full chain of reasoning before answering.
By fine-tuning these parameters, you can tailor Typhoon’s small models to perform tasks more effectively, whether you're generating creative content, translating, or solving complex problems!
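Outside Open WebUI, the same knobs are exposed through Ollama's REST API via the options field, where max tokens is called num_predict. A minimal sketch for a fact-based task, using the values recommended above:
curl http://localhost:11434/api/generate -d '{
  "model": "scb10x/llama3.1-typhoon2-8b-instruct",
  "prompt": "Summarize the key benefits of running LLMs locally in three bullet points.",
  "stream": false,
  "options": { "temperature": 0.3, "top_p": 0.9, "num_predict": 1024 }
}'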
What’s Next?
Ollama's ecosystem offers endless opportunities. For instance, it integrates with tools like LangChain, allowing developers to build sophisticated agent-based systems that can reason, plan, and act autonomously. Additionally, you can leverage LlamaIndex to create powerful Retrieval-Augmented Generation (RAG) applications, making it straightforward to connect LLMs with your proprietary data for informed responses.
These are just a few examples of the possibilities, and we’ll explore these advanced use cases in greater detail in upcoming blog posts.
Join Our Community
We’re excited to see how developers, researchers, and businesses use Typhoon 2 to build applications. Join our Discord community to share your experiences, ask questions, and stay updated on new releases.
Get started today! 🚀