
Tutorial: Running Typhoon Locally with Ollama and Open WebUI

Typhoon models are now officially available on Ollama, making it easier than ever to run powerful language models on your local machine. This guide will walk you through the process of setting up and running Typhoon models locally, without relying on cloud services.
Why Run LLMs Locally?
Key Benefits:
- Cost-Effective: Your laptop alone is likely sufficient for basic use cases and testing
- Complete Control: Full flexibility over model configuration and deployment
- Enhanced Privacy: All processing happens on your local machine
- Reliable Access: No dependency on internet connectivity or external services
Ideal Use Cases:
- Privacy-Focused Developers building sensitive applications
- Researchers requiring unlimited, consistent model access
- Organizations maintaining strict data sovereignty
- Students exploring LLM applications hands-on
Meet Typhoon's Local-Friendly Models
Typhoon offers several Thai-English bilingual models optimized for local deployment:
- Typhoon2-1b-instruct: Lightweight model with 1 billion parameters
- Typhoon2-3b-instruct: Small-sized model offering balanced performance
- Typhoon2-8b-instruct: Larger model with 8 billion parameters and greater capabilities
- Typhoon2-t1-3b-research-preview: Specialized 3-billion parameter reasoning model
All these models are readily accessible through Ollama, which will be the focus platform for this tutorial. For those interested in exploring additional options, including multimodal capabilities, our complete model collection is also available on Hugging Face.
Recommended System Requirements - Choosing the Right Model
Which model size should you choose? How well a model runs locally depends on its size and your system's hardware. Here's a simple guide to help you decide:
- Typhoon2-1b-instruct: Runs smoothly on systems with 8GB of RAM.
- Typhoon2-3b-instruct and Typhoon2-t1-3b-research-preview: Run well on systems with 8GB-16GB of RAM.
- Typhoon2-8b-instruct: Requires 16GB+ RAM.
General Recommendations:
CPU:
- Newer processors like Intel 11th Gen or AMD Zen4 are ideal.
- Apple Silicon (M1, M2, M3, M4 series) for macOS users.
GPU:
- Optional for smaller models
- Recommended for 3b+ models
Storage:
- Minimum 50GB free space
A Quickstart Guide to Running Typhoon Locally with Ollama
What is Ollama?
Ollama is an open-source tool that simplifies running LLMs locally. It handles model management, optimization, and provides an easy-to-use interface for developers.
Installation
- Visit Ollama's download page at ollama.com/download
- Follow the installation instructions for your operating system
- Verify the installation by opening a terminal and running:
ollama --version
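After installing, Ollama runs a local background service. A quick sanity check (assuming it is listening on its default port, 11434) is:
curl http://localhost:11434
If the service is up, it replies with "Ollama is running".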
Running Your First Model
1. Browse the available Typhoon models at ollama.com/scb10x
2. Choose your preferred model and run it using the command given on the page. For example, for the 8b-instruct model:
ollama run scb10x/llama3.1-typhoon2-8b-instruct
The model will be downloaded (pulled) to your device on first use. Once complete, you can start interacting with it directly through your terminal.
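Tip: if you'd rather download a model ahead of time without opening a chat session, you can pull it explicitly and then check what's installed:
ollama pull scb10x/llama3.1-typhoon2-8b-instruct
ollama list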
In this example, I asked for a beginner-friendly explanation of quantum computing in Thai:
อธิบายเรื่อง quantum computing ให้เข้าใจง่ายๆ ("Explain quantum computing in an easy-to-understand way")
The model responds in Thai with a plain-language explanation, right in the terminal.
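The terminal isn't the only way in. Ollama also exposes a local REST API (by default at http://localhost:11434), so you can call the same model from your own scripts. A minimal sketch using curl, assuming the 8b model above has already been pulled:
curl http://localhost:11434/api/chat -d '{
  "model": "scb10x/llama3.1-typhoon2-8b-instruct",
  "messages": [
    { "role": "user", "content": "Explain quantum computing in simple terms" }
  ],
  "stream": false
}'
Setting "stream": false returns the whole reply as a single JSON object; omit it to stream tokens as they are generated.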
Enhanced User Experience with Open WebUI
While the terminal interface works well for developers, you might prefer a more user-friendly chat interface of the kind most of us are already familiar with.
Open WebUI is an open-source, self-hosted web interface for local LLMs that adds features such as:
- Modern chat interface
- File upload capabilities
- Code interpretation
- Multi-model management
- User authentication
Setting Up Open WebUI
1. Install Open WebUI
Open your terminal and enter the following command:
pip install open-webui
Note: Make sure you have Python 3.11 installed before running this command.
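If Python 3.11 isn't your system default, a dedicated virtual environment keeps things clean. A minimal sketch, assuming python3.11 is on your PATH (on Windows, activate with open-webui-env\Scripts\activate instead):
python3.11 -m venv open-webui-env
source open-webui-env/bin/activate
pip install open-webui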
2. Start the Server
Once installed, run the command below to start Open WebUI:
open-webui serve
You'll need to run this command each time you open a new terminal session and want to use Open WebUI.
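If you'd like the server to keep running after the terminal closes, one simple option on macOS/Linux is to launch it in the background and capture its logs (a process manager such as systemd works just as well):
nohup open-webui serve > open-webui.log 2>&1 &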
3. Access the Interface
Open your browser and go to: http://localhost:8080
The interface will prompt you to create an account the first time you run it.
4. Select a Model
Choose any Ollama model you’ve previously downloaded and start chatting! You’re now ready to use Typhoon models or other downloaded models.
5. Customize Your Settings (Optional)
To switch between dark and light mode or adjust other preferences, click your account name in the top-right corner and select Settings.
Sample Use Cases
Typhoon’s small local models are versatile and excel in both English and Thai. Below are some sample use cases:
1. Question-Answering
Ask Typhoon a question, and the model will answer in either Thai or English, matching the language of your question. It can also handle basic math to assist with numerical queries.
2. Translation
Typhoon can help translate text seamlessly between Thai and English. You can also specify your preferred tone for the translation.
3. Content Generation
Need help writing? Ask the model to generate anything from a social media caption to a professional email.
4. Text Summarization
Provide Typhoon with long text or upload a document, and it will summarize the content for you in a concise and easy-to-read format.
These are just a few examples of how Typhoon can be used by everyday users.
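These use cases also work non-interactively: pass the prompt straight to ollama run and the model prints its answer and exits, which is handy for quick scripting. For example, a one-shot translation with the 8b model (any pulled model works):
ollama run scb10x/llama3.1-typhoon2-8b-instruct "Translate into Thai with a polite, formal tone: The meeting has been rescheduled to Friday morning."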
Tips: How to Effectively Use Small Models
To achieve optimal results from Typhoon’s small models, it’s important to craft well-structured prompts and adjust the sampling parameters depending on your task and preferences. These parameters can be found in the controls section at the top right corner of the Open WebUI interface.
Key Parameters Explained and Tips:
1. Temperature
Temperature controls the randomness of the model's response. Lower values make the output more focused and deterministic, while higher values increase diversity—this may introduce randomness but helps in generating more creative responses.
Recommended settings:
- 3b and 8b models: temperature < 0.7
- 1b model: temperature < 0.3
Task-specific advice:
- Creative content generation: Use a higher temperature (around 0.7) to encourage diversity in the output.
- Reasoning and fact-based tasks: Use a lower temperature (0.3-0.5) to ensure accuracy and minimize unnecessary variations.
2. Top-p (Nucleus Sampling)
Top-p also controls the randomness and diversity of the generated text. It sets a probability threshold, limiting the model to selecting from the most probable tokens. Lower values focus on high-probability words for coherence, while higher values allow the inclusion of less likely words to increase creativity.
Recommended setting:
top_p = 0.9
Quick comparison with temperature:
- Temperature focuses on randomness by "scaling" probabilities of possible outcomes.
- Top-p sets a limit on how many possible tokens will be considered based on probability, controlling "where" the model samples from.
3. Max Tokens
Max tokens determines the maximum length of the model’s response. Use a higher value for tasks that require detailed or lengthy outputs, such as summarization or reasoning tasks.
Suggested settings:
- 512 for general use
- 1024 or higher for complex reasoning or detailed summarization tasks
Task-specific tip: For reasoning-focused models (e.g., Typhoon2-T1), set max tokens higher so the model has room to generate its full chain of reasoning before answering.
By fine-tuning these parameters, you can tailor Typhoon’s small models to perform tasks more effectively, whether you're generating creative content, translating, or solving complex problems!
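Outside Open WebUI, the same knobs are exposed through Ollama's REST API via the options field, where max tokens is called num_predict. A minimal sketch for a fact-based task, using the values recommended above:
curl http://localhost:11434/api/generate -d '{
  "model": "scb10x/llama3.1-typhoon2-8b-instruct",
  "prompt": "Summarize the key benefits of running LLMs locally in three bullet points.",
  "stream": false,
  "options": { "temperature": 0.3, "top_p": 0.9, "num_predict": 1024 }
}'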
What’s Next?
Ollama's ecosystem offers endless opportunities. For instance, it integrates with tools like LangChain, allowing developers to build sophisticated agent-based systems that can reason, plan, and act autonomously. Additionally, you can leverage LlamaIndex to create powerful Retrieval-Augmented Generation (RAG) applications, making it straightforward to connect LLMs with your proprietary data for informed responses.
These are just a few examples of the possibilities, and we’ll explore these advanced use cases in greater detail in upcoming blog posts.
Join Our Community
We’re excited to see how developers, researchers, and businesses use Typhoon 2 to build applications. Join our Discord community to share your experiences, ask questions, and stay updated on new releases.
Get started today! 🚀