Running Typhoon Models Locally with LM Studio: Complete Guide

Tags: Tutorial · LM Studio · Local Hosting · Typhoon Translate · Typhoon OCR

Learn how to run Typhoon AI models like Translate and OCR entirely offline with LM Studio. This step-by-step guide covers installation, system requirements, model formats (GGUF vs MLX), quantization, choosing the right model size, API usage for vision models, and best practices—so you can deploy powerful AI locally on your own machine.

Oravee (Orn) Smithiphol

July 09, 2025

Language models are becoming smaller, more efficient, and easier than ever to run privately on your own computer. LM Studio is a user-friendly tool that lets you load, customize, and chat with models available on Hugging Face entirely on your local machine—no coding, no cloud required.

This blog post is a complete guide to installing LM Studio, choosing the right model format, and getting the most out of local AI with Typhoon models like Typhoon Translate 4B (for text translation), Typhoon OCR 3B (for vision-based text extraction), and Typhoon 2.1 Gemma (for general-purpose use).

What is LM Studio?

LM Studio is a free, cross-platform desktop app for running open-weight language models locally. It offers:

  • 🖥️ A simple ChatGPT-style interface
  • 📥 Built-in model discovery and download
  • 🔌 An OpenAI-compatible local API server
  • ⚙️ Advanced runtime settings (context length, GPU offloading, sampling parameters)
  • 🧩 Support for text-only and vision-enabled models (via GGUF)
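
The local API server speaks the OpenAI wire format, so the standard OpenAI Python client can talk to it. A minimal sketch, assuming a model is loaded, the server is running, and the default port 1234 is used:

```python
from openai import OpenAI  # pip install openai

# LM Studio's local server defaults to http://localhost:1234/v1.
# The API key is not checked locally, so any placeholder string works.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# List the model identifiers the server currently exposes.
for model in client.models.list().data:
    print(model.id)
```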

System Requirements

macOS

  • Chipset: Apple Silicon (M1, M2, M3, M4)
  • OS: macOS 13.4 or newer (14+ recommended)
  • RAM: 16 GB recommended (8 GB is enough for small models)

⚠️ Intel-based Macs are not supported.

Windows

  • CPU: AVX2 instruction set (x64)
  • RAM: At least 16 GB recommended

Linux

  • OS: Ubuntu 20.04 or newer (minor issues possible on 22+)
  • Architecture: x64 only

Getting Started with LM Studio

1️⃣ Download and Install

Download LM Studio from https://lmstudio.ai, run the installer for your operating system, and launch the app.

2️⃣ Download an LLM

LM Studio supports multiple file formats and quantized variants. Choose the right one based on your hardware and needs:

Model File Formats

| Format | Best For | Pros | Cons | Platforms |
| --- | --- | --- | --- | --- |
| GGUF | Universal, most models | Cross-platform, supports vision | Slightly larger on disk | Windows, macOS (Intel & Apple Silicon), Linux |
| MLX | Apple Silicon only | Optimized for M1/M2/M3 GPUs | Apple-only, text-only for now | macOS (M1/M2/M3/M4) |

GGUF is recommended for most users, especially for vision models like Typhoon OCR 3B.
MLX is the better choice for text-only models on Apple Silicon, where it typically runs faster.

📥 How to Get a Model

  • Use LM Studio's Discover tab to search and download directly

    Example: Search for Typhoon Translate in LM Studio's Discover tab
  • Or download from external links (like Hugging Face)

You can browse models on Hugging Face. For example, Typhoon OCR is one of the Typhoon collections on Hugging Face. You can find typhoon-ocr-7b and typhoon-ocr-3b in the standard format, which works with the Transformers library, Google Colab, or vLLM. A GGUF build of the 3B model is also available at https://huggingface.co/scb10x/typhoon-ocr-3b-gguf; this is the format recommended for use with LM Studio.

On the model page, select 'Use this model', then choose 'LM Studio'. If you don't see the option, enable it at https://huggingface.co/settings/local-apps#local-apps

  • Or download the model file yourself and import MLX or GGUF files into LM Studio directly (a scripted download sketch follows below).
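
If you prefer scripting the download, the huggingface_hub package can fetch a GGUF file directly. This is a sketch only: the filename below is a placeholder, so check the repo's Files tab for the actual .gguf name first.

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Download a GGUF file to the local Hugging Face cache,
# then import the resulting file into LM Studio.
path = hf_hub_download(
    repo_id="scb10x/typhoon-ocr-3b-gguf",
    filename="typhoon-ocr-3b-q4_k_m.gguf",  # hypothetical filename; verify on the repo page
)
print(f"Saved to: {path}")
```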

3️⃣ Load the Model

Open the Chat tab and choose your downloaded model from the model selector at the top of the window; LM Studio will load it into memory.

4️⃣ Set Your System Prompt

  • Go to My Models > click ⚙️ Settings next to your model
  • Under the Prompt tab, set a custom system prompt

Example: You are a female AI assistant named Typhoon created by SCB 10X to be helpful, harmless, polite, and honest...

5️⃣ [Optional] Tweak Inference Settings

You can adjust parameters such as:

  • Context length
  • GPU offload
  • Batch size, threads
  • Sampling temperature/top-p

These settings help balance speed, memory use, and response quality.
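
Context length and GPU offload are set when the model is loaded, while sampling parameters can also be overridden per request through the local API. A short sketch (the model identifier is an assumption; copy the real one from LM Studio):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="typhoon2.1-gemma3-4b",  # assumed identifier; use the one LM Studio shows
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    temperature=0.7,  # lower = more deterministic output
    top_p=0.95,       # nucleus sampling cutoff
    max_tokens=256,   # cap on generated tokens
)
print(response.choices[0].message.content)
```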

Choosing the Right Typhoon Model Size

Typhoon models come in multiple sizes (e.g., 1B, 3B, 4B, 7B, 8B, 12B).
“B” = billions of parameters.
More parameters generally mean better reasoning and context handling—but also higher memory and compute requirements.

Choosing a model size

  • 1–3B models: Best for short translations, basic OCR, and single-turn Q&A. Runs on systems with 8–16 GB RAM.
  • 4–8B models: Balance quality and resource use; handle richer multilingual text and longer documents. Plan for 16–32 GB RAM and 6–12 GB of GPU VRAM.
  • 12B models: Best for research-grade summarization and advanced multi-turn chats. Needs 32 GB+ RAM or a high-end consumer GPU.

Best practice: Choose the smallest size that meets your quality needs.
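
As an unofficial rule of thumb, a quantized model's weights take roughly parameters × bits-per-weight ÷ 8 bytes, plus working memory for context. A back-of-the-envelope sketch:

```python
# Rough size estimate for quantized weights; ~4.5 bits/weight approximates
# a Q4_K_M-style GGUF. Real files vary, so treat these numbers as ballpark.
def estimate_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    # params * bits / 8 bits-per-byte, with the 1e9 factors cancelling out
    return params_billions * bits_per_weight / 8

for size_b in (1, 3, 4, 7, 8, 12):
    print(f"{size_b:>2}B at ~4-bit: ~{estimate_gb(size_b):.1f} GB of weights")
```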

📌 Available Typhoon Model Variants

  • Typhoon 2.1 Gemma: 4B, 12B
  • Typhoon 2: 1B, 3B, 8B
  • Typhoon OCR: 3B, 7B
  • Typhoon Translate: 4B

💬 Chat with Your LLM

For text models like Typhoon 2.1 Gemma or Typhoon Translate, LM Studio's chat interface is very simple:

  1. Go to the Chat tab
  2. Create a new chat
  3. Select the loaded model
  4. Type your prompt and get a response

Example:

I asked Typhoon Translate to help translate this section into Thai.
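
The same chat also works programmatically against the local server. A minimal sketch, assuming Typhoon Translate is loaded (the model identifier is an assumption; match what LM Studio shows):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="typhoon-translate-4b",  # assumed identifier; copy the real one from LM Studio
    messages=[
        {"role": "user", "content": "Translate into Thai: Local AI keeps your data private."}
    ],
)
print(response.choices[0].message.content)
```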

🖼️ Using Typhoon OCR 3B (Vision Model) via API

LM Studio's chat UI doesn't support image upload yet, but you can use Typhoon OCR 3B via its local OpenAI-compatible API.

⚠️ Important Update on Using Typhoon OCR 3B (Vision Model) Locally

Update on July 15, 2025

We’ve tested our current quantized (.gguf) version of the Typhoon OCR 3B model in LM Studio and found that accuracy is significantly reduced compared to the original model.

✅ Our recommendation:

  1. Avoid using the .gguf quantized model for OCR tasks for now. We’re working on improving the quantization and alignment for better OCR accuracy in future releases.

  2. If you need to run Typhoon OCR locally today, use Ollama as your backend instead of LM Studio; it does not rely on the .gguf build we're currently fixing.

In our tests, OCR accuracy was higher using the unquantized or better-optimized variants in Ollama. It also supports easy setup on macOS, Windows, and Linux.
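
Ollama also exposes an OpenAI-compatible endpoint (default port 11434), so the appendix script below carries over with two changes, sketched here. The model tag is a placeholder; verify the actual name with `ollama list`.

```python
from openai import OpenAI

# Point the same OpenAI client at Ollama instead of LM Studio.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
model_name = "scb10x/typhoon-ocr-3b"  # hypothetical tag; verify with `ollama list`
```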

Conclusion

LM Studio makes running advanced AI models locally easy, private, and efficient. Whether you want to translate text with Typhoon Translate, extract text from images with Typhoon OCR, or chat with other Typhoon variants, this guide should help you get up and running.

Appendix

The following instructions remain here for future reference on using the OCR API with LM Studio. Once we release an updated, better-quantized GGUF model with improved OCR accuracy, you can use LM Studio's local API exactly as described.

For now, though, we do not recommend using this method for production or critical OCR tasks.

How to Use OCR API with LM Studio

  1. Load the Typhoon OCR 3B .gguf model in LM Studio

  2. Start LM Studio's local server (Developer tab); the API will be available at: http://localhost:1234/v1

  3. (First-time requirement) Install the OpenAI Python package: pip install openai

  4. Write a Python script to work with the API. You can send base64-encoded images as input via the API.

Here's a minimal example in Python; the model identifier and prompt below are assumptions, so adjust them to match your loaded model:

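```python
import base64
from openai import OpenAI

# Point the OpenAI client at LM Studio's local server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Read and base64-encode the input image.
with open("document.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="typhoon-ocr-3b",  # assumed identifier; copy the real one from LM Studio
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract all text from this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```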

LM Studio will process the image and return extracted text locally.