Typhoon Logo
TYPHOON
Introducing Typhoon OCR 1.5: A Smaller, More Robust, and Faster Vision-Language OCR for Real-World Thai and English Documents

Introducing Typhoon OCR 1.5: A Smaller, More Robust, and Faster Vision-Language OCR for Real-World Thai and English Documents

New Release
Typhoon OCR
OCR
Vision Language

Typhoon OCR demonstrates state-of-the-art performance in Thai document parsing, surpassing larger commercial models in both accuracy and structural fidelity.

Surapon Nonesung

Surapon Nonesung

November 14, 2025

Introducing Typhoon OCR 1.5: A Smaller, More Robust, and Faster Vision-Language OCR for Real-World Thai and English Documents

We’re thrilled to announce Typhoon OCR 1.5, the latest evolution of our open-source vision-language document parsing model for Thai and English.
This new version delivers faster inference, stronger performance on handwritten and form-based documents, and improved adaptability across both text-rich and image-rich pages—all in a smaller, more efficient model.

The Broader Context

Traditional OCR systems, built on convolutional and sequence-based networks such as CNNs and RNNs, are well-suited for clean, simple layouts but often fail to capture the structure and semantics of real-world documents.
Frameworks like EasyOCR, PaddleOCR, and Tesseract support many languages—including Thai—but they still face common challenges:

  • Limited layout awareness – losing structural cues such as tables, headings, or columns

  • Weak image-text understanding – overlooking charts, figures, and mixed-media sections

  • Loss of context – processing content line by line without interpreting meaning or relationships

To address these limitations, **Vision-Language Models (VLMs) **offer a new paradigm.
By combining visual perception with language reasoning, VLM-based OCR systems can understand not just what text appears on the page, but how it fits into the larger document context.

The Typhoon OCR Journey

Typhoon OCR is an open-source, bilingual document parsing model built specifically for real-world documents in Thai and English.
Unlike conventional OCR tools, Typhoon OCR doesn't just extract raw text—it produces semantic, structured, and layout-preserving outputs that are optimized for downstream tasks such as:

  • Retrieval-Augmented Generation (RAG)

  • Comprehensive document parsing and understanding

  • Accurate interpretation of tables, charts, and forms

Since its initial release, it has been adopted by a wide range of users—from individual developers to multinational organizations—and helped spark broader interest in Thai AI vision models and document intelligence.

Now, with Typhoon OCR 1.5, we’re taking that foundation further.
This version brings major improvements in speed, efficiency, and real-world robustness, combining Typhoon’s hallmark layout awareness with better handwriting recognition, simplified inference, and faster performance across devices and workloads.

What’s New and Improved in Typhoon OCR 1.5

Typhoon OCR 1.5 isn’t just a smaller model—it’s a faster, more capable, and more adaptable system for real-world documents.

From form-heavy paperwork to handwritten notes and image-rich pages, version 1.5 brings a range of upgrades that make OCR smarter and more deployable than ever.

1. Compact Architecture, Faster Inference

Now powered by Qwen3-VL 2B, Typhoon OCR 1.5 is significantly smaller while maintaining strong multimodal intelligence.

Thanks to quantization and architectural optimization, it runs efficiently even on standard CPUs and edge hardware—ideal for local or privacy-first deployment.

The result: Faster inference and reduced resource usage, without compromising accuracy.

2. No More Metadata Dependency

Unlike v1, which used embedded PDF metadata to reconstruct layouts, v1.5 achieves layout fidelity directly from images.

This independence makes it far more versatile. It now works equally well with scanned PDFs, mobile captures, and legacy image archives.

It’s faster, simpler, and ready for production in any workflow.

3. Single-Prompt Inference

Typhoon OCR 1.5 simplifies the entire process. Instead of two separate prompts (system prompt and uesr prompt as used in v1), the model now operates through a single unified prompt, producing consistent results with less tuning.

For developers, that means quicker integration, fewer moving parts, and easier fine-tuning across document types.

4. Enhanced Handwriting and Form Understanding

Handwriting has always been one of the toughest challenges for OCR systems.
In v1.5, handwriting recognition and field detection have been significantly improved, producing more reliable text extraction and semantic alignment even on mixed cursive and printed text.

Whether processing government forms, receipts, or annotated notes, the new model handles irregular layouts with higher accuracy and stability.

5. Balanced Strength on Text-Rich and Image-Rich Documents

Whether you’re processing text-heavy reports or infographics filled with charts and figures, v1.5 adapts intelligently.
It maintains structural accuracy for financial tables, academic papers, and forms, while also generating meaningful text outputs from illustrated or image-dense documents.

Output Format

Typhoon OCR 1.5 continues to produce standardized, machine-friendly outputs ready for RAG systems, LLM pipelines, and structured databases.

  • Markdown – for general text

  • HTML – for tables, including merged and complex layouts

  • <figure> – for figures, charts, and diagrams

    Example:
    <figure> A bar chart comparing domestic and export revenue growth between Q1 and Q2 2025. </figure>

  • LaTeX – for mathematical equations

    Example: $$ \text{Profit Margin} = \frac{\text{Net Profit}}{\text{Total Revenue}} \times 100 $$

  • <page_number> – preserves page structure

    Example: <page_number>1</page_number>

This unified output design ensures that developers can plug Typhoon OCR 1.5 directly into existing document-intelligence workflows.

Demo: Real-World Thai Documents OCR Results

Typhoon OCR 1.5 has been tested across diverse Thai and English document types—ranging from formal government forms to informal handwritten notes and visual materials.
Below are examples of its results across key real-world categories:

Infographics:

Excels in visual text understanding, maintaining layout fidelity even in mixed-language or image-heavy designs. v1.5 shows clearer segmentation and text-flow reconstruction than the previous version.

OCR Thai Infographic

Handwritten Notes and Forms:

Demonstrates high consistency across varied handwriting styles and complex form structures, with better semantic grouping and field interpretation compared to v1.

OCR Thai Handwriting 1
OCR Thai Handwriting 2

Mathematical Content and Equations (new in v1.5):

Now supports LaTeX-style output for mathematical expressions and formulas—an entirely new capability introduced in this version.

OCR Math

Government Documents:

Performs high-accuracy full-page OCR, including consistent support for Thai numerals and official forms with complex layouts.

OCR Thai Government Documents

Financial Statements and Tables:

Handles dense tabular data, correctly identifying merged cells and headers while preserving the original layout.

Financial Statement Tabular Information Extraction Demo

Charts:

Converts visual chart content into human-readable Markdown or structured summaries, capturing both numeric data and contextual descriptions.

OCR Chart

Letters and General Documents:

Accurately extracts text and structure from standard documents such as correspondence, memos, and administrative papers.

OCR Thai Document Letter

Buddhist-Style Thai–Pali Notes:

Handles traditional script combinations and interleaved Thai–Pali text with reliable character recognition and structure preservation.

OCR Thai Buddhist Pali

Bills & Receipts and other documents:

Performs robustly even on out-of-domain content such as invoices, tickets, or utility bills.

Typhoon OCR bills

Performance Evaluation

We benchmarked Typhoon OCR 1.5 against its predecessor (Typhoon OCR v1, 7B parameters) and leading proprietary systems (Gemini 2.5 Pro and GPT-5).

All tests were performed on Typhoon’s in-house Thai document dataset, covering financial reports, government forms, infographics, books, and handwritten documents using standard OCR and text-generation metrics: BLEU, ROUGE-L, and Levenshtein Distance.

  • BLEU – Measures n-gram precision (↑ higher is better)
Typhoon OCR 1.5 BLEU Eval
  • ROUGE-L – Captures structural and sequence similarity (↑ higher is better)
Typhoon OCR 1.5 ROUGE-L Eval
  • Levenshtein Distance – Character-level edit distance (↓ lower is better)
Typhoon OCR 1.5 Levenshtein Eval

Overall Results

Despite being just 2 billion parameters—one-third the size of the first-generation Typhoon OCR 7B—version 1.5 delivers substantial performance gains across every metric, particularly on visually complex and handwritten materials.

**BLEU: **Average score improved from 0.558 (v1) to 0.644 (v1.5), showing stronger word- and phrase-level precision.

**ROUGE-L: **Average increased from 0.686 to 0.774, reflecting better structural and contextual alignment in the generated text.

**Levenshtein Distance: **Average dropped from 0.332 to 0.251 (lower = better), confirming fewer character-level errors and cleaner outputs.

Category Highlights

Thai Government Forms – v1.5 achieved top scores across all metrics (BLEU 0.870, ROUGE-L 0.967, Levenshtein 0.035), outperforming even Gemini 2.5 Pro and GPT-5.

Thai Books – Improved BLEU (0.746) and ROUGE-L (0.949) while cutting character errors by >60%, highlighting stronger understanding of long, structured text.

Handwritten Forms – BLEU jumped from 0.321 to 0.522 and ROUGE-L from 0.454 to 0.645, a major leap driven by the new handwriting and form-field enhancements.

Infographics & Visual Documents – BLEU increased from 0.246 to 0.408 and ROUGE-L from 0.373 to 0.527, showing clear progress in figure recognition and mixed-media parsing.

Financial Reports & Others – Maintained strong layout fidelity and semantic accuracy, remaining competitive with or ahead of larger proprietary models.

Efficiency and Cost Analysis

Beyond accuracy gains, Typhoon OCR 1.5 delivers substantial improvements in efficiency—making high-quality OCR more accessible for both developers and enterprises.
The new 2B architecture is optimized for real-world Thai and English documents and brings notable performance benefits:

Metricv1.5 Improvement
Throughput2–3× faster than v1 3B
LatencyLower across all major GPUs (L4, A100, H100)
Cost Efficiency40–60% cheaper to run in the cloud
GPU UtilizationUp to 3× more pages per GPU-hour
Hardware FlexibilityAbility to Runs on smaller hardware

These gains come from architectural simplification, quantization improvements, and the move to a single-prompt inference design, allowing v1.5 to deliver higher performance at significantly lower operational cost.

A full breakdown of throughput, latency, and hardware cost comparisons is available in the experimental results appendix.

Summary

Typhoon OCR is open-source, bilingual, and production-ready—a compact yet powerful model built for the next generation of document intelligence in Thailand and beyond.

Across nearly every task, Typhoon OCR 1.5 outperforms both its predecessor and global large-scale models on Thai document understanding—while being smaller, faster, and open-source.

Try Typhoon OCR Today

Experience Typhoon OCR v1.5 through our demos and open-source resources:

To make it easier for existing users of Typhoon OCR v1 to migrate, we have introduced two endpoints for two versions of the model:

typhoon-ocr — the new default endpoint for the new Typhoon OCR 1.5

typhoon-ocr-preview — the endpoint for the previous Typhoon OCR v1, which will be deprecated on 31 December 2025

If you are already using Typhoon OCR v1 in your workflow, you can continue using typhoon-ocr-preview temporarily while transitioning to the new version.

Explore full API documentation for integration.