We’re thrilled to announce Typhoon OCR 1.5, the latest evolution of our open-source vision-language document parsing model for Thai and English.
This new version delivers faster inference, stronger performance on handwritten and form-based documents, and improved adaptability across both text-rich and image-rich pages—all in a smaller, more efficient model.
The Broader Context
Traditional OCR systems, built on convolutional and sequence-based networks such as CNNs and RNNs, are well-suited for clean, simple layouts but often fail to capture the structure and semantics of real-world documents.
Frameworks like EasyOCR, PaddleOCR, and Tesseract support many languages—including Thai—but they still face common challenges:
-
Limited layout awareness – losing structural cues such as tables, headings, or columns
-
Weak image-text understanding – overlooking charts, figures, and mixed-media sections
-
Loss of context – processing content line by line without interpreting meaning or relationships
To address these limitations, **Vision-Language Models (VLMs) **offer a new paradigm.
By combining visual perception with language reasoning, VLM-based OCR systems can understand not just what text appears on the page, but how it fits into the larger document context.
The Typhoon OCR Journey
Typhoon OCR is an open-source, bilingual document parsing model built specifically for real-world documents in Thai and English.
Unlike conventional OCR tools, Typhoon OCR doesn't just extract raw text—it produces semantic, structured, and layout-preserving outputs that are optimized for downstream tasks such as:
-
Retrieval-Augmented Generation (RAG)
-
Comprehensive document parsing and understanding
-
Accurate interpretation of tables, charts, and forms
Since its initial release, it has been adopted by a wide range of users—from individual developers to multinational organizations—and helped spark broader interest in Thai AI vision models and document intelligence.
Now, with Typhoon OCR 1.5, we’re taking that foundation further.
This version brings major improvements in speed, efficiency, and real-world robustness, combining Typhoon’s hallmark layout awareness with better handwriting recognition, simplified inference, and faster performance across devices and workloads.
What’s New and Improved in Typhoon OCR 1.5
Typhoon OCR 1.5 isn’t just a smaller model—it’s a faster, more capable, and more adaptable system for real-world documents.
From form-heavy paperwork to handwritten notes and image-rich pages, version 1.5 brings a range of upgrades that make OCR smarter and more deployable than ever.
1. Compact Architecture, Faster Inference
Now powered by Qwen3-VL 2B, Typhoon OCR 1.5 is significantly smaller while maintaining strong multimodal intelligence.
Thanks to quantization and architectural optimization, it runs efficiently even on standard CPUs and edge hardware—ideal for local or privacy-first deployment.
The result: Faster inference and reduced resource usage, without compromising accuracy.
2. No More Metadata Dependency
Unlike v1, which used embedded PDF metadata to reconstruct layouts, v1.5 achieves layout fidelity directly from images.
This independence makes it far more versatile. It now works equally well with scanned PDFs, mobile captures, and legacy image archives.
It’s faster, simpler, and ready for production in any workflow.
3. Single-Prompt Inference
Typhoon OCR 1.5 simplifies the entire process. Instead of two separate prompts (system prompt and uesr prompt as used in v1), the model now operates through a single unified prompt, producing consistent results with less tuning.
For developers, that means quicker integration, fewer moving parts, and easier fine-tuning across document types.
4. Enhanced Handwriting and Form Understanding
Handwriting has always been one of the toughest challenges for OCR systems.
In v1.5, handwriting recognition and field detection have been significantly improved, producing more reliable text extraction and semantic alignment even on mixed cursive and printed text.
Whether processing government forms, receipts, or annotated notes, the new model handles irregular layouts with higher accuracy and stability.
5. Balanced Strength on Text-Rich and Image-Rich Documents
Whether you’re processing text-heavy reports or infographics filled with charts and figures, v1.5 adapts intelligently.
It maintains structural accuracy for financial tables, academic papers, and forms, while also generating meaningful text outputs from illustrated or image-dense documents.
Output Format
Typhoon OCR 1.5 continues to produce standardized, machine-friendly outputs ready for RAG systems, LLM pipelines, and structured databases.
-
Markdown – for general text
-
HTML – for tables, including merged and complex layouts
-
<figure>– for figures, charts, and diagramsExample:
<figure> A bar chart comparing domestic and export revenue growth between Q1 and Q2 2025. </figure> -
LaTeX – for mathematical equations
Example:
$$ \text{Profit Margin} = \frac{\text{Net Profit}}{\text{Total Revenue}} \times 100 $$ -
<page_number>– preserves page structureExample:
<page_number>1</page_number>
This unified output design ensures that developers can plug Typhoon OCR 1.5 directly into existing document-intelligence workflows.
Demo: Real-World Thai Documents OCR Results
Typhoon OCR 1.5 has been tested across diverse Thai and English document types—ranging from formal government forms to informal handwritten notes and visual materials.
Below are examples of its results across key real-world categories:
Infographics:
Excels in visual text understanding, maintaining layout fidelity even in mixed-language or image-heavy designs. v1.5 shows clearer segmentation and text-flow reconstruction than the previous version.

Handwritten Notes and Forms:
Demonstrates high consistency across varied handwriting styles and complex form structures, with better semantic grouping and field interpretation compared to v1.


Mathematical Content and Equations (new in v1.5):
Now supports LaTeX-style output for mathematical expressions and formulas—an entirely new capability introduced in this version.

Government Documents:
Performs high-accuracy full-page OCR, including consistent support for Thai numerals and official forms with complex layouts.

Financial Statements and Tables:
Handles dense tabular data, correctly identifying merged cells and headers while preserving the original layout.

Charts:
Converts visual chart content into human-readable Markdown or structured summaries, capturing both numeric data and contextual descriptions.

Letters and General Documents:
Accurately extracts text and structure from standard documents such as correspondence, memos, and administrative papers.

Buddhist-Style Thai–Pali Notes:
Handles traditional script combinations and interleaved Thai–Pali text with reliable character recognition and structure preservation.

Bills & Receipts and other documents:
Performs robustly even on out-of-domain content such as invoices, tickets, or utility bills.

Performance Evaluation
We benchmarked Typhoon OCR 1.5 against its predecessor (Typhoon OCR v1, 7B parameters) and leading proprietary systems (Gemini 2.5 Pro and GPT-5).
All tests were performed on Typhoon’s in-house Thai document dataset, covering financial reports, government forms, infographics, books, and handwritten documents using standard OCR and text-generation metrics: BLEU, ROUGE-L, and Levenshtein Distance.
- BLEU – Measures n-gram precision (↑ higher is better)

- ROUGE-L – Captures structural and sequence similarity (↑ higher is better)

- Levenshtein Distance – Character-level edit distance (↓ lower is better)

Overall Results
Despite being just 2 billion parameters—one-third the size of the first-generation Typhoon OCR 7B—version 1.5 delivers substantial performance gains across every metric, particularly on visually complex and handwritten materials.
**BLEU: **Average score improved from 0.558 (v1) to 0.644 (v1.5), showing stronger word- and phrase-level precision.
**ROUGE-L: **Average increased from 0.686 to 0.774, reflecting better structural and contextual alignment in the generated text.
**Levenshtein Distance: **Average dropped from 0.332 to 0.251 (lower = better), confirming fewer character-level errors and cleaner outputs.
Category Highlights
Thai Government Forms – v1.5 achieved top scores across all metrics (BLEU 0.870, ROUGE-L 0.967, Levenshtein 0.035), outperforming even Gemini 2.5 Pro and GPT-5.
Thai Books – Improved BLEU (0.746) and ROUGE-L (0.949) while cutting character errors by >60%, highlighting stronger understanding of long, structured text.
Handwritten Forms – BLEU jumped from 0.321 to 0.522 and ROUGE-L from 0.454 to 0.645, a major leap driven by the new handwriting and form-field enhancements.
Infographics & Visual Documents – BLEU increased from 0.246 to 0.408 and ROUGE-L from 0.373 to 0.527, showing clear progress in figure recognition and mixed-media parsing.
Financial Reports & Others – Maintained strong layout fidelity and semantic accuracy, remaining competitive with or ahead of larger proprietary models.
Efficiency and Cost Analysis
Beyond accuracy gains, Typhoon OCR 1.5 delivers substantial improvements in efficiency—making high-quality OCR more accessible for both developers and enterprises.
The new 2B architecture is optimized for real-world Thai and English documents and brings notable performance benefits:
| Metric | v1.5 Improvement |
|---|---|
| Throughput | 2–3× faster than v1 3B |
| Latency | Lower across all major GPUs (L4, A100, H100) |
| Cost Efficiency | 40–60% cheaper to run in the cloud |
| GPU Utilization | Up to 3× more pages per GPU-hour |
| Hardware Flexibility | Ability to Runs on smaller hardware |
These gains come from architectural simplification, quantization improvements, and the move to a single-prompt inference design, allowing v1.5 to deliver higher performance at significantly lower operational cost.
A full breakdown of throughput, latency, and hardware cost comparisons is available in the experimental results appendix.
Summary
Typhoon OCR is open-source, bilingual, and production-ready—a compact yet powerful model built for the next generation of document intelligence in Thailand and beyond.
Across nearly every task, Typhoon OCR 1.5 outperforms both its predecessor and global large-scale models on Thai document understanding—while being smaller, faster, and open-source.
Try Typhoon OCR Today
Experience Typhoon OCR v1.5 through our demos and open-source resources:
-
🔍 Test it instantly on our OCR Playground – just upload an image or a single-page PDF and see the results in seconds.
-
🤗 Hugging Face Models:
- Typhoon OCR 1.5 2B
💻 Colab Demo – Run Typhoon OCR 1.5 Demo.ipynb to test it in minutes.
- Typhoon OCR 1.5 2B
-
Ollama: Access Here
-
⚙️ API Access
To make it easier for existing users of Typhoon OCR v1 to migrate, we have introduced two endpoints for two versions of the model:
typhoon-ocr — the new default endpoint for the new Typhoon OCR 1.5
typhoon-ocr-preview — the endpoint for the previous Typhoon OCR v1, which will be deprecated on 31 December 2025
If you are already using Typhoon OCR v1 in your workflow, you can continue using typhoon-ocr-preview temporarily while transitioning to the new version.
Explore full API documentation for integration.


