Typhoon OCR

Task-specific

Qwen3-VL 2B

Next-generation bilingual vision-language model for document parsing with superior Thai document understanding.


Released

November 14, 2025

Context

128k

Input

Image, PDF

Output

Text

About this Model

Typhoon OCR is a next-generation bilingual (Thai/English) vision-language model for document parsing, built for real-world use cases. It produces structured, layout-aware, and semantically rich output, and it outperforms both GPT-5 and Gemini 2.5 Flash in Thai document understanding, particularly on documents with complex layouts and mixed-language content.

Key Features

OCR-Focused Model

Delivers high accuracy on OCR and text extraction tasks.

Structured Document Handling

Supports complex formats like financial reports, academic papers, books, and government forms with layout-aware parsing.

Flexible Output Formats

Outputs general text in Markdown, tables in HTML (with support for merged cells and complex layouts), and visuals via figure tags.
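Code that consumes these HTML tables often has to flatten merged cells before loading them into a spreadsheet or dataframe. The following is a minimal, standard-library-only sketch. It assumes the model emits ordinary `<table>`/`<tr>`/`<td>`/`<th>` markup with standard `rowspan` and `colspan` attributes. The names `TableGridParser` and `table_to_grid` are hypothetical helpers, not part of any Typhoon OCR API.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collects the cells of one HTML table as (text, rowspan, colspan) tuples per row.

    Hypothetical helper; assumes standard <tr>/<td>/<th> markup.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of (text, rowspan, colspan) per <tr>
        self._cell = None     # text fragments of the cell currently being read
        self._span = (1, 1)   # (rowspan, colspan) of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._span = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            self.rows[-1].append((text, *self._span))
            self._cell = None


def table_to_grid(html):
    """Expand merged cells into a rectangular list of rows (hypothetical helper)."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    grid = {}  # (row, col) -> cell text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots already filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max((r for r, _ in grid), default=-1) + 1
    n_cols = max((c for _, c in grid), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

The function copies a merged cell's text into every slot it covers. This keeps the grid rectangular and makes each row self-describing. Keep the original HTML if you need to know which slots were merged.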

Multi-layered Figure Interpretation

Analyzes figures through observation, context, text recognition (Thai/English), and visual design to generate rich, structured descriptions.
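Figure descriptions arrive inside figure tags mixed in with the Markdown text. A downstream pipeline may therefore want to separate them, for example to index figure descriptions on their own. The following is a minimal sketch. It assumes the tags take the form `<figure>...</figure>`, possibly with attributes. This page does not specify the exact tag syntax, and `split_figures` is a hypothetical helper.

```python
import re

# Assumption: figures are wrapped in <figure ...>...</figure>; adjust if the
# actual output uses a different tag.
FIGURE_RE = re.compile(r"<figure\b[^>]*>(.*?)</figure>", re.DOTALL | re.IGNORECASE)


def split_figures(page_output):
    """Return (text with numbered placeholders, list of figure descriptions)."""
    figures = []

    def _stash(match):
        # Save the description and leave a numbered placeholder in the text.
        figures.append(match.group(1).strip())
        return f"[figure {len(figures)}]"

    text = FIGURE_RE.sub(_stash, page_output)
    return text, figures
```

The numbered placeholders preserve each figure's position in the reading order, so you can reattach a description to its surrounding text later.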

Informal Document Support

Handles layout-heavy inputs such as receipts, menus, tickets, and infographics, producing Markdown output that preserves their formatting.

Release History