TYPHOON OCR

Task-specific
Qwen2.5-VL-Instruct
7B

Next-generation bilingual (Thai/English) vision-language model for document parsing, with superior Thai document understanding.

Released: May 19, 2025
Context: 128k tokens
Input: Image, PDF
Output: Text
About this Model

TYPHOON OCR is a next-generation bilingual (Thai/English) vision-language model for document parsing, built for real-world use cases. It produces structured, layout-aware, and semantically rich output, and it outperforms both GPT-4o and Gemini 2.5 Flash in Thai document understanding, particularly on documents with complex layouts and mixed-language content.

Key Features
OCR-Focused Model
Delivers high accuracy on OCR and text extraction tasks.
Structured Document Handling
Supports complex formats like financial reports, academic papers, books, and government forms with layout-aware parsing.
Flexible Output Formats
Outputs general text as Markdown, tables as HTML (including merged cells and complex layouts), and figures and other visuals wrapped in <figure> tags.
Multi-layered Figure Interpretation
Analyzes figures through observation, context, text recognition (Thai/English), and visual design to generate rich, structured descriptions.
Informal Document Support
Handles layout-heavy inputs such as receipts, menus, tickets, and infographics, producing Markdown output that preserves their formatting.
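The output conventions listed under Flexible Output Formats can be consumed programmatically. The Python sketch below is a minimal illustration, not part of any official TYPHOON OCR tooling: it assumes the model returns Markdown with tables embedded as plain HTML (<table>, <tr>, <th>/<td>, using rowspan/colspan for merged cells) and figures wrapped in <figure>...</figure> blocks. The helper names split_blocks and table_to_grid, and the sample response, are hypothetical. The code splits a response into Markdown, table, and figure segments, then expands each table's merged cells into a rectangular grid using only the standard library.

```python
import re
from html.parser import HTMLParser

# Assumption: tables arrive as <table>...</table> and figures as <figure>...</figure>.
BLOCK_RE = re.compile(r"(<table\b.*?</table>|<figure\b.*?</figure>)", re.S | re.I)


def split_blocks(text):
    """Split model output into (kind, content) pairs: "markdown", "table", or "figure"."""
    segments = []
    for part in BLOCK_RE.split(text):  # the capturing group keeps the matched blocks
        part = part.strip()
        if not part:
            continue  # skip whitespace between blocks
        lowered = part.lower()
        if lowered.startswith("<table"):
            segments.append(("table", part))
        elif lowered.startswith("<figure"):
            segments.append(("figure", part))
        else:
            segments.append(("markdown", part))
    return segments


class _TableParser(HTMLParser):
    """Collects rows of (text, rowspan, colspan) cells from one HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            rs = max(1, int(a.get("rowspan") or 1))
            cs = max(1, int(a.get("colspan") or 1))
            self._cell = [[], rs, cs]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:
                self.rows.append([])  # tolerate a cell that appears before any <tr>
            text, rs, cs = self._cell
            self.rows[-1].append(("".join(text).strip(), rs, cs))
            self._cell = None


def table_to_grid(html):
    """Expand rowspan/colspan so a merged cell's text fills every slot it covers."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    grid = {}  # (row, col) -> cell text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rs, cs in row:
            while (r, c) in grid:  # skip slots already taken by a rowspan from above
                c += 1
            for dr in range(rs):
                for dc in range(cs):
                    grid[(r + dr, c + dc)] = text
            c += cs
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Hypothetical model response used for illustration only.
sample = """# Quarterly Report

Revenue grew in Q1.

<table>
<tr><th rowspan="2">Region</th><th colspan="2">Revenue</th></tr>
<tr><th>Q1</th><th>Q2</th></tr>
<tr><td>North</td><td>10</td><td>12</td></tr>
</table>

<figure>Bar chart comparing revenue by region.</figure>
"""

for kind, content in split_blocks(sample):
    if kind == "table":
        for row in table_to_grid(content):
            print(row)
    else:
        print(kind, "->", content.splitlines()[0])
```

Copying a merged cell's text into every slot it covers keeps all rows the same width, which makes the grid easy to hand to a CSV writer or dataframe. If you need to tell merged cells apart from genuinely repeated values, keep the rowspan/colspan metadata collected by the parser instead of flattening it.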
Release History
Version 1 (May 19, 2025): Initial release