Typhoon 2 Audio

General

Llama 3.1

End-to-end multimodal model for speech processing with parallel text and audio generation capabilities.

Released: January 9, 2025
Context: Up to 8K tokens
Input: Audio
Output: Audio, Text

About this Model

[Research Preview] An end-to-end model that processes and generates both text and audio. It performs well on speech-centric tasks such as transcription, audio captioning, and speech-to-speech translation, and it supports multi-turn dialogue and text-to-speech.

Key Features

Parallel Text and Audio Outputs

Generates text and audio in parallel rather than one after the other, which reduces response latency.

Extended Context Windows

Handles audio inputs of up to 30 seconds, enabling more detailed and comprehensive speech analysis.
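
Because of the 30-second input limit, longer recordings need to be split before they are sent to the model. The following is a minimal sketch of that bookkeeping using only the Python standard library. The helper names `wav_duration_seconds` and `chunk_spans` are hypothetical and are not part of any Typhoon SDK; the sketch only computes time spans and does not call the model.

```python
import wave

# Per-clip limit stated on this model card.
MAX_CLIP_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def chunk_spans(total_seconds: float,
                max_seconds: float = MAX_CLIP_SECONDS) -> list[tuple[float, float]]:
    """Split [0, total_seconds] into consecutive spans of at most max_seconds."""
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and max_seconds must be > 0")
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans


if __name__ == "__main__":
    # A 75-second recording becomes three clips: 30 s, 30 s, and 15 s.
    for start, end in chunk_spans(75.0):
        print(f"{start:.1f}s to {end:.1f}s")
```

Cutting at fixed offsets can split a word in half, so a real pipeline might prefer to cut at silences or to overlap neighbouring clips slightly. That choice is left open here.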

Enhanced Instruction-Following

Supports multi-turn conversations, system prompts, and complex commands with improved accuracy.

Speech-to-Speech Processing

Delivers accurate transcription, seamless translation, and conversational audio generation.

Release History

Version 1

January 9, 2025

Initial release

Availability

Web Playground: Not available
Typhoon API: Not available
Typhoon API Pro: Not available
Hugging Face: Research preview model access
Other Platforms: Not available