
Qwen Open-Sources Advanced ASR And Forced Alignment Models With Multi-Language Capabilities


Alibaba Cloud announced that it has open-sourced its Qwen3-ASR and Qwen3-ForcedAligner AI models, offering advanced tools for speech recognition and forced alignment.

The Qwen3-ASR family includes two all-in-one models, Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, which support language identification and transcription across 52 languages and accents, leveraging large-scale speech data and the Qwen3-Omni foundation model.

Internal testing indicates that the 1.7B model delivers state-of-the-art accuracy among open-source ASR systems, while the 0.6B model balances performance and efficiency, capable of transcribing 2,000 seconds of speech in one second under high concurrency.
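To put the throughput figure in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration of the reported number: the helper names `real_time_factor` and `wall_time_for` are hypothetical, and the result assumes the stated 2,000 seconds of speech per second holds as a sustained aggregate rate, which the article does not confirm.

```python
# Illustrative arithmetic only; function names are hypothetical and not
# part of any Qwen tooling.

def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return wall_seconds / audio_seconds


def wall_time_for(audio_seconds: float, throughput: float) -> float:
    """Wall-clock seconds needed at a given throughput
    (seconds of audio processed per wall-clock second)."""
    return audio_seconds / throughput


# Reported figure: 2,000 seconds of speech transcribed in one second.
THROUGHPUT = 2000.0

rtf = real_time_factor(audio_seconds=2000.0, wall_seconds=1.0)
one_hour = wall_time_for(audio_seconds=3600.0, throughput=THROUGHPUT)

print(f"real-time factor: {rtf}")
print(f"wall time for one hour of audio: {one_hour} s")
```

Under that assumption, the real-time factor is 0.0005, and one hour of audio would take roughly 1.8 seconds of aggregate processing time.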

The Qwen3-ForcedAligner-0.6B model uses a non-autoregressive LLM approach to align text and speech in 11 languages, outperforming leading forced-alignment solutions in both speed and accuracy.

Alibaba Cloud has also released a comprehensive inference framework under the Apache 2.0 license, supporting streaming, batch processing, timestamp prediction, and fine-tuning, aimed at accelerating research and practical applications in audio understanding.

Qwen3-ASR And Qwen3-ForcedAligner Models Demonstrate Leading Accuracy And Efficiency

Alibaba Cloud has released performance results for its Qwen3-ASR and Qwen3-ForcedAligner models, demonstrating leading accuracy and efficiency across diverse speech recognition tasks.

The Qwen3-ASR-1.7B model achieves state-of-the-art results among open-source systems, outperforming commercial APIs and other open-source models in English, multilingual, and Chinese dialect recognition, including Cantonese and 22 regional variants.

It maintains reliable accuracy in challenging acoustic conditions, such as low signal-to-noise environments, child or elderly speech, and even singing voice transcription, achieving average word error rates of 13.91% in Chinese and 14.60% in English with background music.
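For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The sketch below is a minimal, generic implementation using edit distance over whitespace-separated tokens. It is not Alibaba Cloud's evaluation code, and the article does not say how the text was tokenized or normalized; Chinese evaluations, for instance, often count characters rather than words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Generic WER: word-level edit distance divided by reference length.

    Tokenizes on whitespace only; real evaluations usually also normalize
    case and punctuation, which this sketch does not attempt.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example sentences, not taken from any benchmark.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {example:.2%}")
```

In this made-up example, one substitution and one deletion against six reference words give a rate of about 33%.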

The smaller Qwen3-ASR-0.6B balances accuracy and efficiency, delivering high throughput and low latency under high concurrency, capable of transcribing up to 5 hours of speech in online asynchronous mode at a concurrency of 128.

Meanwhile, the Qwen3-ForcedAligner-0.6B outperforms leading end-to-end forced alignment models including Nemo-Forced-Aligner, WhisperX, and Monotonic-Aligner, offering superior language coverage, timestamp accuracy, and support for varied speech and audio lengths.

The post Qwen Open-Sources Advanced ASR And Forced Alignment Models With Multi-Language Capabilities appeared first on Metaverse Post.
