Speech-to-Text APIs in 2026: One Audio Clip, Two Modern Transcribers

This post tests two current speech-to-text APIs on Wiro using the same short MP3. The clip includes numbers and model names to stress common failure points.

Audio sample

Expected transcript (what the speaker says)

Hi. This is a 2026 speech to text benchmark on Wiro. It includes numbers like 3.5, 720p, and 1,024. Proper nouns: Kling, Seedance, PixVerse, Hailuo. End.

Models tested

qwen/qwen3-asr-1.7b and elevenlabs/speech-to-text, both run on Wiro against the same clip.

Results

qwen/qwen3-asr-1.7b

Elapsed processing time: 45s.

Language: English
Text: Hi, this is a 20th round six-page-to-text benchmark on Weiro. It includes numbers like 3.5, 720p, and 1024, proper nouns, hilling, students, pigs verse, hailuo, and.

elevenlabs/speech-to-text

Elapsed processing time: 4s.

Text: Hi, this is a 20th drawn six speech-to-text benchmark on Weiro. It includes numbers like 3.5, 720p, and 1024; proper nouns, hyelin; sedents, pixvers, hyluo; end.
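To put a number on how far each output drifts from the expected transcript, one option is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is an illustration only, not how the results above were produced. The helper names (`normalize`, `word_error_rate`) and the normalization rules (lowercasing, dropping thousands separators, splitting on hyphens and punctuation) are assumptions chosen for this clip; a different normalization would give different scores.

```python
import re

EXPECTED = (
    "Hi. This is a 2026 speech to text benchmark on Wiro. It includes numbers "
    "like 3.5, 720p, and 1,024. Proper nouns: Kling, Seedance, PixVerse, Hailuo. End."
)
QWEN = (
    "Hi, this is a 20th round six-page-to-text benchmark on Weiro. It includes "
    "numbers like 3.5, 720p, and 1024, proper nouns, hilling, students, pigs verse, "
    "hailuo, and."
)
ELEVENLABS = (
    "Hi, this is a 20th drawn six speech-to-text benchmark on Weiro. It includes "
    "numbers like 3.5, 720p, and 1024; proper nouns, hyelin; sedents, pixvers, "
    "hyluo; end."
)


def normalize(text: str) -> list[str]:
    """Lowercase, drop thousands separators, and split into word tokens."""
    text = text.lower()
    text = re.sub(r"(?<=\d),(?=\d)", "", text)  # 1,024 -> 1024
    # Keep decimals (3.5) and unit suffixes (720p) as single tokens.
    return re.findall(r"\d+(?:\.\d+)?[a-z]*|[a-z]+", text)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript has no words")
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    for name, text in [("qwen/qwen3-asr-1.7b", QWEN),
                       ("elevenlabs/speech-to-text", ELEVENLABS)]:
        print(f"{name}: WER = {word_error_rate(EXPECTED, text):.2f}")
```

A single short clip gives a noisy WER, so treat the result as a rough signal rather than a ranking.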

Quick comparison table

Model | Elapsed seconds | What to watch for
Qwen3 ASR 1.7B | 45 | Numbers and punctuation vs. proper nouns
ElevenLabs STT | 4 | Speed vs. name accuracy
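Since the clip was built around numbers and proper nouns, a simple complement to the table is to check which of those stress terms come through intact in each output. The following is a minimal sketch under stated assumptions: the `KEY_TERMS` list and the helper names are hypothetical, picked from the expected transcript for illustration, and matching is exact after lowercasing (so "Weiro" does not count as "Wiro").

```python
import re

# Hypothetical list of stress terms taken from the expected transcript.
KEY_TERMS = ["2026", "3.5", "720p", "1024", "wiro",
             "kling", "seedance", "pixverse", "hailuo"]

OUTPUTS = {
    "qwen/qwen3-asr-1.7b": (
        "Hi, this is a 20th round six-page-to-text benchmark on Weiro. It "
        "includes numbers like 3.5, 720p, and 1024, proper nouns, hilling, "
        "students, pigs verse, hailuo, and."
    ),
    "elevenlabs/speech-to-text": (
        "Hi, this is a 20th drawn six speech-to-text benchmark on Weiro. It "
        "includes numbers like 3.5, 720p, and 1024; proper nouns, hyelin; "
        "sedents, pixvers, hyluo; end."
    ),
}


def tokens(text: str) -> set[str]:
    """Lowercase the text, drop thousands separators, return the set of words."""
    text = re.sub(r"(?<=\d),(?=\d)", "", text.lower())
    return set(re.findall(r"\d+(?:\.\d+)?[a-z]*|[a-z]+", text))


def surviving_terms(transcript: str) -> list[str]:
    """Return the key terms that appear verbatim in the transcript."""
    found = tokens(transcript)
    return [term for term in KEY_TERMS if term in found]


if __name__ == "__main__":
    for model, text in OUTPUTS.items():
        kept = surviving_terms(text)
        print(f"{model}: {len(kept)}/{len(KEY_TERMS)} key terms intact: {kept}")
```

With exact matching, both outputs keep 3.5, 720p, and 1024; the Qwen output also keeps Hailuo, and neither keeps 2026, Wiro, Kling, Seedance, or PixVerse. A fuzzier match (for example, edit distance on names) would credit near misses such as "pixvers".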
