VibeVoice Realtime: Real-time TTS in 6 Tests

VibeVoice Realtime is a text-to-speech model that targets low-latency voice output and long-form stability. This post runs six short but practical prompts and publishes the raw MP3 outputs.

What was tested

  • Short ad read (timing, emphasis)
  • DevOps-style numbers and acronyms (UTC, v2, HTTP)
  • Longer checklist paragraph (rhythm and breath)
  • Meeting recap (prosody across sentences)
  • German output with a German voice
  • Tongue twisters (hard articulation)

Inputs used

The runs used only three inputs from the model docs: prompt, speakerName, and scale.
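As a rough illustration of how those three inputs fit together, here is a minimal Python sketch that assembles them into one payload. The field names come from this post. The helper name build_inputs, the speaker whitelist (copied from the run-time table), and the scale value of 1.0 are assumptions for illustration only; the model docs define the real types and ranges, and no API call is made here.

```python
import json

# Speaker names as they appear in the run-time table of this post.
KNOWN_SPEAKERS = (
    "en-emma_woman",
    "en-davis_man",
    "en-grace_woman",
    "en-carter_man",
    "de-spk1_woman",
    "en-mike_man",
)


def build_inputs(prompt: str, speaker_name: str, scale: float) -> dict:
    """Bundle the three inputs into one dict (hypothetical helper)."""
    text = prompt.strip()
    if not text:
        raise ValueError("prompt must not be empty")
    if speaker_name not in KNOWN_SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker_name}")
    return {"prompt": text, "speakerName": speaker_name, "scale": scale}


# scale=1.0 is a placeholder value, not a setting taken from the tests.
example = build_inputs(
    "New drop. Stainless steel watch, matte black dial, 10 percent off today.",
    "en-emma_woman",
    scale=1.0,
)
print(json.dumps(example, indent=2))
```

Keeping the payload this small makes it easy to swap the speaker while holding the prompt and scale fixed.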

Run-time snapshot

Test  Speaker         Elapsed (s)
01    en-emma_woman   8
02    en-davis_man    10
03    en-grace_woman  13
04    en-carter_man   16
05    de-spk1_woman   10
06    en-mike_man     31
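To put the snapshot in perspective, the elapsed times can be summarized with a few lines of Python. This is only a sketch over the six numbers in the table; the variable names are illustrative, and single runs like these say nothing about variance between repeated runs.

```python
from statistics import mean

# (test id, speaker, elapsed seconds) copied from the table above.
RUNS = [
    ("01", "en-emma_woman", 8),
    ("02", "en-davis_man", 10),
    ("03", "en-grace_woman", 13),
    ("04", "en-carter_man", 16),
    ("05", "de-spk1_woman", 10),
    ("06", "en-mike_man", 31),
]

total = sum(seconds for _, _, seconds in RUNS)
average = mean(seconds for _, _, seconds in RUNS)
slowest = max(RUNS, key=lambda run: run[2])

print(f"total: {total} s")
print(f"average: {average:.2f} s")
print(f"slowest: test {slowest[0]} ({slowest[1]}, {slowest[2]} s)")
```

The tongue-twister run (test 06) accounts for more than a third of the total elapsed time.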

Results: 6 prompts with audio

Test 01: short product ad read

Prompt:

New drop. Stainless steel watch, matte black dial, 10 percent off today. Free shipping, delivery in 2 to 3 business days.

Test 02: numbers, acronyms, and ops language

Prompt:

Deploy v2 at 14:05 UTC. Roll back if error rate exceeds 0.7 percent. Log the request id, the JSON payload size, and the HTTP status code.

Test 03: checklist pacing

Prompt:

Onboarding checklist. Step one, verify email. Step two, create an API key. Step three, run a smoke test with two prompts. Step four, set timeouts and retries. Step five, ship.

Test 04: sentence-level prosody

Prompt:

Meeting recap. First, the team agreed to cut the scope. Next, a quick demo shipped with a single button. Finally, a bug fix went out before lunch. Action items follow.

Test 05: German voice

Prompt:

Achtung. Bitte lesen Sie die Anleitung. Seriennummer DE 77 2048. Garantie 24 Monate. Bei Fragen, schreiben Sie dem Support.

Test 06: hard articulation

Prompt:

Hard test. She sells seashells by the seashore. Red leather, yellow leather. Unique New York. Say it three times, clearly.

Honest take

  • The voice stays clear on short prompts. The cadence sounds steady.
  • Ops text works well when punctuation is explicit (commas and periods). Without it, acronyms can blur.
  • Speaker choice matters more than scale for the perceived style. Testing a few voices before shipping pays off.
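The punctuation point can be turned into a simple pre-flight check before a prompt is sent. The sketch below is a heuristic of my own, not part of VibeVoice or its docs: it counts the longest run of words between punctuation breaks and flags prompts above a threshold. The function names and the 12-word threshold are assumptions.

```python
import re

# Break only on punctuation followed by whitespace or end of text,
# so tokens such as "14:05" and "0.7" stay in one piece.
_BREAK = re.compile(r"[.,;:!?](?:\s+|$)")


def longest_unpunctuated_run(text: str) -> int:
    """Return the largest word count between two punctuation breaks."""
    segments = _BREAK.split(text)
    return max((len(segment.split()) for segment in segments), default=0)


def needs_punctuation(text: str, max_words: int = 12) -> bool:
    """Flag prompts with a long stretch of words and no punctuation."""
    return longest_unpunctuated_run(text) > max_words


punctuated = (
    "Deploy v2 at 14:05 UTC. Roll back if error rate exceeds 0.7 percent. "
    "Log the request id, the JSON payload size, and the HTTP status code."
)
# Same prompt with sentence-level commas and periods stripped.
unpunctuated = re.sub(r"[.,](?=\s|$)", "", punctuated)

print(needs_punctuation(punctuated))    # False
print(needs_punctuation(unpunctuated))  # True
```

A check like this does not predict audio quality; it only catches prompts that are likely to need commas or periods before synthesis.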

Try it

Run VibeVoice Realtime on Wiro

