MOSS-TTSD: Dialogue TTS in 6 Tests - Wiro AI

MOSS-TTSD turns a dialogue script into spoken conversation. This post runs 6 short tests and shares the raw audio outputs. The goal: check turn-taking, timing, and tone changes across speakers.

Model

openmoss/moss-ttsd

Test rules

Input format: a single dialogue string with speaker tags like [S1] and [S2]
No reference audio used in these runs
Outputs published as-is
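The post does not say how the script strings were assembled. Below is a minimal sketch that assumes the turns start as a Python list of (speaker number, text) pairs joined with single spaces. `build_dialogue` and `split_dialogue` are hypothetical helpers. They are not part of MOSS-TTSD or the Wiro API. The sketch rebuilds the Test 1 script and checks that it round-trips.

```python
import re

# Matches a speaker tag such as [S1] or [S2] and captures the number.
TAG_RE = re.compile(r"\[S(\d+)\]")


def build_dialogue(turns):
    """Join (speaker_number, text) pairs into one tagged dialogue string (hypothetical helper)."""
    parts = []
    for speaker, text in turns:
        text = text.strip()
        if not text:
            raise ValueError(f"empty turn for speaker S{speaker}")
        parts.append(f"[S{speaker}] {text}")
    # Assumption: turns are separated by a single space, as in the scripts in this post.
    return " ".join(parts)


def split_dialogue(script):
    """Split a tagged dialogue string back into (speaker_number, text) pairs."""
    # re.split with one capture group returns [prefix, num, text, num, text, ...]
    pieces = TAG_RE.split(script)
    if pieces[0].strip():
        raise ValueError("text found before the first speaker tag")
    return [(int(num), text.strip()) for num, text in zip(pieces[1::2], pieces[2::2])]


turns = [
    (1, "Morning. The numbers from yesterday look off."),
    (2, "Yep. The export rounded decimals."),
    (1, "Fix it and resend in ten minutes."),
    (2, "On it."),
]
script = build_dialogue(turns)
print(script)
assert split_dialogue(script) == turns
```

Keeping the turns as structured data until the last step makes it harder to drop or mistype a tag by hand.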

Hero image

Image: podcast studio desk with microphones and an audio waveform. Prompt: Photorealistic podcast studio desk with two microphones and headphones. Soft warm lighting. A floating translucent audio waveform and subtitle lines in the background. Shallow depth of field.

Results (6 tests)

Test 1: office back and forth

Dialogue:

[S1] Morning. The numbers from yesterday look off. [S2] Yep. The export rounded decimals. [S1] Fix it and resend in ten minutes. [S2] On it.

Prompt: the dialogue above

Quick take: short turns sound clean. Speaker switches stay obvious.

Test 2: podcast intro pacing

Dialogue:

[S1] Welcome back to the show. Today: why latency matters. [S2] And why everyone notices bad timing. [S1] First question. What makes a voice feel real. [S2] Pauses, breaths, and turn taking.

Prompt: the dialogue above

Quick take: pauses help. The rhythm feels closer to conversation than a single long read.

Test 3: sports commentary energy

Dialogue:

[S1] Goal. Goal. Listen to the crowd. [S2] The pass was perfect. [S1] The striker did not hesitate. [S2] Replay it. Slow. The timing is everything.

Prompt: the dialogue above

Quick take: excitement shows up through tempo. This is useful for highlight narration.

Test 4: code-switching lines

Dialogue:

[S1] Quick check. Are we live. [S2] Yes. Ses iyi mi. [S1] Great. Start with the headline. [S2] Tamam. Today the update ships at noon.

Prompt: the dialogue above

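Quick take: mixed-language scripts are a good stress test. Here the English lines mix in Turkish ("Ses iyi mi" means "Is the sound good?" and "Tamam" means "Okay"). Pronunciation and cadence need spot checks.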

Test 5: emotional tone shift

Dialogue:

[S1] I am sorry. I should have called. [S2] You left the room and never came back. [S1] I froze. I did not know what to say. [S2] Say it now. Slowly.

Prompt: the dialogue above

Quick take: the model handles quieter lines without turning everything monotone.

Test 6: production notes debate

Dialogue:

[S1] Step one. Read the script. [S2] Step two. Record clean takes. [S1] Step three. Cut the breaths. [S2] No. Keep some breaths. [S1] Fine. But remove the clicks. [S2] Deal.

Prompt: the dialogue above

Quick take: this kind of back-and-forth fits podcast and tutorial content.

Speed snapshot (task elapsed time)

Test	Elapsed (s)
1	72
2	87
3	82
4	76
5	51
6	67
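A quick summary makes the six timings easier to compare. The sketch below uses Python's `statistics` module on the numbers from the table. It only describes these six runs. Elapsed task time may also include overhead such as queueing, which is not broken out here, so treat it as a rough guide rather than a throughput benchmark.

```python
from statistics import mean, median

# Elapsed task times in seconds, copied from the table above (test number -> seconds).
elapsed = {1: 72, 2: 87, 3: 82, 4: 76, 5: 51, 6: 67}

total = sum(elapsed.values())
fastest = min(elapsed, key=elapsed.get)
slowest = max(elapsed, key=elapsed.get)

print(f"total: {total} s")                                                 # total: 435 s
print(f"mean: {mean(elapsed.values()):.1f} s")                             # mean: 72.5 s
print(f"median: {median(elapsed.values())} s")                             # median: 74.0 s
print(f"fastest: test {fastest} ({elapsed[fastest]} s)")                   # fastest: test 5 (51 s)
print(f"slowest: test {slowest} ({elapsed[slowest]} s)")                   # slowest: test 2 (87 s)
```

The spread between the fastest and slowest run is 36 seconds, which is large relative to the mean.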

Takeaways

Short turns help the model sound conversational.
Speaker changes stay clear when the script uses clean tags.
Code-switching can work, but it needs listening checks for pronunciation.
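The takeaways point to checking a script before generating audio. Below is a minimal sketch of that kind of pre-flight check. It assumes the [S1]/[S2] tag format from the test rules. `lint_script` and the 25-word limit are hypothetical choices for illustration. They are not MOSS-TTSD requirements. The check flags these problems:

* text outside any tag
* empty turns
* long turns
* back-to-back turns from the same speaker
* stray brackets that may be malformed tags

It cannot judge pronunciation, so mixed-language lines still need a listen.

```python
import re

# Hypothetical pre-flight check for a dialogue script.
# Assumption: the 25-word limit is arbitrary, not a MOSS-TTSD requirement.
TAG_RE = re.compile(r"\[S(\d+)\]")
MAX_WORDS = 25


def lint_script(script, max_words=MAX_WORDS):
    """Return a list of human-readable warnings for a tagged dialogue string."""
    warnings = []
    pieces = TAG_RE.split(script)
    if pieces[0].strip():
        warnings.append("text before the first speaker tag")
    # Pair each captured speaker number with the text that follows it.
    turns = list(zip(pieces[1::2], pieces[2::2]))
    if not turns:
        warnings.append("no speaker tags found")
    previous = None
    for index, (speaker, text) in enumerate(turns, start=1):
        words = text.split()
        if not words:
            warnings.append(f"turn {index} (S{speaker}) is empty")
        elif len(words) > max_words:
            warnings.append(f"turn {index} (S{speaker}) has {len(words)} words")
        # Same speaker twice in a row may be intentional; flag it for review.
        if speaker == previous:
            warnings.append(f"turn {index} repeats speaker S{speaker}")
        previous = speaker
    # Brackets left after removing valid tags may be malformed tags like [s1] or [S 2].
    if "[" in TAG_RE.sub("", script):
        warnings.append("possible malformed tag")
    return warnings


print(lint_script("[S1] Fine. But remove the clicks. [S2] Deal."))
print(lint_script("Intro. [S1] Hi. [S1] [s2] Hello."))
```

The first call returns an empty list. The second call flags the leading text, the repeated S1 turn, and the lowercase `[s2]` tag.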

Try it

Run MOSS-TTSD
