
MOSS-TTSD: Dialogue TTS in 6 Tests

MOSS-TTSD turns a dialogue script into spoken conversation. This post runs 6 short tests and shares the raw audio outputs. The goal: check turn-taking, timing, and tone changes across speakers.

Model

Test rules

  • Input format: a single dialogue string, with each turn prefixed by a speaker tag such as [S1] or [S2]
  • No reference audio used in these runs
  • Outputs published as-is
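The input format above can be sketched in Python. This is a minimal sketch that assumes only the convention visible in these tests: each turn starts with a tag such as [S1] or [S2], and all turns are joined into one string. The helpers `build_dialogue` and `split_turns` are hypothetical names for illustration. They prepare and inspect the script text and do not call MOSS-TTSD or any of its APIs.

```python
import re

# Hypothetical helpers: they format and inspect the script text only.
# No MOSS-TTSD call is made here.

# Matches a speaker tag such as [S1] or [S2] plus that speaker's text,
# up to the next tag or the end of the string.
TURN_RE = re.compile(r"\[S(\d+)\]\s*(.*?)(?=\s*\[S\d+\]|\s*$)", re.DOTALL)


def build_dialogue(turns: list[tuple[int, str]]) -> str:
    """Join (speaker_number, text) pairs into one tagged dialogue string."""
    return " ".join(f"[S{speaker}] {text.strip()}" for speaker, text in turns)


def split_turns(script: str) -> list[tuple[int, str]]:
    """Split a tagged dialogue string back into (speaker_number, text) pairs."""
    return [(int(num), text.strip()) for num, text in TURN_RE.findall(script)]


if __name__ == "__main__":
    # Test 1 from this post, written as a list of turns.
    turns = [
        (1, "Morning. The numbers from yesterday look off."),
        (2, "Yep. The export rounded decimals."),
        (1, "Fix it and resend in ten minutes."),
        (2, "On it."),
    ]
    script = build_dialogue(turns)
    print(script)
    # Round trip: splitting the string recovers the original turns.
    assert split_turns(script) == turns
```

Keeping turns as (speaker, text) pairs until the last step makes it easy to reorder or shorten lines before they become one tagged string.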

Hero image: podcast studio desk with microphones and an audio waveform
Image prompt: Photorealistic podcast studio desk with two microphones and headphones. Soft warm lighting. A floating translucent audio waveform and subtitle lines in the background. Shallow depth of field.

Results (6 tests)

Test 1: office back and forth

Dialogue:

[S1] Morning. The numbers from yesterday look off. [S2] Yep. The export rounded decimals. [S1] Fix it and resend in ten minutes. [S2] On it.

Prompt: the dialogue above

Quick take: short turns sound clean. Speaker switches stay obvious.

Test 2: podcast intro pacing

Dialogue:

[S1] Welcome back to the show. Today: why latency matters. [S2] And why everyone notices bad timing. [S1] First question. What makes a voice feel real. [S2] Pauses, breaths, and turn taking.

Prompt: the dialogue above

Quick take: pauses help. The rhythm feels closer to conversation than a single long read.

Test 3: sports commentary energy

Dialogue:

[S1] Goal. Goal. Listen to the crowd. [S2] The pass was perfect. [S1] The striker did not hesitate. [S2] Replay it. Slow. The timing is everything.

Prompt: the dialogue above

Quick take: excitement shows up through tempo, which makes this style useful for highlight narration.

Test 4: code-switching lines

Dialogue:

[S1] Quick check. Are we live. [S2] Yes. Ses iyi mi. [S1] Great. Start with the headline. [S2] Tamam. Today the update ships at noon.

(The Turkish lines: "Ses iyi mi" means "Is the sound good?" and "Tamam" means "Okay.")

Prompt: the dialogue above

Quick take: mixed-language scripts are a good stress test. Pronunciation and cadence need spot checks.

Test 5: emotional tone shift

Dialogue:

[S1] I am sorry. I should have called. [S2] You left the room and never came back. [S1] I froze. I did not know what to say. [S2] Say it now. Slowly.

Prompt: the dialogue above

Quick take: the model handles quieter lines without turning everything monotone.

Test 6: production notes debate

Dialogue:

[S1] Step one. Read the script. [S2] Step two. Record clean takes. [S1] Step three. Cut the breaths. [S2] No. Keep some breaths. [S1] Fine. But remove the clicks. [S2] Deal.

Prompt: the dialogue above

Quick take: this kind of back-and-forth fits podcast and tutorial content.

Speed snapshot (task elapsed time)

Test    Elapsed (s)
----    -----------
1       72
2       87
3       82
4       76
5       51
6       67
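The six timings can be summarized with a few lines of Python. The values are copied from the table above, so the script adds no new measurements. This is a summary of six single runs, not a benchmark.

```python
from statistics import mean, median

# Elapsed task time in seconds per test, copied from the speed table.
elapsed_s = {1: 72, 2: 87, 3: 82, 4: 76, 5: 51, 6: 67}

# Test numbers with the lowest and highest elapsed time.
fastest = min(elapsed_s, key=elapsed_s.get)
slowest = max(elapsed_s, key=elapsed_s.get)

print(f"mean:   {mean(elapsed_s.values()):.1f} s")   # mean:   72.5 s
print(f"median: {median(elapsed_s.values()):.1f} s")  # median: 74.0 s
print(f"fastest: test {fastest} ({elapsed_s[fastest]} s)")  # fastest: test 5 (51 s)
print(f"slowest: test {slowest} ({elapsed_s[slowest]} s)")  # slowest: test 2 (87 s)
```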

Takeaways

  • Short turns help the model sound conversational.
  • Speaker changes stay clear when the script uses clean tags.
  • Code-switching can work, but pronunciation needs checking by ear.
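The tag and turn-length takeaways suggest checking a script before synthesis. Below is a minimal Python sketch under stated assumptions: the allowed tags are [S1] and [S2], as in these tests, and the 12-word turn limit is an arbitrary threshold chosen for illustration, not a documented MOSS-TTSD limit. `lint_dialogue` is a hypothetical helper. It flags text outside any tag, unknown tags, the same speaker tagged twice in a row, empty turns, and long turns.

```python
import re

# Any bracketed speaker-like tag, e.g. [S1], [S3] or [s2].
TAG_RE = re.compile(r"\[([Ss]\d+)\]")
ALLOWED_TAGS = {"S1", "S2"}  # assumption: two speakers, as in the tests in this post
MAX_WORDS_PER_TURN = 12      # assumption: arbitrary threshold for a "short turn"


def lint_dialogue(script: str) -> list[str]:
    """Return human-readable warnings for a tagged dialogue string (hypothetical helper)."""
    # re.split with one capture group yields
    # [text_before_first_tag, tag1, text1, tag2, text2, ...]
    parts = TAG_RE.split(script)
    if len(parts) == 1:
        return ["no speaker tags found"]

    warnings: list[str] = []
    if parts[0].strip():
        warnings.append(f"text before first tag: {parts[0].strip()!r}")

    previous = None
    for index in range(1, len(parts), 2):
        tag, text = parts[index], parts[index + 1].strip()
        turn_no = index // 2 + 1
        if tag not in ALLOWED_TAGS:
            warnings.append(f"turn {turn_no}: unknown tag [{tag}]")
        if tag == previous:
            warnings.append(f"turn {turn_no}: [{tag}] speaks twice in a row")
        word_count = len(text.split())
        if word_count == 0:
            warnings.append(f"turn {turn_no}: empty turn")
        elif word_count > MAX_WORDS_PER_TURN:
            warnings.append(f"turn {turn_no}: {word_count} words (over {MAX_WORDS_PER_TURN})")
        previous = tag
    return warnings
```

With these settings the Test 6 script passes with no warnings, while a script that repeats [S1] or uses an undefined [S3] gets flagged.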

Try it

Run MOSS-TTSD

