{"id":1638,"date":"2026-03-24T07:51:39","date_gmt":"2026-03-24T07:51:39","guid":{"rendered":"https:\/\/wiro.ai\/blog\/?p=1638"},"modified":"2026-03-08T07:52:37","modified_gmt":"2026-03-08T07:52:37","slug":"moss-ttsd-dialogue-tts-in-6-tests","status":"publish","type":"post","link":"https:\/\/wiro.ai\/blog\/moss-ttsd-dialogue-tts-in-6-tests\/","title":{"rendered":"MOSS-TTSD: Dialogue TTS in 6 Tests"},"content":{"rendered":"<p>MOSS-TTSD turns a dialogue script into spoken conversation. This post runs 6 short tests and shares the raw audio outputs. The goal: check turn-taking, timing, and tone changes across speakers.<\/p>\n<h2>Model<\/h2>\n<ul>\n<li><a href=\"https:\/\/wiro.ai\/models\/openmoss\/moss-ttsd\">openmoss\/moss-ttsd<\/a><\/li>\n<\/ul>\n<h2>Test rules<\/h2>\n<ul>\n<li>Input format: a single dialogue string with speaker tags like [S1] and [S2]<\/li>\n<li>No reference audio used in these runs<\/li>\n<li>Outputs published as-is<\/li>\n<\/ul>\n<h2>Hero image<\/h2>\n<figure>\n  <img decoding=\"async\" src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/moss-ttsd-hero.jpg\" alt=\"Podcast studio desk with microphones and an audio waveform\" \/><figcaption>Prompt: Photorealistic podcast studio desk with two microphones and headphones. Soft warm lighting. A floating translucent audio waveform and subtitle lines in the background. Shallow depth of field.<\/figcaption><\/figure>\n<h2>Results (6 tests)<\/h2>\n<h3>Test 1: office back and forth<\/h3>\n<p>Dialogue:<\/p>\n<p>[S1] Morning. The numbers from yesterday look off. [S2] Yep. The export rounded decimals. [S1] Fix it and resend in ten minutes. [S2] On it.<\/p>\n<figure>\n  <audio controls src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/moss-ttsd-test-1.mp3\"><\/audio><figcaption>Prompt: the dialogue above<\/figcaption><\/figure>\n<p>Quick take: short turns sound clean. 
Speaker switches stay obvious.<\/p>\n<h3>Test 2: podcast intro pacing<\/h3>\n<p>Dialogue:<\/p>\n<p>[S1] Welcome back to the show. Today: why latency matters. [S2] And why everyone notices bad timing. [S1] First question. What makes a voice feel real. [S2] Pauses, breaths, and turn taking.<\/p>\n<figure>\n  <audio controls src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/moss-ttsd-test-2.mp3\"><\/audio><figcaption>Prompt: the dialogue above<\/figcaption><\/figure>\n<p>Quick take: the pauses between turns help. The rhythm feels closer to conversation than to a single long read.<\/p>\n<h3>Test 3: sports commentary energy<\/h3>\n<p>Dialogue:<\/p>\n<p>[S1] Goal. Goal. Listen to the crowd. [S2] The pass was perfect. [S1] The striker did not hesitate. [S2] Replay it. Slow. The timing is everything.<\/p>\n<figure>\n  <audio controls src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/moss-ttsd-test-3.mp3\"><\/audio><figcaption>Prompt: the dialogue above<\/figcaption><\/figure>\n<p>Quick take: excitement shows up through tempo. This is useful for highlight narration.<\/p>\n<h3>Test 4: code-switching lines<\/h3>\n<p>Dialogue:<\/p>\n<p>[S1] Quick check. Are we live. [S2] Yes. Ses iyi mi. [S1] Great. Start with the headline. [S2] Tamam. Today the update ships at noon.<\/p>\n<figure>\n  <audio controls src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/moss-ttsd-test-4.mp3\"><\/audio><figcaption>Prompt: the dialogue above<\/figcaption><\/figure>\n<p>Quick take: mixed-language scripts are a good stress test. Pronunciation and cadence need spot checks.<\/p>\n<h3>Test 5: emotional tone shift<\/h3>\n<p>Dialogue:<\/p>\n<p>[S1] I am sorry. I should have called. [S2] You left the room and never came back. [S1] I froze. I did not know what to say. [S2] Say it now. 
Slowly.<\/p>\n<figure>\n  <audio controls src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/moss-ttsd-test-5.mp3\"><\/audio><figcaption>Prompt: the dialogue above<\/figcaption><\/figure>\n<p>Quick take: the model handles quieter lines without turning everything monotone.<\/p>\n<h3>Test 6: production notes debate<\/h3>\n<p>Dialogue:<\/p>\n<p>[S1] Step one. Read the script. [S2] Step two. Record clean takes. [S1] Step three. Cut the breaths. [S2] No. Keep some breaths. [S1] Fine. But remove the clicks. [S2] Deal.<\/p>\n<figure>\n  <audio controls src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/moss-ttsd-test-6.mp3\"><\/audio><figcaption>Prompt: the dialogue above<\/figcaption><\/figure>\n<p>Quick take: this kind of back-and-forth fits podcast and tutorial content.<\/p>\n<h2>Speed snapshot (task elapsed time)<\/h2>\n<table>\n<tr>\n<th>Test<\/th>\n<th>Elapsed (s)<\/th>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>72<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>87<\/td>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td>82<\/td>\n<\/tr>\n<tr>\n<td>4<\/td>\n<td>76<\/td>\n<\/tr>\n<tr>\n<td>5<\/td>\n<td>51<\/td>\n<\/tr>\n<tr>\n<td>6<\/td>\n<td>67<\/td>\n<\/tr>\n<\/table>\n<h2>Takeaways<\/h2>\n<ul>\n<li>Short turns help the model sound conversational.<\/li>\n<li>Speaker changes stay clear when the script uses clean tags.<\/li>\n<li>Code switching can work, but it needs listening checks for pronunciation.<\/li>\n<\/ul>\n<h2>Try it<\/h2>\n<p><a href=\"https:\/\/wiro.ai\/models\/openmoss\/moss-ttsd\">Run MOSS-TTSD<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>MOSS-TTSD turns a dialogue script into spoken conversation. This post runs 6 short tests and shares the raw audio outputs. 
The goal:&hellip;<\/p>\n","protected":false},"author":4,"featured_media":1639,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[52],"tags":[62],"class_list":["post-1638","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-model-reviews","tag-text-to-speech"],"_links":{"self":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts\/1638","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/comments?post=1638"}],"version-history":[{"count":1,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts\/1638\/revisions"}],"predecessor-version":[{"id":1640,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts\/1638\/revisions\/1640"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/media\/1639"}],"wp:attachment":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/media?parent=1638"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/categories?post=1638"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/tags?post=1638"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}