Chatterbox Multilingual: Voice Cloning in 23 Languages

Chatterbox Multilingual is most interesting when the same reference voice has to survive language transfer, pacing changes, and expressive settings without falling apart.

Chatterbox Multilingual: what it does

Chatterbox Multilingual generates speech in 23 languages and can clone a voice from a short reference clip. It also exposes two knobs that matter in practice: exaggeration for expressiveness, and cfg_weight for guidance and pacing. For cross-language transfer, setting cfg_weight to 0 can help reduce the reference accent bleeding into the target language.
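To make the two knobs concrete, here is a minimal Python sketch of how settings might be chosen per request. `TTSSettings` and `settings_for` are hypothetical names used for illustration, not part of Chatterbox or any hosting API. The field names mirror the settings lines used in this post, the defaults are taken from the neutral English test, and the caller is assumed to know the language of the reference clip.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TTSSettings:
    """Hypothetical container mirroring the settings lines in this post."""
    language: str
    exaggeration: float = 0.5   # expressiveness
    cfg_weight: float = 0.5     # guidance and pacing
    temperature: float = 0.8


def settings_for(target_language: str, reference_language: str, **overrides) -> TTSSettings:
    """Pick cfg_weight=0 when the target language differs from the reference clip's.

    Assumption: reference_language is supplied by the caller; nothing here detects it.
    """
    params = {"language": target_language}
    if target_language != reference_language:
        # Language-transfer setting, intended to limit accent carryover.
        params["cfg_weight"] = 0.0
    params.update(overrides)  # explicit overrides always win
    return TTSSettings(**params)


if __name__ == "__main__":
    print(asdict(settings_for("en", "en")))
    print(asdict(settings_for("tr", "en", exaggeration=0.6)))
```

Keeping the cross-language rule in one helper means a single place to change if cfg_weight=0 turns out not to be the best choice for a given voice.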

Test rules

  • Six short runs
  • One reference voice clip reused across all tests (voice cloning)
  • English tests use cfg_weight between 0.3 and 0.5
  • Non-English tests use cfg_weight=0 (language transfer setting)
  • Outputs published as-is
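The cfg_weight rules above can be written as a small check. The sketch below is illustrative only: the `TESTS` list copies the settings reported for the six tests in this post, and `rule_violations` is a hypothetical helper, not part of any Chatterbox tooling.

```python
# Settings copied from the six tests reported in this post.
TESTS = [
    {"id": 1, "language": "en", "exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8},
    {"id": 2, "language": "en", "exaggeration": 0.5, "cfg_weight": 0.4, "temperature": 0.8},
    {"id": 3, "language": "en", "exaggeration": 1.2, "cfg_weight": 0.3, "temperature": 0.9},
    {"id": 4, "language": "tr", "exaggeration": 0.6, "cfg_weight": 0.0, "temperature": 0.8},
    {"id": 5, "language": "ja", "exaggeration": 0.55, "cfg_weight": 0.0, "temperature": 0.8},
    {"id": 6, "language": "es", "exaggeration": 0.55, "cfg_weight": 0.0, "temperature": 0.8},
]


def rule_violations(tests):
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    for t in tests:
        if t["language"] == "en":
            # English tests: cfg_weight between 0.3 and 0.5 inclusive.
            if not 0.3 <= t["cfg_weight"] <= 0.5:
                problems.append(f"test {t['id']}: English cfg_weight outside 0.3-0.5")
        elif t["cfg_weight"] != 0:
            # Non-English tests: language-transfer setting.
            problems.append(f"test {t['id']}: non-English cfg_weight must be 0")
    return problems


if __name__ == "__main__":
    print(rule_violations(TESTS))  # prints [] for the six tests listed
```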

Hero image

Cover image for the Chatterbox Multilingual post
Prompt: Modern blog cover based on the input image as background. Create a clean stylized audio waveform scene with soft bokeh, dark gradient overlay for contrast. Add three lines of left aligned title text centered vertically. Line 1 huge bold serif Chatterbox. Line 2 small thin italic serif Multilingual. Line 3 medium bold serif 23 Languages. White text with drop shadow. No logos. No extra text.

Results (6 tests)

Test 1: English support script (neutral)

Text: Thanks for calling support. Please confirm the last four digits of the order number. Then say the delivery city.

Settings: language=en, exaggeration=0.5, cfg_weight=0.5, temperature=0.8

Test 2: English podcast intro pacing

Text: Welcome back to the show. Today: why latency makes voice apps feel broken. Three quick points: timing, pauses, and turn taking.

Settings: language=en, exaggeration=0.5, cfg_weight=0.4, temperature=0.8

Test 3: English emotional shift (higher exaggeration)

Text: I should have called sooner. The silence made things worse. Please pause. Then say this line slowly: I am sorry.

Settings: language=en, exaggeration=1.2, cfg_weight=0.3, temperature=0.9

Test 4: Turkish localization line (cfg_weight=0)

Text: Merhaba. Bu test ayni sesi Turkce konusturuyor. Lutfen siparis numarasinin son dort hanesini soyle. Sonra teslimat sehrini soyle.

Settings: language=tr, exaggeration=0.6, cfg_weight=0, temperature=0.8

Test 5: Japanese support line (cfg_weight=0)

Text: こんにちは. このテストは同じ声で日本語を話します. 注文番号の下4けたを言ってください. それから配達先の都市を言ってください.

Settings: language=ja, exaggeration=0.55, cfg_weight=0, temperature=0.8

Test 6: Spanish support line (cfg_weight=0)

Text: Hola. Este test usa la misma voz en espanol. Di los ultimos cuatro digitos del pedido. Luego di la ciudad de entrega.

Settings: language=es, exaggeration=0.55, cfg_weight=0, temperature=0.8

Speed snapshot (task elapsed time)

Test   Elapsed (s)
1      44
2      10
3      7
4      10
5      9
6      10
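A quick summary of these numbers, as a short Python sketch. The post does not say why Test 1 took longer than the rest, so reporting Tests 2 to 6 separately is an assumption made for comparison, not a claim about warm-up or queueing.

```python
from statistics import mean, median

# Elapsed seconds per test, copied from the table above.
elapsed_s = {1: 44, 2: 10, 3: 7, 4: 10, 5: 9, 6: 10}

all_runs = list(elapsed_s.values())
after_first = [seconds for test, seconds in elapsed_s.items() if test != 1]

print(f"all runs:     mean={mean(all_runs):.1f}s median={median(all_runs):.1f}s")
print(f"tests 2 to 6: mean={mean(after_first):.1f}s median={median(after_first):.1f}s")
# all runs:     mean=15.0s median=10.0s
# tests 2 to 6: mean=9.2s median=10.0s
```

The median is 10 seconds either way, which makes it a steadier summary of a typical run than the mean with only six samples.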

Takeaways

  • Small changes to cfg_weight and exaggeration change the feel quickly. The English tests only moved cfg_weight between 0.3 and 0.5 and exaggeration between 0.5 and 1.2.
  • For cross-language voice transfer, cfg_weight=0 offers a clean baseline to judge accent carryover.
  • Short scripts make it easier to spot pronunciation issues, pacing drift, and number reading.
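One way to explore the first takeaway is a small parameter sweep. The sketch below only builds the grid of settings; `sweep` is a hypothetical helper, the values are the ones used in the English tests in this post, and sending each entry to a TTS backend is left out because that call depends on the service used.

```python
from itertools import product


def sweep(cfg_weights, exaggerations, language="en", temperature=0.8):
    """Yield one settings dict per (cfg_weight, exaggeration) pair."""
    for cfg_weight, exaggeration in product(cfg_weights, exaggerations):
        yield {
            "language": language,
            "exaggeration": exaggeration,
            "cfg_weight": cfg_weight,
            "temperature": temperature,
        }


# Values taken from the English tests: 3 cfg_weights x 2 exaggerations = 6 runs.
grid = list(sweep(cfg_weights=[0.3, 0.4, 0.5], exaggerations=[0.5, 1.2]))

if __name__ == "__main__":
    for settings in grid:
        print(settings)
```

Holding the script and reference clip fixed while changing one knob at a time makes it easier to attribute a change in delivery to a specific setting.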

Try it

Run Chatterbox Multilingual on Wiro

