Chatterbox Multilingual is most interesting when the same reference voice has to survive language transfer, pacing changes, and expressive settings without falling apart.
Chatterbox Multilingual: what it does
Chatterbox Multilingual generates speech in 23 languages and can clone a voice from a short reference clip. It also exposes two knobs that matter in practice: exaggeration for expressiveness, and cfg_weight for guidance and pacing. For cross-language transfer, setting cfg_weight to 0 can help reduce the reference accent bleeding into the target language.
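A minimal sketch of how these knobs might be passed from Python. It assumes the open-source `chatterbox` package, whose README shows `ChatterboxMultilingualTTS.from_pretrained` and a `generate` method that takes `language_id`, `audio_prompt_path`, `exaggeration`, and `cfg_weight`. The `temperature` keyword, the helper names `build_generate_kwargs` and `synthesize`, and the file paths are assumptions for illustration. A hosted API may name parameters differently; the settings lines in this post use `language`, for example.

```python
def build_generate_kwargs(text, language, reference_clip,
                          exaggeration=0.5, cfg_weight=0.5, temperature=0.8):
    """Collect one run's settings as keyword arguments (hypothetical helper).

    Defaults mirror the neutral English run (Test 1) in this post.
    """
    return {
        "text": text,
        "language_id": language,               # e.g. "en", "tr", "ja", "es"
        "audio_prompt_path": reference_clip,   # the reused reference voice clip
        "exaggeration": exaggeration,
        "cfg_weight": cfg_weight,
        "temperature": temperature,            # assumed keyword, check your version
    }


def synthesize(kwargs, out_path, device="cpu"):
    """Run one generation and save it (assumes chatterbox and torchaudio are installed)."""
    # Imported lazily so the settings helper above works without the model installed.
    import torchaudio as ta
    from chatterbox.mtl_tts import ChatterboxMultilingualTTS

    model = ChatterboxMultilingualTTS.from_pretrained(device=device)
    wav = model.generate(**kwargs)
    ta.save(out_path, wav, model.sr)


# Cross-language run: cfg_weight=0 to limit accent carryover from the reference clip.
spanish_run = build_generate_kwargs(
    "Hola. Este test usa la misma voz en espanol.",
    language="es",
    reference_clip="reference.wav",  # placeholder path
    exaggeration=0.55,
    cfg_weight=0.0,
)
```

Calling `synthesize(spanish_run, "test6.wav")` would then produce the audio, provided the package and model weights are available.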
Test rules
- 6 short runs
- One reference voice clip reused across all tests (voice cloning)
- English tests use cfg_weight between 0.3 and 0.5
- Non-English tests use cfg_weight=0 (language transfer setting)
- Outputs published as-is
Results (6 tests)
Test 1: English support script (neutral)
Text: Thanks for calling support. Please confirm the last four digits of the order number. Then say the delivery city.
Settings: language=en, exaggeration=0.5, cfg_weight=0.5, temperature=0.8

Test 2: English podcast intro pacing
Text: Welcome back to the show. Today: why latency makes voice apps feel broken. Three quick points: timing, pauses, and turn taking.
Settings: language=en, exaggeration=0.5, cfg_weight=0.4, temperature=0.8

Test 3: English emotional shift (higher exaggeration)
Text: I should have called sooner. The silence made things worse. Please pause. Then say this line slowly: I am sorry.
Settings: language=en, exaggeration=1.2, cfg_weight=0.3, temperature=0.9

Test 4: Turkish localization line (cfg_weight=0)
Text: Merhaba. Bu test ayni sesi Turkce konusturuyor. Lutfen siparis numarasinin son dort hanesini soyle. Sonra teslimat sehrini soyle.
Settings: language=tr, exaggeration=0.6, cfg_weight=0, temperature=0.8

Test 5: Japanese support line (cfg_weight=0)
Text: こんにちは. このテストは同じ声で日本語を話します. 注文番号の下4けたを言ってください. それから配達先の都市を言ってください.
Settings: language=ja, exaggeration=0.55, cfg_weight=0, temperature=0.8

Test 6: Spanish support line (cfg_weight=0)
Text: Hola. Este test usa la misma voz en espanol. Di los ultimos cuatro digitos del pedido. Luego di la ciudad de entrega.
Settings: language=es, exaggeration=0.55, cfg_weight=0, temperature=0.8
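
The six settings lines can be written down as data and checked against the test rules (English runs use cfg_weight between 0.3 and 0.5, non-English runs use cfg_weight=0). This is only a sketch: the `RunSettings` class and `rule_violations` function are hypothetical names, not part of any Chatterbox API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSettings:
    test: int
    language: str
    exaggeration: float
    cfg_weight: float
    temperature: float


# Values copied from the "Settings" line of each test above.
RUNS = [
    RunSettings(1, "en", 0.5, 0.5, 0.8),
    RunSettings(2, "en", 0.5, 0.4, 0.8),
    RunSettings(3, "en", 1.2, 0.3, 0.9),
    RunSettings(4, "tr", 0.6, 0.0, 0.8),
    RunSettings(5, "ja", 0.55, 0.0, 0.8),
    RunSettings(6, "es", 0.55, 0.0, 0.8),
]


def rule_violations(runs):
    """Return a message for every run that breaks the cfg_weight rules."""
    problems = []
    for run in runs:
        if run.language == "en":
            if not 0.3 <= run.cfg_weight <= 0.5:
                problems.append(f"Test {run.test}: English cfg_weight outside 0.3-0.5")
        elif run.cfg_weight != 0:
            problems.append(f"Test {run.test}: non-English cfg_weight must be 0")
    return problems


print(rule_violations(RUNS))
```

An empty list means every run follows the rules. A check like this is cheap to rerun whenever a setting is tweaked.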

Speed snapshot (task elapsed time)
| Test | Elapsed (s) |
|---|---|
| 1 | 44 |
| 2 | 10 |
| 3 | 7 |
| 4 | 10 |
| 5 | 9 |
| 6 | 10 |
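
To summarize the table, the arithmetic below computes the mean and median across all six runs, then the mean without Test 1. The post does not say why Test 1 took longer, so setting it aside is only an assumption that it is not representative.

```python
from statistics import mean, median

# Elapsed seconds per test, copied from the table above.
ELAPSED_S = {1: 44, 2: 10, 3: 7, 4: 10, 5: 9, 6: 10}

overall_mean = mean(ELAPSED_S.values())
overall_median = median(ELAPSED_S.values())

# Same mean with Test 1 set aside (assumption: it is an outlier).
mean_without_test_1 = mean(v for k, v in ELAPSED_S.items() if k != 1)

print(f"mean={overall_mean:.1f}s median={overall_median:.1f}s "
      f"mean without Test 1={mean_without_test_1:.1f}s")
```

The median and the mean without Test 1 sit close together, which suggests the typical run in this set took roughly ten seconds.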
Takeaways
- cfg_weight and exaggeration change the feel of the output quickly, so adjust them in small steps.
- For cross-language voice transfer, cfg_weight=0 offers a clean baseline to judge accent carryover.
- Short scripts make it easier to spot pronunciation issues, pacing drift, and misread numbers.