VoxCPM: Voice Cloning and TTS in 6 Tests

VoxCPM is a text-to-speech model that can also do zero-shot voice cloning from a short reference clip. This review runs 6 tests on Wiro and shares the raw MP3 outputs.

Model link: https://wiro.ai/models/openbmb/voxcpm

What VoxCPM takes as input

  • prompt: the text to speak
  • cfgValue: guidance strength; higher values stick closer to the text, but can make the audio sound worse
  • inferenceSteps: higher can improve quality, but takes longer
  • inputAudio + referencePrompt (optional): reference voice clip and its transcript for voice cloning
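
As a rough illustration of how these inputs fit together, here is a minimal Python sketch that collects them into one dictionary. The field names come from the list above. The helper name build_voxcpm_inputs, the default values, and the rule that inputAudio and referencePrompt must be supplied together are assumptions for this example, not documented Wiro behavior. The sketch makes no network call.

```python
from typing import Optional


def build_voxcpm_inputs(
    prompt: str,
    cfg_value: float = 2.0,
    inference_steps: int = 10,
    input_audio: Optional[str] = None,
    reference_prompt: Optional[str] = None,
) -> dict:
    """Collect VoxCPM inputs into one dict keyed by the field names above."""
    if not prompt.strip():
        raise ValueError("prompt must contain the text to speak")
    if inference_steps < 1:
        raise ValueError("inferenceSteps must be at least 1")
    # Assumption: cloning needs the reference clip and its transcript together.
    if (input_audio is None) != (reference_prompt is None):
        raise ValueError("inputAudio and referencePrompt must be given together")

    inputs = {
        "prompt": prompt,
        "cfgValue": cfg_value,
        "inferenceSteps": inference_steps,
    }
    if input_audio is not None:
        inputs["inputAudio"] = input_audio
        inputs["referencePrompt"] = reference_prompt
    return inputs


# Settings from Test 1 (default voice, no cloning), with a shortened prompt.
test_1 = build_voxcpm_inputs(
    "Your order 51723 is confirmed. Total: 1249.90 TL.",
    cfg_value=2.0,
    inference_steps=10,
)
print(test_1)
```

How the dictionary is then submitted (endpoint, authentication, file upload for the reference clip) depends on the platform and is left out here.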

Test 1: Numbers, currency, tracking code

cfgValue=2.0, inferenceSteps=10

Prompt: Your order 51723 is confirmed. Total: 1249.90 TL. Delivery window: 2 to 3 business days. Tracking: TR-508-AB.

Takeaway: Short business text came out clear. Digits and decimals sounded stable.

Test 2: Calm narration

cfgValue=2.0, inferenceSteps=20

Prompt: The street is quiet after midnight. A tram passes and the sound fades into the rain. The cafe sign flickers once, then holds steady.

Takeaway: Longer sentences sounded smooth. The pacing did not collapse.

Test 3: Support message

cfgValue=2.0, inferenceSteps=10

Prompt: Hi. This is support. The reset link expires in 15 minutes. Do not share the code. If this was not requested, ignore this message.

Takeaway: Short sentence breaks helped the model keep a consistent tone.

Test 4: Fast ad read (speed stress)

cfgValue=2.3, inferenceSteps=5

Prompt: New drop. Same price. Faster shipping. Add to cart and check out in under 30 seconds.

Takeaway: Low steps ran fast, but the voice sounded more synthetic.

Test 5: Voice cloning from a clean reference clip

Reference transcript: For the shipping audit, order 48219 shipped on February 14 at 9:05 AM. Total weight 3.7 kilograms. Tracking code Z X dash 9 1 dash Delta.

Clone output (cfgValue=2.0, inferenceSteps=10):

Prompt: Voice clone test. Ticket 77104 closed at 18:30. Refund amount 79.90 TL. Please reply with the last four digits of the card.

Takeaway: The output matched the style of the reference voice, and sounded closer to it than the default voice used in Tests 1 to 4.

Test 6: Voice cloning from a token-heavy reference clip

Reference transcript: Email support plus wiro at acme dot dev. URL https colon slash slash api dot example dot com slash v1 slash run question mark mode equals fast ampersand retry equals 2. Error code E underscore C O N N underscore R E S E T. Commit seven f three a nine c one.

Clone output (cfgValue=2.0, inferenceSteps=10):

Prompt: Second clone test. Open https colon slash slash status dot example dot com. If error code E underscore T I M E O U T appears, retry twice.

Takeaway: Token-heavy text remained difficult. Even with a reference clip in the same style, URLs and spelled-out symbols need client-side rules that rewrite them before they reach the model.
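
One possible shape for such client-side rules is sketched below in Python. It rewrites URLs and underscore-separated error codes into the spoken forms used in the Test 5 and Test 6 prompts. The symbol table, the regular expressions, and the function names are assumptions made for this example. The review did not test whether this preprocessing improves the audio.

```python
import re

# Assumed spoken forms, modeled on the wording of the Test 5 and Test 6 prompts.
SYMBOL_WORDS = {
    ":": "colon", "/": "slash", ".": "dot", "_": "underscore", "-": "dash",
    "?": "question mark", "=": "equals", "&": "ampersand", "@": "at", "+": "plus",
}

URL_RE = re.compile(r"https?://\S+")
# Upper-case codes joined by underscores, such as E_TIMEOUT.
CODE_RE = re.compile(r"\b[A-Z]+(?:_[A-Z]+)+\b")


def speak_url(url: str) -> str:
    """Replace each symbol in a URL with its spoken word."""
    tokens = re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", url)
    return " ".join(SYMBOL_WORDS.get(tok, tok) for tok in tokens)


def speak_code(code: str) -> str:
    """Spell an underscore-separated code letter by letter."""
    parts = [" ".join(part) for part in code.split("_")]
    return " underscore ".join(parts)


def normalize_for_tts(text: str) -> str:
    def url_repl(match: re.Match) -> str:
        url = match.group(0)
        # Keep sentence punctuation that trails the URL out of the spoken form.
        core = url.rstrip(".,;:!?")
        return speak_url(core) + url[len(core):]

    text = URL_RE.sub(url_repl, text)
    return CODE_RE.sub(lambda m: speak_code(m.group(0)), text)


raw = "Open https://status.example.com. If error code E_TIMEOUT appears, retry twice."
print(normalize_for_tts(raw))
# Open https colon slash slash status dot example dot com. If error code E underscore T I M E O U T appears, retry twice.
```

The rules run URLs first so that underscores inside a URL are already spoken words by the time the error-code pattern is applied. Email addresses, commit hashes, and other token types would need their own rules.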

What VoxCPM did well

  • Clean business narration with numbers and short sentences
  • Voice cloning worked when a reference clip and its transcript were provided

Where it struggled

  • Token-heavy text like URLs, underscores, and spelled-out symbols
  • Very low inferenceSteps (5 in Test 4) traded quality for speed: generation was fast, but the voice sounded more synthetic
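
To put numbers on the speed side of that trade, a small timing harness can run the same prompt at several inferenceSteps values. The Python sketch below assumes a synthesize callable that takes the inputs dictionary and returns audio bytes. That callable, the helper name time_steps_sweep, and the fake_synthesize stand-in are hypothetical and do not call VoxCPM. The step values 5, 10, and 20 are the ones used in the tests above.

```python
import time
from typing import Callable, Dict, Iterable


def time_steps_sweep(
    synthesize: Callable[[dict], bytes],
    prompt: str,
    steps_values: Iterable[int] = (5, 10, 20),
    cfg_value: float = 2.0,
) -> Dict[int, float]:
    """Return wall-clock seconds per inferenceSteps value for one prompt."""
    timings = {}
    for steps in steps_values:
        inputs = {"prompt": prompt, "cfgValue": cfg_value, "inferenceSteps": steps}
        start = time.perf_counter()
        synthesize(inputs)
        timings[steps] = time.perf_counter() - start
    return timings


def fake_synthesize(inputs: dict) -> bytes:
    # Stand-in only: sleeps in proportion to the step count, returns no real audio.
    time.sleep(0.001 * inputs["inferenceSteps"])
    return b""


timings = time_steps_sweep(fake_synthesize, "New drop. Same price. Faster shipping.")
for steps, seconds in timings.items():
    print(f"inferenceSteps={steps}: {seconds:.3f}s")
```

Timing covers only half of the comparison. The quality side still has to be judged by listening to the outputs.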

Try it

VoxCPM on Wiro: https://wiro.ai/models/openbmb/voxcpm

