VoxCPM: Voice Cloning and TTS in 6 Tests - Wiro AI

VoxCPM is a text-to-speech model that can also do zero-shot voice cloning from a short reference clip. This review runs 6 tests on Wiro and shares the raw MP3 outputs.

Model link: https://wiro.ai/models/openbmb/voxcpm

What VoxCPM takes as input

prompt: the text to speak
cfgValue: higher values stick closer to the text, but can sound worse
inferenceSteps: higher values can improve quality, but take longer
inputAudio + referencePrompt (optional): a reference voice clip and its transcript, used for voice cloning
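
As a rough illustration, the four inputs above can be collected into a single dictionary before a run. This is only a sketch: the helper name build_voxcpm_inputs is hypothetical, the defaults (cfgValue=2.0, inferenceSteps=10) are simply the values used in most tests here and not documented defaults, and treating inputAudio and referencePrompt as a required pair is an assumption based on how the cloning tests were run. It does not call any Wiro endpoint.

```python
from typing import Optional


def build_voxcpm_inputs(
    prompt: str,
    cfg_value: float = 2.0,       # assumption: value used in most tests in this review
    inference_steps: int = 10,    # assumption: value used in most tests in this review
    input_audio: Optional[str] = None,
    reference_prompt: Optional[str] = None,
) -> dict:
    """Hypothetical helper: collect the input fields listed above into one dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if inference_steps < 1:
        raise ValueError("inferenceSteps must be at least 1")
    # Assumption: cloning uses the clip and its transcript together,
    # so reject a request that supplies only one of the two.
    if (input_audio is None) != (reference_prompt is None):
        raise ValueError("inputAudio and referencePrompt must be given together")

    inputs = {
        "prompt": prompt,
        "cfgValue": cfg_value,
        "inferenceSteps": inference_steps,
    }
    if input_audio is not None:
        inputs["inputAudio"] = input_audio
        inputs["referencePrompt"] = reference_prompt
    return inputs


if __name__ == "__main__":
    # Plain TTS request, then a cloning request with a reference clip path.
    print(build_voxcpm_inputs("Hi. This is support."))
    print(build_voxcpm_inputs(
        "Voice clone test.",
        input_audio="reference.mp3",
        reference_prompt="For the shipping audit, order 48219 shipped on February 14.",
    ))
```

How the resulting dictionary is submitted depends on the Wiro interface in use, which this sketch leaves out.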

Test 1: Numbers, currency, tracking code

cfgValue=2.0, inferenceSteps=10

Prompt: Your order 51723 is confirmed. Total: 1249.90 TL. Delivery window: 2 to 3 business days. Tracking: TR-508-AB.

Takeaway: Short business text came out clear. Digits and decimals sounded stable.

Test 2: Calm narration

cfgValue=2.0, inferenceSteps=20

Prompt: The street is quiet after midnight. A tram passes and the sound fades into the rain. The cafe sign flickers once, then holds steady.

Takeaway: Longer sentences sounded smooth. The pacing did not collapse.

Test 3: Support message

cfgValue=2.0, inferenceSteps=10

Prompt: Hi. This is support. The reset link expires in 15 minutes. Do not share the code. If this was not requested, ignore this message.

Takeaway: Short sentence breaks helped the model keep a consistent tone.
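
If short sentences help, it may be worth checking a prompt's sentence lengths before sending it. The sketch below is one simple way to do that; the helper names are hypothetical, and the split rule (sentence-ending punctuation followed by whitespace) is a simplification that will mis-split abbreviations such as "e.g. this".

```python
import re


def split_sentences(text: str) -> list:
    """Split on '.', '!' or '?' followed by whitespace (decimals like 1249.90 stay intact)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def longest_sentence_words(text: str) -> int:
    """Return the word count of the longest sentence, or 0 for empty text."""
    return max((len(s.split()) for s in split_sentences(text)), default=0)


if __name__ == "__main__":
    prompt = (
        "Hi. This is support. The reset link expires in 15 minutes. "
        "Do not share the code. If this was not requested, ignore this message."
    )
    print(len(split_sentences(prompt)))    # 5
    print(longest_sentence_words(prompt))  # 8
```

A prompt whose longest sentence is far above the others could be reworded by hand before synthesis; the right threshold was not measured in these tests.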

Test 4: Fast ad read (speed stress)

cfgValue=2.3, inferenceSteps=5

Prompt: New drop. Same price. Faster shipping. Add to cart and check out in under 30 seconds.

Takeaway: The low step count ran fast, but the voice sounded more synthetic.

Test 5: Voice cloning from a clean reference clip

Reference input:

Reference transcript: For the shipping audit, order 48219 shipped on February 14 at 9:05 AM. Total weight 3.7 kilograms. Tracking code Z X dash 9 1 dash Delta.

Clone output (cfgValue=2.0, inferenceSteps=10):

Prompt: Voice clone test. Ticket 77104 closed at 18:30. Refund amount 79.90 TL. Please reply with the last four digits of the card.

Takeaway: The output followed the reference voice style better than the default voice tests.

Test 6: Voice cloning from a token-heavy reference clip

Reference input:

Reference transcript: Email support plus wiro at acme dot dev. URL https colon slash slash api dot example dot com slash v1 slash run question mark mode equals fast ampersand retry equals 2. Error code E underscore C O N N underscore R E S E T. Commit seven f three a nine c one.

Clone output (cfgValue=2.0, inferenceSteps=10):

Prompt: Second clone test. Open https colon slash slash status dot example dot com. If error code E underscore T I M E O U T appears, retry twice.

Takeaway: Token-like text stayed hard. Even with a matching reference style, URLs and spelled-out symbols need client-side rules.
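
One possible shape for such client-side rules is a small pre-processing step that rewrites URLs and error codes before the text reaches the model. The sketch below is an assumption, not something taken from Wiro or VoxCPM: the function names and the symbol table are hypothetical, and it only reproduces the spelling convention already used in the Test 6 prompt. Since that spelled-out form still sounded hard in this test, a real rule set might shorten or drop such tokens instead.

```python
import re

# Hypothetical symbol table, following the wording used in the Test 6 transcripts.
SYMBOL_WORDS = {
    ":": "colon", "/": "slash", ".": "dot", "_": "underscore",
    "?": "question mark", "=": "equals", "&": "ampersand",
    "+": "plus", "@": "at", "-": "dash",
}

# Matches http(s) URLs, or ALL_CAPS codes joined by underscores such as E_TIMEOUT.
TOKEN_RE = re.compile(r"https?://\S+|\b[A-Z]+(?:_[A-Z]+)+\b")


def spell_token(token: str) -> str:
    """Spell one token: symbols become words, all-caps runs become single letters."""
    parts = []
    for piece in re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]", token):
        if piece in SYMBOL_WORDS:
            parts.append(SYMBOL_WORDS[piece])
        elif piece.isalpha() and piece.isupper():
            parts.append(" ".join(piece))  # "TIMEOUT" -> "T I M E O U T"
        else:
            parts.append(piece)  # lowercase words, digits, unknown symbols
    return " ".join(parts)


def normalize_for_tts(text: str) -> str:
    """Replace URLs and underscore codes in text with their spelled-out form."""
    def repl(match):
        token = match.group(0)
        trailing = ""
        # Keep sentence punctuation that the URL pattern swallowed at the end.
        while token and token[-1] in ".,;!?":
            trailing = token[-1] + trailing
            token = token[:-1]
        return spell_token(token) + trailing
    return TOKEN_RE.sub(repl, text)


if __name__ == "__main__":
    raw = "Open https://status.example.com. If error code E_TIMEOUT appears, retry twice."
    print(normalize_for_tts(raw))
    # Open https colon slash slash status dot example dot com. If error code
    # E underscore T I M E O U T appears, retry twice.
```

Keeping the rules in a table makes it easy to try a different wording per symbol and listen to the result.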

What VoxCPM did well

Clean business narration with numbers and short sentences
Voice cloning worked when a reference clip and its transcript were provided

Where it struggled

Token-heavy text like URLs, underscores, and spelled-out symbols
Very low inferenceSteps traded quality for speed

Try it

VoxCPM on Wiro: https://wiro.ai/models/openbmb/voxcpm
