Kling V3 Omni: 3 Sound-On Text-to-Video Tests (720p)

Kling V3 Omni is a text-to-video model that can also generate motion and sound from a single first-frame image referenced in the prompt. For this post, I ran three quick 5-second tests at 720p with sound on, using the same settings each time.

Model

Kling V3 Omni on Wiro

Settings used

Mode: std (720p)
Duration: 5 seconds
Ratio: 9:16
Sound: on
CFG scale: 0.5
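
For anyone scripting the same runs, the settings above can be held in one small object so every test reuses identical values. This is only a sketch: the field names (`mode`, `duration_seconds`, `ratio`, `sound`, `cfg_scale`, `prompt`, `first_frame`) and the `build_payload` helper are my own placeholders, not Wiro's or Kling's actual API parameters, so check the real parameter names before sending anything.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ClipSettings:
    """Settings shared by all three tests (field names are hypothetical)."""
    mode: str = "std"          # std mode, which rendered at 720p in these tests
    duration_seconds: int = 5  # clip length
    ratio: str = "9:16"        # vertical aspect ratio
    sound: bool = True         # sound on
    cfg_scale: float = 0.5


def build_payload(prompt: str, first_frame: str,
                  settings: ClipSettings = ClipSettings()) -> dict:
    """Merge the shared settings with a per-test prompt and first-frame image path."""
    payload = asdict(settings)
    payload["prompt"] = prompt
    payload["first_frame"] = first_frame
    return payload


if __name__ == "__main__":
    # The image path below is a made-up example file name.
    print(build_payload("The hiker from @image hikes down the trail.", "hike.png"))
```

Freezing the dataclass keeps the settings from changing by accident between tests, which matters when the point is to compare prompts under the same conditions.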

Results (3 tests)

Test 1: Mountain hike

First frame: input image for the mountain hike test
Prompt: Wide tracking shot from behind. The hiker from @image carefully hikes down the rocky mountain trail at golden hour. Loose gravel shifts under each step. The camera smoothly follows, slight parallax in foreground rocks, distant town and ocean in soft haze. Audio: crisp footstep crunch on rocks, gentle mountain wind.

Test 2: Convertible drive

First frame: input image for the convertible drive test
Prompt: Mid-shot tracking profile. The silver vintage convertible from @image drives smoothly along the overpass from right to left. Wheels spin realistically, sunlight glints on chrome, subtle motion blur on the road. Audio: low vintage engine purr, tires rolling on asphalt.

Test 3: Dog in snow

First frame: input image for the dog in snow test
Prompt: Close-up portrait shot. The golden retriever from @image sits in the snow and slowly tilts its head, blinking once, breath visible in the cold air. Very slow push-in toward the face, shallow depth of field. Audio: soft winter wind, gentle panting and a faint collar jingle.

Notes

  • Audio behaves best when the prompt names 1-2 clear sources (engine, footsteps, wind).
  • Keep camera direction simple for 5-second clips (tracking, push-in, slow pan).
  • If faces drift, reduce motion and avoid fast turns.
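
The prompts in these tests all follow the same loose pattern: shot type, subject action, camera movement, then an "Audio:" clause. A minimal sketch of that pattern as a helper is below. The `compose_prompt` function and its `max_audio` limit are my own illustration of the notes above, not anything the model requires; the default limit of 2 simply mirrors the "1-2 clear sources" guideline and can be raised.

```python
def compose_prompt(shot: str, action: str, camera: str,
                   audio_sources: list[str], max_audio: int = 2) -> str:
    """Join visual directions and append a short 'Audio:' clause.

    Raises ValueError when there are no audio sources or more than
    max_audio, following the '1-2 clear sources' note (a guideline,
    not a model constraint).
    """
    if not 1 <= len(audio_sources) <= max_audio:
        raise ValueError(
            f"expected 1 to {max_audio} audio sources, got {len(audio_sources)}"
        )
    # Skip any empty visual part so the sentence spacing stays clean.
    visual = " ".join(p.strip() for p in (shot, action, camera) if p.strip())
    return f"{visual} Audio: {', '.join(audio_sources)}."


if __name__ == "__main__":
    print(compose_prompt(
        "Mid-shot tracking profile.",
        "The silver vintage convertible from @image drives smoothly "
        "along the overpass from right to left.",
        "",
        ["low vintage engine purr", "tires rolling on asphalt"],
    ))
```

Keeping the audio clause separate from the visual directions makes it easy to swap sound sources between runs without touching the camera wording.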
