Kling V3 Omni is a video generation model that can produce motion and sound from a text prompt and a single first-frame image. For this post, I ran three quick 5-second tests at 720p with sound on, keeping the same settings each time.
Test 1: Mountain hike
First frame: [image]
Prompt: Wide tracking shot from behind. The hiker from @image carefully hikes down the rocky mountain trail at golden hour. Loose gravel shifts under each step. The camera smoothly follows, slight parallax in foreground rocks, distant town and ocean in soft haze. Audio: crisp footstep crunch on rocks, gentle mountain wind.
Test 2: Convertible drive
First frame: [image]
Prompt: Mid-shot tracking profile. The silver vintage convertible from @image drives smoothly along the overpass from right to left. Wheels spin realistically, sunlight glints on chrome, subtle motion blur on the road. Audio: low vintage engine purr, tires rolling on asphalt.
Test 3: Dog in snow
First frame: [image]
Prompt: Close-up portrait shot. The golden retriever from @image sits in the snow and slowly tilts its head, blinking once, breath visible in the cold air. Very slow push-in toward the face, shallow depth of field. Audio: soft winter wind, gentle panting and a faint collar jingle.
Notes
Audio behaves best when the prompt names 1-2 clear sources (engine, footsteps, wind).
Keep camera direction simple for 5-second clips (tracking, push-in, slow pan).
If faces drift, reduce motion and avoid fast turns.
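The first two notes are easy to check before submitting a prompt. Below is a rough Python sketch that does so. The function names and the comma-splitting heuristic are my own assumptions, and the list of "simple" camera moves is just the three named in the note above, so treat this as a sanity check rather than a rule the model enforces.

```python
import re

# The simple camera moves named in the notes; extend as needed.
SIMPLE_MOVES = ("tracking", "push-in", "slow pan")


def audio_sources(prompt: str) -> list[str]:
    """Return the comma-separated items after 'Audio:' (empty list if none).

    Splitting on commas is a rough heuristic: "panting and a collar jingle"
    counts as one item here.
    """
    match = re.search(r"Audio:\s*(.+?)\.?\s*$", prompt, flags=re.DOTALL)
    if not match:
        return []
    return [s.strip() for s in match.group(1).split(",") if s.strip()]


def check_prompt(prompt: str) -> list[str]:
    """Return a list of warnings based on the notes; empty means no flags."""
    warnings = []
    count = len(audio_sources(prompt))
    if count == 0:
        warnings.append("no Audio: clause found")
    elif count > 2:
        warnings.append(f"{count} audio sources named; 1-2 tends to behave better")
    lowered = prompt.lower()
    if not any(move in lowered for move in SIMPLE_MOVES):
        warnings.append("no simple camera move (tracking, push-in, slow pan) named")
    return warnings


car = (
    "Mid-shot tracking profile. The silver vintage convertible from @image "
    "drives smoothly along the overpass from right to left. "
    "Audio: low vintage engine purr, tires rolling on asphalt."
)
print(check_prompt(car))
```

The third note, about faces drifting, depends on the generated video rather than the prompt text, so it is left out of this check.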