{"id":1585,"date":"2026-03-21T18:29:39","date_gmt":"2026-03-21T18:29:39","guid":{"rendered":"https:\/\/wiro.ai\/blog\/?p=1585"},"modified":"2026-03-02T19:01:19","modified_gmt":"2026-03-02T19:01:19","slug":"live-avatar-audio-driven-talking-head-videos-in-6-tests","status":"publish","type":"post","link":"https:\/\/wiro.ai\/blog\/live-avatar-audio-driven-talking-head-videos-in-6-tests\/","title":{"rendered":"Live Avatar: Audio-Driven Talking Head Videos in 6 Tests"},"content":{"rendered":"<h2>Live Avatar: Audio-Driven Talking Head Videos in 6 Tests<\/h2>\n<p>Live Avatar generates a talking-head video from a still image and an audio clip. The input image sets identity and framing, and the audio drives mouth movement. The tests below focus on lip sync, identity stability, and how well the text prompt steers style and scene.<\/p>\n<h2>Model link<\/h2>\n<ul>\n<li><a href=\"https:\/\/wiro.ai\/models\/alibaba-quark\/live-avatar\">https:\/\/wiro.ai\/models\/alibaba-quark\/live-avatar<\/a><\/li>\n<\/ul>\n<h2>Inputs used<\/h2>\n<p>Two short WAV clips and three images were reused across the six tests.<\/p>\n<ul>\n<li>Audio A (WAV): https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-audio-01.wav<\/li>\n<li>Audio B (WAV): https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-audio-02.wav<\/li>\n<li>Image 1 (dwarf blacksmith): https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-input-01.jpg<\/li>\n<li>Image 2 (fashion blogger): https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-input-02.jpg<\/li>\n<li>Image 3 (cat on surfboard): https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-input-03.jpg<\/li>\n<\/ul>\n<h2>How the model was run<\/h2>\n<p>Each run used these inputs:<\/p>\n<ul>\n<li>inputImageUrl: URL of a face or character image<\/li>\n<li>inputAudioUrl: URL of a WAV audio file<\/li>\n<li>prompt: text guidance for style and scene<\/li>\n<li>seed: random seed; changing it varies motion and fine details<\/li>\n<\/ul>\n<h2>Test 1: Cinematic dwarf blacksmith (Audio A)<\/h2>\n<figure>\n  <img decoding=\"async\" 
src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-input-01.jpg\" alt=\"Input image of a dwarven blacksmith character for Live Avatar test 1\"\/><figcaption>Input image. Audio: A.<\/figcaption><\/figure>\n<figure>\n  <video controls preload=\"metadata\" poster=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-01-poster.jpg\"><source src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-01.mp4\" type=\"video\/mp4\"\/><\/video><figcaption>Prompt: A cheerful dwarf blacksmith in a fiery forge, explaining craft while holding a glowing hammer. Cinematic warm lighting, detailed face, natural mouth movement.<\/figcaption><\/figure>\n<p>The face stays stable, and mouth motion follows the audio. The cinematic prompt pushes lighting and mood without breaking identity.<\/p>\n<h2>Test 2: Documentary-style interview (Audio A)<\/h2>\n<figure>\n  <img decoding=\"async\" src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-input-01.jpg\" alt=\"Input image of a dwarven blacksmith character for Live Avatar test 2\"\/><figcaption>Input image. Audio: A.<\/figcaption><\/figure>\n<figure>\n  <video controls preload=\"metadata\" poster=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-02-poster.jpg\"><source src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-02.mp4\" type=\"video\/mp4\"\/><\/video><figcaption>Prompt: A documentary style talking head interview of a dwarf blacksmith in a workshop. Soft key light, realistic skin texture, stable identity, clean lip sync.<\/figcaption><\/figure>\n<p>This prompt aims for a flatter, interview-style look. 
It makes background flicker and identity drift easier to spot.<\/p>\n<h2>Test 3: Fashion blogger presenter (Audio B)<\/h2>\n<figure>\n  <img decoding=\"async\" src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-input-02.jpg\" alt=\"Input image of a fashion blogger in a white suit for Live Avatar test 3\"\/><figcaption>Input image. Audio: B.<\/figcaption><\/figure>\n<figure>\n  <video controls preload=\"metadata\" poster=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-03-poster.jpg\"><source src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-03.mp4\" type=\"video\/mp4\"\/><\/video><figcaption>Prompt: A fashion blogger presenting to camera in a white suit, studio lighting, clean background, natural blinking, subtle head motion, crisp details.<\/figcaption><\/figure>\n<p>The model handles a photographic input well, with clean skin texture. Small head motion looks natural when the prompt asks for subtlety.<\/p>\n<h2>Test 4: Talking cat on a surfboard (Audio A)<\/h2>\n<figure>\n  <img decoding=\"async\" src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-input-03.jpg\" alt=\"Input image of a white cat wearing sunglasses on a surfboard for Live Avatar test 4\"\/><figcaption>Input image. Audio: A.<\/figcaption><\/figure>\n<figure>\n  <video controls preload=\"metadata\" poster=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-04-poster.jpg\"><source src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-04.mp4\" type=\"video\/mp4\"\/><\/video><figcaption>Prompt: A white cat wearing sunglasses on a surfboard at the beach, close up to camera, playful expression. Keep the cat identity stable and match mouth motion to audio.<\/figcaption><\/figure>\n<p>This test pushes the model beyond human faces. 
The key check is whether the subject stays recognizable while the mouth moves.<\/p>\n<h2>Test 5: Same identity, different audio (Audio B)<\/h2>\n<figure>\n  <img decoding=\"async\" src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-input-01.jpg\" alt=\"Input image of a dwarven blacksmith character for Live Avatar test 5\"\/><figcaption>Input image. Audio: B.<\/figcaption><\/figure>\n<figure>\n  <video controls preload=\"metadata\" poster=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-05-poster.jpg\"><source src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-05.mp4\" type=\"video\/mp4\"\/><\/video><figcaption>Prompt: A dwarf blacksmith talking to camera in a forge. Stable face, consistent color, accurate lip sync. Keep the same framing as the input image.<\/figcaption><\/figure>\n<p>This test reuses the Test 1 image with Audio B to check whether mouth shapes adapt to a different clip without changing identity.<\/p>\n<h2>Test 6: Neutral presenter prompt (Audio A)<\/h2>\n<figure>\n  <img decoding=\"async\" src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-input-02.jpg\" alt=\"Input image of a fashion blogger for Live Avatar test 6\"\/><figcaption>Input image. Audio: A.<\/figcaption><\/figure>\n<figure>\n  <video controls preload=\"metadata\" poster=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-06-poster.jpg\"><source src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/live-avatar-06.mp4\" type=\"video\/mp4\"\/><\/video><figcaption>Prompt: A person speaking to camera. Neutral studio background. Stable face and clean lip sync.<\/figcaption><\/figure>\n<p>This test keeps the prompt minimal. 
It shows baseline lip sync and stability without stylistic pressure.<\/p>\n<h2>What worked well<\/h2>\n<ul>\n<li>Lip sync stays coherent across different voices and pacing.<\/li>\n<li>Identity usually stays stable when prompts describe lighting and framing, not new facial features.<\/li>\n<li>For talking heads, small head motion looks better than large camera moves.<\/li>\n<\/ul>\n<h2>What to watch for<\/h2>\n<ul>\n<li>Prompts that push strong aesthetics can shift the scene style more than expected.<\/li>\n<li>Non-human subjects can be fun, but their mouth motion may look less natural.<\/li>\n<\/ul>\n<h2>Try it<\/h2>\n<ul>\n<li><a href=\"https:\/\/wiro.ai\/models\/alibaba-quark\/live-avatar\">Run Live Avatar on Wiro<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Live Avatar: Audio-Driven Talking Head Videos in 6 Tests Live Avatar generates a talking head video from a still image and an&hellip;<\/p>\n","protected":false},"author":4,"featured_media":1584,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[52],"tags":[139,137,59,138],"class_list":["post-1585","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-model-reviews","tag-alibaba","tag-live-avatar","tag-speech-to-video","tag-talking-head"],"_links":{"self":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts\/1585","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/comments?post=1585"}],"version-history":[{"count":1,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts\/1585\/revisions"}],"predecessor-version":[{"id":1586,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts\/1585\/revisions\/1586"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/media\/1584"}],"wp:attachment":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/media?parent=1585"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/categories?post=1585"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/tags?post=1585"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}