ACE-Step Image To Song (v1.3-5B): 5 Visual Tests

Image-to-song sounds like a gimmick until you try it with clear visuals. This post runs five images through ACE-Step with identical settings and labels, then listens for changes in pace, energy, and vibe.

Test setup

  • Duration: 30 seconds
  • Steps: 40
  • Guidance scale: 15
  • Scheduler: euler
  • Guidance type: apg
  • Fixed labels for all tests: genre=Pop, instrument=guitar, mood=happy, gender=female, timbre=bright vocal
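To keep the comparison fair, only the input image changes between runs. Below is a minimal Python sketch of that test matrix. It does not call the model. The field names and the image file names are hypothetical, chosen for illustration; they are not ACE-Step's actual request parameters.

```python
from dataclasses import asdict, dataclass


# Fixed settings from the test setup. Field names are illustrative only,
# not the real ACE-Step request schema.
@dataclass(frozen=True)
class FixedSettings:
    duration_s: int = 30
    steps: int = 40
    guidance_scale: float = 15.0
    scheduler: str = "euler"
    guidance_type: str = "apg"
    genre: str = "Pop"
    instrument: str = "guitar"
    mood: str = "happy"
    gender: str = "female"
    timbre: str = "bright vocal"


# Hypothetical file names for the five input images.
TEST_IMAGES = [
    ("Neon rooftop party", "neon_rooftop.png"),
    ("Cozy cabin in snow", "snow_cabin.png"),
    ("Underwater coral reef", "coral_reef.png"),
    ("1970s road trip frame", "road_trip_1970s.png"),
    ("Stormy lighthouse", "stormy_lighthouse.png"),
]


def build_runs(settings: FixedSettings = FixedSettings()) -> list[dict]:
    """Pair every test image with the same fixed settings."""
    runs = []
    for name, image_path in TEST_IMAGES:
        run = asdict(settings)  # fresh dict per run, so runs never share state
        run.update(test=name, image=image_path)
        runs.append(run)
    return runs


if __name__ == "__main__":
    for run in build_runs():
        print(run["test"], "->", run["image"])
```

Freezing the dataclass makes it harder to tweak a setting for one run by accident. Any difference you hear between tests should then come from the image alone.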

Test 1: Neon rooftop party

Neon rooftop party image used as input
Input image prompt: Neon-lit rooftop party in Tokyo at night, silhouettes dancing, purple and cyan lights, city skyline behind, cinematic, high detail.

This image tends to push the output toward a brighter, higher-energy feel. The dense lights and motion cues often map to a busier arrangement.

Test 2: Cozy cabin in snow

Snowy cabin image used as input
Input image prompt: Cozy wooden cabin in a snowy pine forest at dusk, warm window glow, chimney smoke, soft falling snow, photoreal, sharp.

Warm light and a quiet scene can pull the output toward a calmer intro and softer transitions, even with the same labels.

Test 3: Underwater coral reef

Underwater coral reef image used as input
Input image prompt: Underwater coral reef scene with sunbeams, colorful fish, clear water, wide angle, photoreal, calm mood.

This image often lands in a smoother, floating groove. If you want more punch, the image needs stronger contrast and sharper motion cues.

Test 4: 1970s road trip frame

Vintage road trip image used as input
Input image prompt: Vintage 1970s road trip photo, convertible on an empty desert highway, warm film color, dust in the air, sun flare, candid style.

Film color and open landscapes can steer the output toward a more relaxed rhythm and wider, less crowded instrumentation.

Test 5: Stormy lighthouse

Stormy lighthouse image used as input
Input image prompt: Stormy ocean at night with a tall lighthouse on rocky cliffs, huge waves, rain, dramatic lighting, cinematic long exposure feel.

High contrast and drama can nudge the arrangement toward heavier hits and more tension, even when the genre label stays the same.

Quick takeaways

Input image type           | What to listen for              | Prompt tip
---------------------------|---------------------------------|--------------------------------------
High motion / neon         | Denser rhythm and brighter tone | Add movement and strong lighting cues
Cozy interior / warm light | Softer transitions              | Keep the scene simple and quiet
Wide landscapes            | More space in the mix           | Use open composition, fewer subjects
Stormy, high contrast      | More tension and impact         | Push contrast, weather, and drama
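If you generate your input images from text prompts, as in these tests, you can treat the prompt tips as a simple lookup. The Python sketch below appends cue phrases for a chosen image type, skipping any cues the prompt already mentions. The cue phrases are hypothetical examples of each tip. They were not tested in this post.

```python
# Hypothetical cue phrases illustrating each prompt tip from the table above.
PROMPT_CUES = {
    "high motion / neon": ["people in motion", "motion blur", "strong neon lighting"],
    "cozy interior / warm light": ["simple composition", "quiet scene", "warm light"],
    "wide landscapes": ["open composition", "wide empty horizon", "single subject"],
    "stormy, high contrast": ["high contrast", "stormy weather", "dramatic lighting"],
}


def augment_prompt(base_prompt: str, image_type: str) -> str:
    """Append the cues for image_type that the prompt does not already contain."""
    cues = PROMPT_CUES[image_type.lower()]
    existing = base_prompt.lower()
    missing = [cue for cue in cues if cue.lower() not in existing]
    if not missing:
        return base_prompt
    # Drop trailing punctuation so the new cues join as one comma-separated list.
    return base_prompt.rstrip(" .,") + ", " + ", ".join(missing) + "."
```

For example, passing Test 5's prompt with "stormy, high contrast" skips "dramatic lighting", which is already present, and appends "high contrast" and "stormy weather". Running the result through the function again leaves it unchanged.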
