ACE-Step Image To Song (v1-3.5B): 5 Visual Tests
Image-to-song sounds like a gimmick until you try it with clear visuals. This post runs five images through ACE-Step with identical settings and labels, then listens for changes in pace, energy, and overall vibe.
Model link
Test setup
- Duration: 30 seconds
- Steps: 40
- Guidance scale: 15
- Scheduler: euler
- Guidance type: apg
- Fixed labels for all tests: genre=Pop, instrument=guitar, mood=happy, gender=female, timbre=bright vocal
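To keep the comparison fair, only the image should change between runs. A minimal sketch of that idea in Python follows. The field names, the `RunSettings` class, the `build_requests` helper, and the image file names are all hypothetical illustrations, not the actual ACE-Step API or the files used in these tests; only the values come from the list above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunSettings:
    """Settings held constant across all five tests (field names are hypothetical)."""
    duration_seconds: int = 30
    steps: int = 40
    guidance_scale: float = 15.0
    scheduler: str = "euler"
    guidance_type: str = "apg"
    # Fixed labels shared by every test
    genre: str = "Pop"
    instrument: str = "guitar"
    mood: str = "happy"
    gender: str = "female"
    timbre: str = "bright vocal"


# Placeholder file names: the post does not specify the real image files.
TEST_IMAGES = {
    "Neon rooftop party": "neon_rooftop.png",
    "Cozy cabin in snow": "cozy_cabin.png",
    "Underwater coral reef": "coral_reef.png",
    "1970s road trip frame": "road_trip.png",
    "Stormy lighthouse": "stormy_lighthouse.png",
}


def build_requests(settings: RunSettings, images: dict) -> list:
    """Build one request dict per image; everything except the image is identical."""
    base = asdict(settings)
    return [{"test": name, "image": path, **base} for name, path in images.items()]


requests = build_requests(RunSettings(), TEST_IMAGES)
for req in requests:
    print(req["test"], "->", req["image"], f"({req['steps']} steps, {req['scheduler']})")
```

Freezing the dataclass makes it harder to change a setting by accident between runs, so any difference you hear is more plausibly down to the image.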
Test 1: Neon rooftop party

This image tends to push the output toward a brighter, higher-energy feel. The dense lights and motion cues often map to a busier arrangement.
Test 2: Cozy cabin in snow

Warm light and a quiet scene can pull the output toward a calmer intro and softer transitions, even with the same labels.
Test 3: Underwater coral reef

This image often lands in a smoother, floating groove. If you want more punch, the image needs stronger contrast and sharper motion cues.
Test 4: 1970s road trip frame

Film color and open landscapes can steer the output toward a more relaxed rhythm and wider, less crowded instrumentation.
Test 5: Stormy lighthouse

High contrast and drama can nudge the arrangement toward heavier hits and more tension, even when the genre label stays the same.
Quick takeaways
| Input image type | What to listen for | Prompt tip |
|---|---|---|
| High motion / neon | Denser rhythm and brighter tone | Add movement and strong lighting cues |
| Cozy interior / warm light | Softer transitions | Keep the scene simple and quiet |
| Underwater / soft scenes | Smoother, floating groove | Add contrast and sharper motion cues for more punch |
| Wide landscapes | More space in the mix | Use open composition, fewer subjects |
| Stormy, high contrast | More tension and impact | Push contrast, weather, and drama |