Wan 2.2 works best when you treat the prompt like a shot direction, not a still-image caption. Think in terms of subject, environment, motion, camera, and visual style.
Prompt formula: subject + environment + action + camera + lighting + style + constraints
Wan 2.2 is a video model, so do not prompt it like a still image. It needs motion language and time-based instructions, not a pile of visual adjectives.
Short form: who + where + what moves + camera + look + negatives
Establish the main subject clearly so the model knows what matters most in the shot.
Anchor the subject in a specific place with enough detail to keep the scene coherent.
Describe visible motion using verbs and simple time-based beats.
Specify framing, angle, and movement instead of leaving the shot logic vague.
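One way to keep these parts separate while drafting is to hold them in a small structure and join them at the end. The sketch below is only an illustration: `ShotPrompt`, its field names, and the example shot are hypothetical, and it assumes your tooling accepts a positive prompt string plus a separate negative prompt string.

```python
from dataclasses import dataclass, field


@dataclass
class ShotPrompt:
    """Hypothetical container for the parts of a shot-direction prompt."""
    subject: str                  # who
    environment: str              # where
    beats: list[str]              # what moves, in time order
    camera: str                   # framing, angle, movement
    lighting: str = ""
    style: str = ""
    negatives: list[str] = field(default_factory=list)

    def positive(self) -> str:
        # Join the beats with a simple time connector so the order of motion is explicit.
        motion = ", then ".join(self.beats)
        first = " ".join([self.subject, self.environment, motion])
        parts = [first, self.camera, self.lighting, self.style]
        # Skip empty parts and end every part with a single period.
        return ". ".join(p.strip().rstrip(".") for p in parts if p.strip()) + "."

    def negative(self) -> str:
        return ", ".join(self.negatives)


shot = ShotPrompt(
    subject="A red fox",
    environment="in a snowy birch forest at dawn",
    beats=["trots toward the camera", "stops and turns its head to the left"],
    camera="Low-angle medium shot, slow push-in",
    lighting="Soft backlight through the trees",
    style="Naturalistic wildlife documentary look",
    negatives=["text overlay", "extra limbs", "flicker"],
)
print(shot.positive())
print(shot.negative())
```

Keeping the beats as a list makes it easy to reorder or trim the motion without touching the subject or camera text.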
For text-to-video, front-load the worldbuilding because there is no source frame to anchor the shot.
For image-to-video, focus more on what in the image starts moving, what remains stable, and what the camera should do.
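The difference between the two modes can be sketched as two small templates. The function names, parameters, and example text are hypothetical and not part of any Wan 2.2 API. They only show where the emphasis shifts: text-to-video leads with the setting, while image-to-video describes only what changes and what holds still.

```python
def text_to_video_prompt(world: str, subject: str, motion: str, camera: str) -> str:
    # No source frame exists, so the setting comes first and is described in full.
    return f"{world}. {subject} {motion}. {camera}."


def image_to_video_prompt(moving: str, stable: str, camera: str) -> str:
    # The image already fixes the look, so the prompt only covers change over time.
    return f"{moving}. {stable}. {camera}."


t2v = text_to_video_prompt(
    world="A rain-soaked neon alley at night, steam rising from street vents",
    subject="A courier in a yellow raincoat",
    motion="walks away from the camera and glances back once",
    camera="Static wide shot at eye level",
)
i2v = image_to_video_prompt(
    moving="Her hair and scarf begin to move in the wind",
    stable="The buildings and street stay still",
    camera="The camera slowly pushes in",
)
print(t2v)
print(i2v)
```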
Wan 2.2 usually behaves best when you keep the camera instructions simple and readable.
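A rough way to spot an overloaded camera instruction is to count how many distinct camera moves a prompt names. The sketch below is a heuristic only: the `CAMERA_MOVES` list is a hypothetical, non-exhaustive vocabulary, and the idea that more than one move is "too busy" is an assumption to tune against your own results.

```python
import re

# Hypothetical, non-exhaustive list of camera-move phrases; extend it for your own vocabulary.
CAMERA_MOVES = ["push-in", "pull-back", "pan", "tilt", "orbit", "tracking shot", "zoom", "crane"]


def camera_moves_in(prompt: str) -> list[str]:
    """Return the phrases from CAMERA_MOVES that appear as whole words in the prompt."""
    text = prompt.lower()
    return [m for m in CAMERA_MOVES if re.search(rf"\b{re.escape(m)}\b", text)]


def is_camera_busy(prompt: str, max_moves: int = 1) -> bool:
    # Assumed threshold: flag prompts that stack more than max_moves camera moves.
    return len(camera_moves_in(prompt)) > max_moves


busy = "Slow push-in, then a fast pan left, then an orbit around the subject"
simple = "Static medium shot, slow push-in"
print(camera_moves_in(busy), is_camera_busy(busy))
print(camera_moves_in(simple), is_camera_busy(simple))
```

The match is literal, so inflected forms such as "pans" or "zooming" are not counted unless you add them to the list.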
What do we see, what moves, and how does the camera watch it?