Wan 2.6 works best when you prompt it like a film shot instead of a still-image caption. It responds especially well to clear scene setup, explicit motion, deliberate camera direction, timed action beats, and strong continuity instructions.
Prompt formula: Subject + environment + action + camera + lighting + style + timing + constraints
Wan 2.6 is strong at cinematic short-form video, multi-shot structure, image-to-video continuity, and prompt-driven motion, so it tends to reward prompts that read like direction for a film crew.
Prompt the sequence like a shot, not like a poster.
In plain terms: who + where + what happens + camera + light + style + timing + negatives
Define the main subject clearly with only the details that affect identity, wardrobe, mood, or framing.
Anchor the shot in a specific place with enough lighting, texture, and background detail to prevent scene drift.
Describe visible motion as a short sequence of action verbs and small timed beats rather than as a static description.
Specify framing, movement, and perspective so the shot feels intentional and cinematic.
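The formula and the component notes above can be sketched as a small prompt builder. This is a minimal illustration, not part of any Wan 2.6 tooling: the `ShotPrompt` class, its field names, and the sentence-joining format are assumptions made for the example. It only assembles a text string, which you would then paste into whatever interface you use.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotPrompt:
    """Hypothetical container for the eight slots of the prompt formula."""

    # Field order mirrors the formula: subject, environment, action, camera,
    # lighting, style, timing, constraints.
    subject: str
    environment: str
    action: str
    camera: str
    lighting: str = ""
    style: str = ""
    timing: str = ""
    constraints: str = ""

    def render(self) -> str:
        """Join the non-empty slots into one prompt, one sentence per slot."""
        parts = []
        for f in fields(self):  # fields() preserves declaration order
            text = getattr(self, f.name).strip().rstrip(".")
            if text:
                parts.append(text[0].upper() + text[1:])
        return ". ".join(parts) + "." if parts else ""


prompt = ShotPrompt(
    subject="a courier in a yellow raincoat",
    environment="a narrow neon-lit alley at night",
    action="she walks toward the camera and stops",
    camera="slow dolly-in at eye level",
    constraints="no text overlays",
)
print(prompt.render())
```

Keeping each slot as its own field makes it easy to notice when a prompt is missing its camera or action line before you submit it.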
For text-to-video, define the whole visual situation from scratch and choreograph the shot in a clear sequence.
For image-to-video, use the source image as the anchor and describe what should stay fixed, what should animate, and how the camera should behave.
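For image-to-video, the three questions above (what stays fixed, what animates, how the camera behaves) can be kept separate while drafting. The helper below is a rough sketch under that assumption. The function name and the label wording ("Keep unchanged from the source image", "Animate", "Camera") are illustrative choices, not a syntax that Wan 2.6 is known to require.

```python
def image_to_video_prompt(keep_fixed, animate, camera):
    """Compose an image-to-video prompt from three parts.

    keep_fixed: list of things that must match the source image
    animate:    list of things that should move
    camera:     one sentence describing camera behavior
    """
    sentences = []
    if keep_fixed:
        sentences.append(
            "Keep unchanged from the source image: " + ", ".join(keep_fixed) + "."
        )
    if animate:
        sentences.append("Animate: " + ", ".join(animate) + ".")
    if camera and camera.strip():
        sentences.append("Camera: " + camera.strip().rstrip(".") + ".")
    return " ".join(sentences)


print(
    image_to_video_prompt(
        keep_fixed=["face", "wardrobe", "background layout"],
        animate=["hair moving in the wind", "steam rising from the cup"],
        camera="static tripod shot with a slight push-in",
    )
)
```

Listing the fixed elements first reflects the idea that the source image is the anchor and the motion is layered on top of it.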
Wan 2.6 usually responds best when you give it one clear framing choice and one deliberate camera move.
Wan 2.6 tends to perform better when the action unfolds in simple beats instead of everything happening at once.
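One way to keep beats simple is to write them as (start second, action) pairs and let a helper lay them out in order. The sketch below assumes a `0-2s:` style of time range, which is just one readable convention and not a documented Wan 2.6 format; the function name `format_beats` is likewise hypothetical.

```python
def format_beats(beats, total_seconds):
    """Turn (start_second, description) pairs into a timed beat string.

    Each beat ends where the next one starts; the last beat ends at
    total_seconds. Raises ValueError if the timings are inconsistent.
    """
    if not beats:
        raise ValueError("at least one beat is required")
    starts = [start for start, _ in beats]
    if starts[0] != 0:
        raise ValueError("the first beat should start at 0 seconds")
    if any(later <= earlier for earlier, later in zip(starts, starts[1:])):
        raise ValueError("beat start times must be strictly increasing")
    if starts[-1] >= total_seconds:
        raise ValueError("the last beat must start before the clip ends")

    ends = starts[1:] + [total_seconds]
    return " ".join(
        f"{start}-{end}s: {text.strip().rstrip('.')}."
        for (start, text), end in zip(beats, ends)
    )


print(
    format_beats(
        [
            (0, "she looks up at the sky"),
            (2, "rain starts to fall"),
            (4, "she opens the umbrella"),
        ],
        total_seconds=6,
    )
)
```

The validation step is there to catch overlapping or out-of-order beats, which is the "everything happening at once" situation the guidance warns against.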
Give Wan 2.6 a clear subject, a real setting, a visible action, a deliberate camera move, and simple timed beats.