Wan 2.5 Video Generation

Wan 2.5 Prompting Guide

Wan 2.5 is best prompted like a short film shot, not a still-image caption. Because many implementations support audio, prompts should often describe not only visuals and motion but also timing, performance, camera behavior, and sound.

Best Overall Formula

Subject + Environment + Action + Camera + Lighting + Style + Audio + Constraints
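The formula is essentially an ordering rule, so it can be captured in a few lines of code when prompts are assembled programmatically. The sketch below is a hypothetical helper, not part of any Wan 2.5 API (the model only ever receives the final text). The field names simply mirror the formula, and empty slots are skipped so that, for example, a prompt without an audio cue still reads cleanly.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotPrompt:
    """Hypothetical prompt container: one slot per term of the formula, in formula order."""
    subject: str = ""
    environment: str = ""
    action: str = ""
    camera: str = ""
    lighting: str = ""
    style: str = ""
    audio: str = ""
    constraints: str = ""

    def build(self) -> str:
        """Join the non-empty slots, in declaration (formula) order, with single spaces."""
        parts = []
        for f in fields(self):  # fields() returns fields in declaration order
            text = getattr(self, f.name).strip()
            if text:  # skip slots left empty, e.g. no audio cue for a silent clip
                parts.append(text)
        return " ".join(parts)


prompt = ShotPrompt(
    subject="A lone astronaut",
    environment="in a dim abandoned spacecraft corridor.",
    camera="Slow dolly shot tracking backward.",
    audio="Distant alarm beeps.",
)
print(prompt.build())
```

Because slots are joined exactly as written, phrase the subject and environment so they read as one sentence when placed side by side, as in the usage example.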

Why Wan 2.5 Feels Different

Wan 2.5 often responds best to prompts that read like a director's shot notes, because, depending on the implementation, the model may support lip sync, native audio generation, user-uploaded audio, and closer adherence to detailed prompts.

Single Biggest Rule

Keep one shot idea per clip.

Best Order to Write

Who + where + what happens + camera + light + style + sound + negatives

Prompt Anatomy

1) Subject

Describe the subject in a way that affects generation: wardrobe, age range, posture, hair, expression, emotional state.

2) Environment

Be specific enough to anchor the shot: indoors or outdoors, time of day, weather, props, cleanliness, crowd level.

3) Action

Use visible motion. Prefer concrete actions over abstract feelings: “she clenches her jaw and looks away” gives the model more to render than “she feels hurt.”

4) Camera

Specify framing, angle, and movement instead of letting the model improvise.

Useful Lighting Language

soft morning window light
harsh fluorescent office lighting
warm tungsten bulbs
moody moonlight
neon edge lighting
overcast daylight
golden-hour backlight
candlelit darkness

Useful Style Lanes

photoreal cinematic
documentary realism
glossy fashion ad
gritty handheld street footage
dreamlike fantasy
anime action
vintage 1970s film
music video aesthetic
luxury editorial

Audio Matters in Wan 2.5

Because many Wan 2.5 implementations support audio, sound should be prompted intentionally when relevant.

Useful audio instruction types:

ambient only
no dialogue
whispered dialogue
clear voiceover
distant traffic
thunder rumble
soft applause
birds and wind
nightclub bass
footsteps on concrete
crackling fire
radio static

Text-to-Video

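For text-to-video (T2V), there is no reference image, so the prompt must do more of the worldbuilding: subject, setting, look, and sound all have to come from the text.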

A lone astronaut walks slowly through a dim abandoned spacecraft corridor, illuminated by flickering emergency lights and drifting sparks. The camera tracks backward in front of him in a smooth slow dolly shot. Dust floats in zero gravity. Cinematic sci-fi realism, metallic reflections, tense silence, distant alarm beeps, no other characters.

Image-to-Video

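For image-to-video (I2V), the reference image already supplies the subject and setting, so focus on what moves, what must stay fixed from the reference (preservation), camera behavior, atmosphere, and audio.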

She remains centered and retains the same facial features, hair, outfit, and composition as the reference image. She blinks, smiles softly, and slowly turns toward the window. The curtains move slightly in a light breeze. Gentle camera push-in, warm afternoon sunlight, quiet room tone, no extra people, no text overlay, no major pose change.

Motion Guidance

Low Motion

subtle breathing
blinking
slight smile
hair moving in breeze
gentle head turn

Medium Motion

walking slowly
turning and looking back
opening a door
sitting down
raising a hand

High Motion

sprinting
fighting
crowd chaos
explosions
rapid camera shake

Camera Movement

One camera move is usually enough.

static
slow push-in
slow pull-back
gentle orbit
smooth lateral tracking
handheld follow
overhead descent
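Because one camera move is usually enough, it can help to scan a draft for stacked moves before generating. The minimal sketch below assumes simple case-insensitive substring matching against the phrases listed above; it is a hypothetical pre-flight check, not a Wan 2.5 feature, and it will miss paraphrases such as “dolly in.” “static” is deliberately left out because it also appears in audio cues like “radio static” and would cause false matches.

```python
# Camera-move phrases from the list above. "static" is omitted on purpose:
# it also occurs in audio cues such as "radio static" and would false-match.
CAMERA_MOVES = [
    "slow push-in",
    "slow pull-back",
    "gentle orbit",
    "smooth lateral tracking",
    "handheld follow",
    "overhead descent",
]


def camera_moves_in(prompt: str) -> list[str]:
    """Return the listed camera moves found in the prompt, in list order."""
    lowered = prompt.lower()
    return [move for move in CAMERA_MOVES if move in lowered]


def has_single_camera_move(prompt: str) -> bool:
    """True when the prompt contains at most one of the listed camera moves."""
    return len(camera_moves_in(prompt)) <= 1


draft = "Slow push-in on her face, then a gentle orbit around the table."
print(camera_moves_in(draft))
print(has_single_camera_move(draft))
```

A draft that fails the check is usually two shots; splitting it into two clips follows the one-shot-idea rule.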

Dialogue and Lip Sync

A woman faces camera and says, “I knew you’d come back,” with calm, restrained emotion. Her lip movements stay precise and natural, with subtle blinking and a slight breath before the line. Quiet hallway ambience, no music.

Useful Negatives

no extra people
no text overlay
no watermark
no abrupt cuts
no shaky camera
no deformed hands
no duplicated limbs
no flickering face
no background morphing
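Negatives tend to pile up and repeat when copied from several drafts. The hypothetical helper below is plain string handling, not a model parameter: it drops blank and case-insensitive duplicate entries while keeping their original order, then joins the rest into one constraint sentence that can close the prompt, as in the template's final line.

```python
def negatives_sentence(negatives: list[str]) -> str:
    """Join negatives into one comma-separated sentence, dropping blanks and repeats."""
    seen = set()
    unique = []
    for item in negatives:
        key = item.strip().lower()
        if key and key not in seen:  # skip blanks and case-insensitive duplicates
            seen.add(key)
            unique.append(item.strip())
    if not unique:
        return ""
    sentence = ", ".join(unique)
    # Capitalize only the first character so the rest keeps its original casing.
    return sentence[0].upper() + sentence[1:] + "."


print(negatives_sentence(["no extra people", "no text overlay", "No extra people"]))
```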

Strong Master Template

[Subject] in [environment].
[Action sequence in 1–3 visible beats].
[Shot size / angle / camera move].
[Lighting + atmosphere].
[Visual style].
[Audio behavior: voice / ambience / music / silence].
[Constraints / negatives].
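The template's one hard number is the 1–3 action beats. As a sketch, assuming prompts are assembled in code before being sent to whatever Wan 2.5 interface you use, the hypothetical function below fills the template one line per slot and rejects beat lists outside that range.

```python
def fill_master_template(
    subject: str,
    environment: str,
    beats: list[str],
    camera: str,
    lighting: str,
    style: str,
    audio: str,
    constraints: str,
) -> str:
    """Fill the master template line by line; raise ValueError unless there are 1-3 beats."""
    if not 1 <= len(beats) <= 3:
        raise ValueError(f"expected 1-3 action beats, got {len(beats)}")
    lines = [
        f"{subject} in {environment}.",  # [Subject] in [environment].
        " ".join(beats),                 # beats written as short sentences
        camera,                          # shot size / angle / camera move
        lighting,                        # lighting + atmosphere
        style,                           # visual style
        audio,                           # voice / ambience / music / silence
        constraints,                     # negatives
    ]
    return "\n".join(lines)


print(fill_master_template(
    "A chef", "a cramped night-market stall",
    ["She tosses noodles in a wok.", "Flames flare briefly."],
    "Medium shot, slow push-in.", "Warm tungsten bulbs, drifting steam.",
    "Documentary realism.", "Sizzling oil, distant crowd chatter, no music.",
    "No text overlay, no extra people.",
))
```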

Bottom Line

Clear subject + clear location + one visible action + one camera move + one lighting concept + one style lane + intentional audio + explicit constraints.