Wan-2.2 is a leading-edge, highly capable image and video generation model developed by Tongyi Lab at Alibaba Group. It achieves professional cinematic narratives through a deep command of shot language, offering fine-grained control over lighting, color, and composition for versatile styles with delicate detail. It recreates all kinds of complex motion with enhanced fluidity and control, and shows better understanding and execution of prompts for complex scenes and multi-object generation. Wan-2.2 generalizes across multiple dimensions such as motion, semantics, and aesthetics. In addition to text-to-video and image-to-video, Wan-2.2 also supports video-to-video generation, with the ability to perform a wide range of edits on an input video such as adding, removing, and transforming objects.
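Which mode a request falls into follows from the inputs supplied: a text prompt alone is text-to-video, adding a start image (and optionally an end image) is image-to-video, and supplying an input video switches to video-to-video editing. The Python sketch below illustrates that selection logic; it is not an official client API, and the parameter names simply mirror the fields documented below.

```python
def select_mode(image=None, end_image=None, video=None):
    """Pick the Wan-2.2 generation mode from the supplied media inputs.

    A minimal sketch: video-to-video when an input video is given,
    image-to-video when a start (and optionally an end) frame is given,
    and text-to-video when only a text prompt is provided.
    """
    if video is not None:
        return "video-to-video"
    if image is not None or end_image is not None:
        return "image-to-video"
    return "text-to-video"
```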
The image used as the first frame of the video. If the image does not match the chosen aspect ratio, it is resized and center cropped.
optional:true
End Image
image
The image used as the last frame of the video. Only supported by the Pro model.
optional:true
Video Prompt
video
Input video, used for video-to-video generation and editing.
optional:true
Negative Prompt
text
The negative prompt is used to guide the model to avoid generating videos that contain certain elements or concepts.
optional:true
Model
dropdown
The model to use for video generation.
default:wan-2.2-5b•Accepts: Lite (5B), Pro (14B)
Frames
number
The total number of frames to generate. Must be between 81 and 121.
default:81•minimum:81•maximum:121
FPS
number
The number of frames per second to generate.
default:24•minimum:5•maximum:60
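Frames and FPS together determine clip length: duration in seconds is simply frames divided by FPS, so the defaults (81 frames at 24 FPS) yield roughly 3.4 seconds and the maximum 121 frames at 24 FPS yield about 5 seconds. A quick sketch:

```python
def clip_duration(frames: int = 81, fps: int = 24) -> float:
    """Clip length in seconds for a given frame count and frame rate."""
    return frames / fps

print(clip_duration())           # 81 / 24 = 3.375 seconds (the defaults)
print(clip_duration(121, 24))    # 121 / 24 ≈ 5.04 seconds (maximum frames)
```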
Seed
seed
The same seed and prompt will output the same video every time.
default:58004•minimum:0•maximum:65535
Resolution
dropdown
The resolution of the generated video.
default:720p•Accepts: 480p, 580p, 720p
Aspect Ratio
dropdown
The aspect ratio of the generated video. If 'auto', the aspect ratio will be determined automatically based on the image or video prompts.
default:auto•Accepts: Auto, 16:9, 9:16, 1:1
Steps
number
The number of inference steps to perform. Higher values can help with preserving details, but take longer to generate. We recommend using 40 steps for Lite and 27 steps for Pro.
default:27•minimum:2•maximum:50
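If you switch between the Lite and Pro models programmatically, it can be convenient to pin the recommended step counts to each variant. The sketch below does that; note that the Pro identifier is an assumption by analogy with the documented wan-2.2-5b default, not a confirmed model name.

```python
# Recommended inference steps per model variant (from the note above).
# "wan-2.2-14b" is an assumed identifier for the Pro (14B) model, by
# analogy with the documented "wan-2.2-5b" default for Lite (5B).
RECOMMENDED_STEPS = {
    "wan-2.2-5b": 40,    # Lite (5B)
    "wan-2.2-14b": 27,   # Pro (14B)
}
```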
Guidance Scale
number
Higher guidance scales make the output follow the prompt more closely and can help preserve detail, but risk oversaturated colors.
default:3.5•minimum:1•maximum:10
Shift
number
Shift controls how the model moves through the denoising process, affecting motion and time flow in your video. Lower values result in smoother, more predictable movement. Higher values result in more dynamic but sometimes chaotic motion.
default:5•minimum:1•maximum:10
Expand Prompt
toggle
Whether to expand the prompt using the model's own capabilities.
default:true
Block NSFW
toggle
Whether to block NSFW content.
default:false
Turbo Mode
toggle
When enabled, the video is generated faster with no noticeable degradation in visual quality. This property is not supported when using video-to-video generation.
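Putting the parameters above together, the sketch below assembles a text-to-video request using the documented defaults and ranges. The endpoint URL, authentication scheme, and exact JSON field names are assumptions for illustration only; consult the provider's API reference for the real request format.

```python
import requests

# Hypothetical endpoint and API key; the real URL, auth scheme, and
# field names are assumptions -- check the provider's API reference.
API_URL = "https://api.example.com/v1/wan-2.2/generate"
API_KEY = "YOUR_API_KEY"

payload = {
    "prompt": "A slow dolly shot through a rain-soaked neon alley at night",
    "negative_prompt": "blurry, low quality, watermark",
    "model": "wan-2.2-5b",    # Lite (5B); Pro (14B) is the other option
    "frames": 81,             # 81-121
    "fps": 24,                # 5-60
    "seed": 58004,            # 0-65535; same seed + prompt -> same video
    "resolution": "720p",     # 480p, 580p, or 720p
    "aspect_ratio": "16:9",   # auto, 16:9, 9:16, or 1:1
    "steps": 40,              # 40 recommended for Lite, 27 for Pro
    "guidance_scale": 3.5,    # 1-10
    "shift": 5,               # 1-10
    "expand_prompt": True,
    "block_nsfw": False,
    "turbo_mode": True,       # not supported for video-to-video
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=600,
)
response.raise_for_status()
print(response.json())
```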