Wan-2.2 Video
Provided by Fal
Wan-2.2 is a leading-edge image and video generation model developed by Tongyi Lab of Alibaba Group. It achieves professional cinematic narratives through a deep command of shot language, offering fine-grained control over lighting, color, and composition for versatile styles with delicate detail. It recreates all kinds of complex motion with enhanced fluidity and control, and it better understands and executes prompts for complex scenes and multi-object generation. Wan-2.2 generalizes across multiple dimensions such as motion, semantics, and aesthetics. In addition to text-to-video and image-to-video, Wan-2.2 also supports video-to-video generation, performing a wide range of edits on an input video such as adding, removing, and transforming objects.
Wan-2.1
Wan-2.1 is an advanced and powerful visual generation model developed by Tongyi Lab. It can generate videos from text, images, and other control signals. Wan-2.1 excels at generating realistic videos featuring extensive body movements, complex rotations, dynamic scene transitions, and fluid camera motions. It accurately simulates real-world physics and realistic object interactions while offering movie-like visuals with rich textures and a variety of stylized effects, and it can render text and dynamic text effects in videos directly from text prompts.
Wan-2.6 Video
Wan-2.6 is an advanced AI video generation model featuring smart shot scheduling for multi-shot storytelling and higher-quality voice generation with stable multi-speaker dialogue and natural, realistic voices. It accurately generates multi-shot sequences that express a full story while keeping key details consistent between shots, and it can auto-plan scenes from simple prompts. Wan-2.6 supports video generation up to 15 seconds, creating 1080p videos at 24 fps with native audio-visual synchronization, ensuring that dialogue, music, and sound effects align with character movements and lip-sync. It also includes video-reference generation, which uses a reference video for look and voice and then follows your prompt to create new clips. Wan-2.6 supports text-to-video, image-to-video, and video-to-video generation.
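The three generation modes mentioned above (text-to-video, image-to-video, video-to-video) differ mainly in the inputs they require. The sketch below shows one way to assemble a request payload per mode; the parameter names (`prompt`, `image_url`, `video_url`) and the `fal_client.subscribe` call in the trailing comment are assumptions for illustration, not the provider's exact schema — consult the Fal endpoint documentation for the real field names and endpoint IDs.

```python
# Minimal sketch of building request arguments for the three Wan modes.
# Field names here are illustrative assumptions, not Fal's exact API schema.
from typing import Optional


def build_wan_request(mode: str, prompt: str, media_url: Optional[str] = None) -> dict:
    """Assemble an arguments dict for a hypothetical Wan generation call."""
    if mode == "text-to-video":
        # Text-to-video needs only a prompt.
        return {"prompt": prompt}
    if mode == "image-to-video":
        # Image-to-video animates a still image guided by the prompt.
        if media_url is None:
            raise ValueError("image-to-video requires an input image URL")
        return {"prompt": prompt, "image_url": media_url}
    if mode == "video-to-video":
        # Video-to-video edits an existing clip (add/remove/transform objects).
        if media_url is None:
            raise ValueError("video-to-video requires an input video URL")
        return {"prompt": prompt, "video_url": media_url}
    raise ValueError(f"unknown mode: {mode!r}")


# With the fal-client library, the call would then look roughly like:
#   import fal_client
#   result = fal_client.subscribe("<wan-endpoint-id>", arguments=build_wan_request(...))
```

The helper keeps the mode-specific input requirements explicit, so a missing image or video URL fails fast before any network call is made.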