Wan-2.6 Video
Provided by Fal
Wan 2.6 is an AI video generation model with smart shot scheduling for multi-shot storytelling and improved voice generation that produces natural-sounding voices and stable multi-speaker dialogue. It generates multi-shot sequences that tell a complete story while keeping key details consistent between shots, and it can plan scenes automatically from simple prompts. Wan 2.6 generates videos up to 15 seconds long at 1080p and 24fps with native audio-visual synchronization, so dialogue, music, and sound effects stay aligned with character movement and lip-sync. Its video reference generation takes the look and voice from a reference video, then follows your prompt to create new clips. The model supports text-to-video, image-to-video, and video-to-video generation.
Wan-2.2 Video
Wan-2.2 is an image and video generation model developed by Tongyi Lab of Alibaba Group. It produces cinematic narratives through a strong command of shot language, offering fine-grained control over lighting, color, and composition across a range of styles with delicate detail. It recreates complex motion with improved fluidity and control, and it better understands and executes prompts for complex scenes and multi-object generation. Wan-2.2 generalizes across multiple dimensions, such as motion, semantics, and aesthetics. In addition to text-to-video and image-to-video, Wan-2.2 supports video-to-video generation and can perform a wide range of edits on an input video, such as adding, removing, and transforming objects.
F5 TTS
F5 TTS is an AI text-to-speech synthesis tool that generates natural, expressive speech from text input.