Grok Imagine

Provided by Fal

Grok Imagine Video generates video with synchronized audio from a text prompt, a single image, or multiple reference images using xAI's Grok model. In text-to-video mode, describe your scene to produce realistic footage of up to 15 seconds. Provide a single image to animate it as the first frame. Supply two to seven reference images and cite them in your prompt as @Image1 through @Image7 to blend their visual features into a single coherent video. Output resolution is up to 720p.
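The reference-image convention described above can be checked before a request is sent. The following is a minimal sketch, not part of any Fal or xAI SDK: the function name `check_image_references` and the constants are hypothetical, and the only rules it encodes are the ones stated in the description (two to seven reference images, cited as @Image1 through @Image7).

```python
import re

# Limits taken from the description above; the names are illustrative only.
MIN_REFS, MAX_REFS = 2, 7


def check_image_references(prompt: str, image_count: int) -> list[int]:
    """Return the sorted @ImageN indices cited in the prompt.

    Raises ValueError if the number of supplied images is outside the
    stated range or if the prompt cites an image that was not supplied.
    """
    if not MIN_REFS <= image_count <= MAX_REFS:
        raise ValueError(
            f"expected {MIN_REFS} to {MAX_REFS} reference images, got {image_count}"
        )
    # Collect every distinct index that appears as @Image<digits>.
    cited = sorted({int(n) for n in re.findall(r"@Image(\d+)", prompt)})
    missing = [n for n in cited if not 1 <= n <= image_count]
    if missing:
        raise ValueError(f"prompt cites images that were not supplied: {missing}")
    return cited
```

For example, `check_image_references("@Image1 walks through the street from @Image2", 2)` returns the cited indices, while a prompt that mentions @Image3 with only two images supplied raises `ValueError`.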

Gemini Omni Video

Gemini Omni Flash is a public-preview Google video model for 720p text-to-video, image-guided video, and video-to-video generation.

Grok Imagine Edit

Grok Imagine Video Edit & Extend applies xAI's Grok model to transform or continue existing video footage. In Edit mode, supply a video and a text description of the desired change — such as colorizing, restyling, or modifying scene elements — and receive an edited version at up to 720p resolution. In Extend mode, provide a video up to 15 seconds long and a prompt describing what should happen next to seamlessly append 2–10 seconds of new footage to the end of your clip.
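The Extend mode limits stated above (an input clip of up to 15 seconds, extended by 2 to 10 seconds) can be expressed as a small pre-flight check. This is a sketch under those stated limits only; `extended_duration` and its parameters are hypothetical names, not an actual API.

```python
# Limits taken from the Extend mode description; the names are illustrative only.
MAX_INPUT_SECONDS = 15.0
MIN_EXTEND_SECONDS, MAX_EXTEND_SECONDS = 2.0, 10.0


def extended_duration(clip_seconds: float, extend_seconds: float) -> float:
    """Return the total clip length after extension, or raise ValueError."""
    if not 0 < clip_seconds <= MAX_INPUT_SECONDS:
        raise ValueError(f"input clip must be at most {MAX_INPUT_SECONDS} s")
    if not MIN_EXTEND_SECONDS <= extend_seconds <= MAX_EXTEND_SECONDS:
        raise ValueError(
            f"extension must be {MIN_EXTEND_SECONDS} to {MAX_EXTEND_SECONDS} s"
        )
    # New footage is appended to the end of the existing clip.
    return clip_seconds + extend_seconds
```

Under these limits, the longest possible result is a 15-second clip extended by 10 seconds, for 25 seconds in total.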
