Grok Imagine Video generates video with synchronized audio from a text prompt, a single image, or multiple reference images using xAI's Grok model. In text-to-video mode, describe your scene to produce realistic footage up to 15 seconds long. Provide a single image to animate it, with that image used as the first frame. Supply two to seven reference images and cite them in your prompt as @Image1 through @Image7 to blend their visual features into a single coherent video. Output resolution is up to 720p.
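
The three input modes can be told apart by how many images accompany the prompt. The following is a minimal sketch of that selection logic in Python, not the actual xAI API: the function names, mode labels, and dictionary keys are all hypothetical and chosen for illustration. It also assumes the 15-second cap applies in every mode, which the description only states for text-to-video.

```python
import re

MAX_REFERENCE_IMAGES = 7   # from the description: two to seven reference images
MAX_DURATION_SECONDS = 15  # assumption: the 15-second cap applies to every mode


def select_mode(num_images: int) -> str:
    """Pick an input mode from the image count (mode labels are hypothetical)."""
    if num_images < 0 or num_images > MAX_REFERENCE_IMAGES:
        raise ValueError(f"expected 0 to {MAX_REFERENCE_IMAGES} images, got {num_images}")
    if num_images == 0:
        return "text-to-video"
    if num_images == 1:
        return "image-to-video"  # the single image becomes the first frame
    return "reference-to-video"  # two to seven images, cited as @ImageN


def cited_images(prompt: str) -> list[int]:
    """Return the sorted, de-duplicated N values of every @ImageN in the prompt."""
    return sorted({int(n) for n in re.findall(r"@Image(\d+)", prompt)})


def build_request(prompt: str, images: list[str], duration_seconds: int = 5) -> dict:
    """Validate the inputs and assemble a hypothetical request payload."""
    mode = select_mode(len(images))
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(f"duration must be 1 to {MAX_DURATION_SECONDS} seconds")
    cited = cited_images(prompt)
    if mode == "reference-to-video":
        # Every @ImageN must point at an image that was actually supplied.
        missing = [n for n in cited if n < 1 or n > len(images)]
        if missing:
            raise ValueError(f"prompt cites images that were not supplied: {missing}")
    return {
        "mode": mode,
        "prompt": prompt,
        "images": list(images),
        "duration_seconds": duration_seconds,
        "cited_images": cited,
    }
```

For example, build_request("@Image1 hands a cup to @Image2", ["barista.png", "customer.png"]) selects the reference mode, while the same call with an empty image list falls back to text-to-video.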