Search
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
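In the prompt-based approach, one model switches tasks based on a special task token at the start of the text prompt. Some tasks, such as phrase grounding, also take extra text after the token. The sketch below shows that dispatch in isolation. The token strings follow the public Florence-2 model card on Hugging Face. `TASK_TOKENS`, `TASKS_WITH_TEXT`, and `build_prompt` are hypothetical names used only for illustration.

```python
from typing import Optional

# Hypothetical mapping from readable task names to Florence-2 task tokens
# (token strings as listed in the public Florence-2 model card).
TASK_TOKENS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "more_detailed_caption": "<MORE_DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tasks that expect additional text after the token (e.g. the phrase to ground).
TASKS_WITH_TEXT = {"phrase_grounding"}


def build_prompt(task: str, text: Optional[str] = None) -> str:
    """Return the prompt string for `task`, appending `text` where the task needs it."""
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_TOKENS[task]
    if task in TASKS_WITH_TEXT:
        if not text:
            raise ValueError(f"task {task!r} requires a text input")
        return token + text
    if text:
        raise ValueError(f"task {task!r} does not take a text input")
    return token
```

For example, `build_prompt("caption")` returns `"<CAPTION>"`. `build_prompt("phrase_grounding", "A green car")` returns `"<CAPTION_TO_PHRASE_GROUNDING>A green car"`. With the Hugging Face checkpoint, a string like this is what gets passed to the processor as its `text` argument alongside the image.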
Inputs:
The image to generate a caption for.
The level of detail in the generated caption.

Output:
The generated caption for the image.
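The captioning interface takes an image and a detail level and returns a caption. Florence-2 itself offers three caption tasks of increasing detail: `<CAPTION>`, `<DETAILED_CAPTION>`, and `<MORE_DETAILED_CAPTION>`. A natural assumption is that the detail level selects one of these tasks, but that mapping is not confirmed by this page.

The sketch below follows that assumption. Several pieces are hypothetical and used only for illustration:

- the level names `brief`, `detailed`, and `more_detailed`
- `generate_caption` and `CaptionResult`
- the injected `run_model` callable

Injecting `run_model` keeps the example runnable without downloading weights. With the Hugging Face checkpoint, it would wrap the processor, `generate`, and `post_process_generation` calls.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical detail levels, each mapped to one of Florence-2's caption task tokens.
DETAIL_TO_TASK = {
    "brief": "<CAPTION>",
    "detailed": "<DETAILED_CAPTION>",
    "more_detailed": "<MORE_DETAILED_CAPTION>",
}


@dataclass
class CaptionResult:
    caption: str  # the generated caption for the image


def generate_caption(
    image: Any,
    detail_level: str,
    run_model: Callable[[Any, str], str],
) -> CaptionResult:
    """Caption `image` at `detail_level` by calling `run_model(image, task_prompt)`."""
    try:
        prompt = DETAIL_TO_TASK[detail_level]
    except KeyError:
        raise ValueError(
            f"detail_level must be one of {sorted(DETAIL_TO_TASK)}, got {detail_level!r}"
        ) from None
    raw = run_model(image, prompt)  # model output for this task prompt
    return CaptionResult(caption=raw.strip())
```

A stub model is enough to exercise the wiring. For example, `generate_caption("cat.jpg", "detailed", lambda img, p: f" {p} for {img} ")` returns `CaptionResult(caption="<DETAILED_CAPTION> for cat.jpg")`.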