Concepts

Multi-modal

A model that can accept as input, or produce as output, more than one type of data (text, image, audio, video).
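
As a rough illustration of the definition above, a multi-modal input can be pictured as a list of typed parts, one per modality. This is a minimal sketch, not any particular model's API: the names `TextPart`, `ImagePart`, `AudioPart`, `modalities`, and `is_multimodal` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical part types, one per modality (illustrative only).
@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    data: bytes      # raw image bytes
    mime_type: str   # e.g. "image/png"


@dataclass
class AudioPart:
    data: bytes      # raw audio bytes
    mime_type: str   # e.g. "audio/wav"


Part = Union[TextPart, ImagePart, AudioPart]

# Maps each part type to the name of its modality.
_MODALITY_NAMES = {TextPart: "text", ImagePart: "image", AudioPart: "audio"}


def modalities(parts: List[Part]) -> List[str]:
    """Return the distinct modalities present, in first-seen order."""
    seen: List[str] = []
    for part in parts:
        name = _MODALITY_NAMES[type(part)]
        if name not in seen:
            seen.append(name)
    return seen


def is_multimodal(parts: List[Part]) -> bool:
    """True when the parts span more than one modality."""
    return len(modalities(parts)) > 1
```

Under these assumptions, a request that pairs a text prompt with an image counts as multi-modal, while a text-only request does not. Video is left out of the sketch only to keep it short; it would be another part type of the same shape.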