Concepts
Transformer
Neural network architecture underlying GPT, DALL-E, and the text encoders in image models.
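The Transformer's core operation is scaled dot-product attention: each token builds its output as a similarity-weighted mix of every other token's values. A minimal NumPy sketch (shapes and dimensions are illustrative, not from any particular model):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: each query row mixes the value rows,
    # weighted by softmax of query-key similarity.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 query tokens, embedding dim 8
K = rng.normal(size=(6, 8))  # 6 key tokens
V = rng.normal(size=(6, 8))  # one value row per key
out = attention(Q, K, V)     # shape (4, 8): one mixed vector per query
```

Real models stack many of these attention layers with learned projections; this shows only the single operation at the heart of each layer.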
Hallucination
When AI generates plausible-looking but incorrect content (e.g., extra fingers, garbled text inside images).
Fine-Tuning
Training a base model on custom data so it learns your style or character.
Token
A word/subword unit the model processes. CLIP-based text encoders cap prompts at 77 tokens; some UIs work around this by chunking longer prompts (commonly up to 225).
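Tokenizers typically split unfamiliar words into known subword pieces. A toy greedy longest-match tokenizer over a made-up vocabulary (real models use learned BPE or WordPiece vocabularies, so this is illustrative only):

```python
def tokenize(word, vocab):
    # Greedy longest-match: repeatedly take the longest vocabulary
    # piece that matches at the current position.
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character falls back to itself
            i += 1
    return tokens

vocab = {"re", "paint", "ing", "er"}
print(tokenize("repainting", vocab))  # ['re', 'paint', 'ing']
```

One unfamiliar word can cost several tokens, which is why long prompts hit the cap sooner than a word count suggests.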
Latent Space
Compressed representation where diffusion happens. Models work here, then decode to pixels.
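The payoff of working in latent space is size: using Stable Diffusion-style shapes as an assumption (8x spatial downsampling, 4 latent channels), the latent holds far fewer values than the pixel image:

```python
import numpy as np

# Assumed Stable Diffusion-style shapes: 512x512 RGB image,
# 64x64x4 latent (8x downsampling, 4 channels).
pixels = np.zeros((512, 512, 3))
latent = np.zeros((64, 64, 4))
print(pixels.size / latent.size)  # 48.0 -> 48x fewer values to denoise
```

Denoising 48x fewer values per step is a large part of why latent diffusion models are practical on consumer GPUs.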
Diffusion
Process of starting from noise and iteratively denoising to form a coherent image.
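Structurally, sampling is a loop: start from pure noise and repeatedly apply a denoising step. A schematic sketch where `denoise_step` is a stand-in for the trained noise-prediction network (here it just shrinks toward zero so the loop shape is visible):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 64, 4))  # start from pure noise in latent space

def denoise_step(x, t):
    # Stand-in for the trained network; real samplers predict and
    # subtract noise according to a timestep schedule.
    return x * 0.9

for t in reversed(range(50)):  # refine over 50 timesteps, high noise to low
    x = denoise_step(x, t)
# x would then be decoded from latent space to pixels
```

Real samplers (DDPM, DDIM, etc.) differ in how each step uses the network's noise prediction, but all share this iterative refine-from-noise structure.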
Multi-modal
A model that handles multiple input/output types (text, image, audio, video).