Concepts
Transformer
Neural network architecture underlying GPT, DALL-E, and the text encoders in image models.
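The Transformer's core operation is scaled dot-product attention: each token builds its output as a similarity-weighted mix of every other token's values. A minimal NumPy sketch (shapes and dimensions are illustrative, not from any particular model):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: each query row mixes the value rows,
    # weighted by softmax of query-key similarity.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 query tokens, embedding dim 8
K = rng.normal(size=(6, 8))  # 6 key tokens
V = rng.normal(size=(6, 8))  # one value row per key
out = attention(Q, K, V)     # shape (4, 8): one mixed vector per query
```

Real models stack many of these attention layers with learned projections; this shows only the single operation at the heart of each layer.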
Hallucination
When AI generates plausible-looking but incorrect content (e.g., extra fingers, garbled text inside images).
Fine-Tuning
Training a base model on custom data so it learns your style or character.
Token
A word/subword unit the model processes. CLIP-based text encoders cap prompts at 77 tokens; some UIs work around this by chunking longer prompts (commonly up to 225).
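Tokenizers typically split unfamiliar words into known subword pieces. A toy greedy longest-match tokenizer over a made-up vocabulary (real models use learned BPE or WordPiece vocabularies, so this is illustrative only):

```python
def tokenize(word, vocab):
    # Greedy longest-match: repeatedly take the longest vocabulary
    # piece that matches at the current position.
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character falls back to itself
            i += 1
    return tokens

vocab = {"re", "paint", "ing", "er"}
print(tokenize("repainting", vocab))  # ['re', 'paint', 'ing']
```

One unfamiliar word can cost several tokens, which is why long prompts hit the cap sooner than a word count suggests.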
Latent Space
Compressed representation where diffusion happens. Models work here, then decode to pixels.
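The payoff of working in latent space is size: using Stable Diffusion-style shapes as an assumption (8x spatial downsampling, 4 latent channels), the latent holds far fewer values than the pixel image:

```python
import numpy as np

# Assumed Stable Diffusion-style shapes: 512x512 RGB image,
# 64x64x4 latent (8x downsampling, 4 channels).
pixels = np.zeros((512, 512, 3))
latent = np.zeros((64, 64, 4))
print(pixels.size / latent.size)  # 48.0 -> 48x fewer values to denoise
```

Denoising 48x fewer values per step is a large part of why latent diffusion models are practical on consumer GPUs.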
Diffusion
Process of starting from noise and iteratively denoising to form a coherent image.
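Structurally, sampling is a loop: start from pure noise and repeatedly apply a denoising step. A schematic sketch where `denoise_step` is a stand-in for the trained noise-prediction network (here it just shrinks toward zero so the loop shape is visible):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 64, 4))  # start from pure noise in latent space

def denoise_step(x, t):
    # Stand-in for the trained network; real samplers predict and
    # subtract noise according to a timestep schedule.
    return x * 0.9

for t in reversed(range(50)):  # refine over 50 timesteps, high noise to low
    x = denoise_step(x, t)
# x would then be decoded from latent space to pixels
```

Real samplers (DDPM, DDIM, etc.) differ in how each step uses the network's noise prediction, but all share this iterative refine-from-noise structure.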
Multi-modal
A model that handles multiple input/output types (text, image, audio, video).