AI Prompt Glossary
50 essential terms for image, video & prompt AI
Platforms
Midjourney
AI image generator known for artistic, painterly outputs. Now on v6/v7 with high prompt fidelity.
DALL-E 3
OpenAI's third-generation text-to-image model with strong text rendering and ChatGPT integration.
Stable Diffusion
Open-source text-to-image model by Stability AI. Highly customizable via LoRAs and ControlNets.
SDXL
Stable Diffusion XL — larger, higher-quality variant producing 1024×1024 images natively.
Flux
High-quality image model family from Black Forest Labs. Variants: Schnell, Dev, Pro.
Sora
OpenAI's text-to-video model producing minute-long high-fidelity video clips.
Runway
Video AI platform — Gen-2/Gen-3 models for text-to-video, image-to-video, motion brush.
Techniques
Negative Prompt
Tags telling the model what NOT to include. Crucial in Stable Diffusion for fixing hands, blur, and watermarks.
Prompt Weighting
Adjusting emphasis on specific tokens, e.g. (red dress:1.5) in Stable Diffusion.
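The `(text:weight)` syntax can be parsed mechanically. A minimal sketch of an Automatic1111-style weight parser (simplified: real parsers also handle nested parentheses and `[...]` de-emphasis; the function name is illustrative, not a library API):

```python
import re

def parse_weighted_prompt(prompt: str) -> list[tuple[str, float]]:
    """Split an A1111-style prompt into (text, weight) segments.
    Explicit weights like (red dress:1.5) are extracted; plain
    text gets the default weight 1.0. A simplified sketch."""
    segments = []
    pos = 0
    for m in re.finditer(r"\(([^():]+):([\d.]+)\)", prompt):
        plain = prompt[pos:m.start()].strip(" ,")
        if plain:
            segments.append((plain, 1.0))
        segments.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        segments.append((tail, 1.0))
    return segments
```

For example, `parse_weighted_prompt("a photo of a (red dress:1.5), studio light")` yields `[("a photo of a", 1.0), ("red dress", 1.5), ("studio light", 1.0)]`.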
CFG Scale
Classifier-Free Guidance — controls how strictly the model follows your prompt. Higher = stricter.
Seed
Random number that determines the noise pattern. Same prompt + same seed = reproducible image.
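Reproducibility comes from seeding the noise generator, analogous to passing `torch.Generator().manual_seed(seed)` to a diffusers pipeline. A stdlib sketch showing that a fixed seed fixes the starting noise:

```python
import random

def noise_for_seed(seed: int, n: int = 4) -> list[float]:
    """Seeding an independent RNG fixes the initial Gaussian noise,
    which is why the same prompt + seed reproduces the same image."""
    rng = random.Random(seed)  # independent generator, unaffected by global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```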
Inpainting
Masking a region of an image and regenerating just that area while keeping the rest.
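The final composite step of inpainting can be sketched as a per-pixel blend, keeping original pixels where the mask is 0 and generated pixels where it is 1 (a conceptual sketch on nested lists, not any library's API):

```python
def blend_inpaint(original, generated, mask):
    """Inpainting composite: result = original * (1 - mask) + generated * mask.
    Hard masks (0/1) swap regions; soft masks blend proportionally."""
    return [
        [o * (1 - m) + g * m for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]
```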
Outpainting
Extending an image beyond its original borders, generating new content that matches the existing scene.
img2img
Using an input image + text prompt to guide generation, preserving composition.
Components
LoRA
Low-Rank Adaptation — a small trained file that adds a specific style or character to a base model.
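The "low-rank" part is why LoRA files stay small: instead of storing a full weight delta, they store two thin matrices whose product is the delta. A pure-Python sketch of the merge `W' = W + scale * (B @ A)`:

```python
def apply_lora(W, A, B, scale=1.0):
    """Merge a LoRA into a base weight matrix: W' = W + scale * (B @ A).
    A is (rank x in), B is (out x rank); rank << out, in, so the
    adapter stores far fewer numbers than a full fine-tune would."""
    out, rank, n_in = len(B), len(A), len(A[0])
    delta = [[scale * sum(B[i][k] * A[k][j] for k in range(rank))
              for j in range(n_in)] for i in range(out)]
    return [[W[i][j] + delta[i][j] for j in range(n_in)] for i in range(out)]
```

With rank 1 on a 2x2 weight, the adapter holds 4 numbers instead of 4 per layer scaling to millions in real models; `scale` is the user-facing "LoRA strength" slider.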
ControlNet
Adds structural control: pose, depth, edges, scribbles. Essential for consistent character work.
VAE
Variational AutoEncoder — converts model latents to pixels. Different VAEs = different color/contrast.
Sampler
Algorithm that iteratively denoises images. Common: DPM++, Euler a, DDIM.
Checkpoint
A saved model file (.safetensors / .ckpt). Different checkpoints produce different aesthetics.
Embedding
Textual Inversion files that teach the model new concepts via short trigger words.
Styles
Photorealistic
Style that mimics real photography. Use camera/lens specs, lighting, film stock for best results.
Cinematic
Movie-still aesthetic: dramatic lighting, anamorphic lens, color grading, depth of field.
Anime
Japanese animation style. Key prompts: Studio Ghibli, Makoto Shinkai, cel shading, manga ink.
Concept Art
Pre-production art for games/films. Loose brushwork, dramatic compositions, mood-first.
Isometric
3/4 view with no perspective distortion. Popular for game art and infographics.
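"No perspective distortion" means the projection is purely linear. A sketch of the 2:1 screen projection most tile-based games use for their "isometric" look (a convention, not the strict 30° isometric of technical drawing):

```python
def iso_project(x, y, z=0.0):
    """Project a 3D grid point to 2:1 'isometric' screen space.
    No perspective divide, so parallel lines stay parallel and
    objects do not shrink with distance."""
    screen_x = x - y
    screen_y = (x + y) / 2 - z
    return screen_x, screen_y
```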
Lighting
Golden Hour
Soft warm light just after sunrise / before sunset. Long shadows, orange/pink tones.
Rim Lighting
Light source behind the subject creating a glowing outline. Dramatic and cinematic.
Volumetric Light
Visible light beams through atmosphere/dust. God rays, fog, spotlights.
Studio Lighting
Controlled multi-light setup: key, fill, rim. Professional product/portrait look.
Camera
Depth of Field
How much of the scene is in focus. Shallow DoF (f/1.4) = blurry background.
Bokeh
Aesthetic out-of-focus blur, especially circular highlights from lens aperture.
Wide-Angle Lens
24mm or below. Captures more of the scene, exaggerates depth, distorts edges.
Macro
Extreme close-up photography revealing fine detail. Insects, eyes, textures.
Composition
Color Grading
Post-process color adjustment for mood. Teal-and-orange, cool blue, warm sepia.
Rule of Thirds
Compositional grid placing the subject 1/3 from edges for natural balance.
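The grid's four intersections are the conventional anchor points. A trivial sketch that computes them for any frame size:

```python
def thirds_points(width, height):
    """The four intersections of the rule-of-thirds grid --
    the conventional spots for placing a subject."""
    xs = (width / 3, 2 * width / 3)
    ys = (height / 3, 2 * height / 3)
    return [(x, y) for y in ys for x in xs]
```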
Leading Lines
Lines in the image that draw the eye toward the subject — roads, fences, light beams.
Symmetry
Mirror or radial composition. Strong, formal, often architectural or surreal.
Concepts
Hallucination
When AI generates plausible but incorrect content (extra fingers, fake text).
Fine-Tuning
Training a base model on custom data so it learns your style or character.
Token
A word/subword unit the model processes. Stable Diffusion's CLIP text encoder reads 77 tokens at a time (75 usable); many UIs chain multiple chunks to handle longer prompts.
Latent Space
Compressed representation where diffusion happens. Models work here, then decode to pixels.
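The compression is concrete: Stable Diffusion's VAE maps each 8x8 pixel patch to one 4-channel latent value. A sketch of the resulting shape (the 8x/4-channel figures are SD-specific; other models differ):

```python
def latent_shape(width, height, downscale=8, channels=4):
    """Latent tensor shape for a given image size under Stable
    Diffusion's VAE: a 512x512 image becomes a 4x64x64 latent --
    the space where the diffusion loop actually runs."""
    return (channels, height // downscale, width // downscale)
```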
Diffusion
Process of starting from noise and iteratively denoising to form a coherent image.
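The forward (noising) half of this process is a single closed-form formula in DDPM-style models. A scalar sketch, treating one "pixel" at a time:

```python
import math
import random

def noisy_sample(x0: float, alpha_bar: float, rng: random.Random) -> float:
    """DDPM forward process for one scalar value:
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise.
    alpha_bar near 1 keeps the original signal; near 0 is pure
    noise. Generation runs this in reverse, denoising step by step."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1 - alpha_bar) * eps
```

At `alpha_bar=1.0` the sample is untouched; at `alpha_bar=0.0` the original value contributes nothing and only noise remains.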
Transformer
Neural network architecture underlying GPT, DALL-E, and the text encoders in image models.
Multi-modal
A model that handles multiple input/output types (text, image, audio, video).