AI Prompt Glossary
50 essential terms for image, video & prompt AI
Platforms
Midjourney
AI image generator known for artistic, painterly outputs. Now on v6/v7 with high prompt fidelity.
DALL-E 3
OpenAI's third-generation text-to-image model with strong text rendering and ChatGPT integration.
Stable Diffusion
Open-source text-to-image model by Stability AI. Highly customizable via LoRAs and ControlNets.
SDXL
Stable Diffusion XL — larger, higher-quality variant producing 1024×1024 images natively.
Flux
High-quality image model family from Black Forest Labs. Variants: Schnell, Dev, Pro.
Sora
OpenAI's text-to-video model producing minute-long high-fidelity video clips.
Runway
Video AI platform — Gen-2/Gen-3 models for text-to-video, image-to-video, motion brush.
Techniques
Negative Prompt
Tags telling the model what NOT to include. Crucial in Stable Diffusion for fixing hands, blur, and watermarks.
Prompt Weighting
Adjusting emphasis on specific tokens, e.g. (red dress:1.5) in Stable Diffusion.
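The `(text:weight)` syntax can be parsed mechanically. A minimal sketch of an Automatic1111-style weight parser (simplified: real parsers also handle nested parentheses and `[...]` de-emphasis; the function name is illustrative, not a library API):

```python
import re

def parse_weighted_prompt(prompt: str) -> list[tuple[str, float]]:
    """Split an A1111-style prompt into (text, weight) segments.
    Explicit weights like (red dress:1.5) are extracted; plain
    text gets the default weight 1.0. A simplified sketch."""
    segments = []
    pos = 0
    for m in re.finditer(r"\(([^():]+):([\d.]+)\)", prompt):
        plain = prompt[pos:m.start()].strip(" ,")
        if plain:
            segments.append((plain, 1.0))
        segments.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        segments.append((tail, 1.0))
    return segments
```

For example, `parse_weighted_prompt("a photo of a (red dress:1.5), studio light")` yields `[("a photo of a", 1.0), ("red dress", 1.5), ("studio light", 1.0)]`.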
CFG Scale
Classifier-Free Guidance — controls how strictly the model follows your prompt. Higher = stricter.
Seed
Random number that determines the noise pattern. Same prompt + same seed = reproducible image.
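Reproducibility comes from seeding the noise generator, analogous to passing `torch.Generator().manual_seed(seed)` to a diffusers pipeline. A stdlib sketch showing that a fixed seed fixes the starting noise:

```python
import random

def noise_for_seed(seed: int, n: int = 4) -> list[float]:
    """Seeding an independent RNG fixes the initial Gaussian noise,
    which is why the same prompt + seed reproduces the same image."""
    rng = random.Random(seed)  # independent generator, unaffected by global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```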
Inpainting
Masking a region of an image and regenerating just that area while keeping the rest.
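The final composite step of inpainting can be sketched as a per-pixel blend, keeping original pixels where the mask is 0 and generated pixels where it is 1 (a conceptual sketch on nested lists, not any library's API):

```python
def blend_inpaint(original, generated, mask):
    """Inpainting composite: result = original * (1 - mask) + generated * mask.
    Hard masks (0/1) swap regions; soft masks blend proportionally."""
    return [
        [o * (1 - m) + g * m for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]
```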
Outpainting
Extending an image beyond its original borders, generating new content that matches the existing scene.
img2img
Using an input image + text prompt to guide generation, preserving composition.
Components
LoRA
Low-Rank Adaptation — a small trained file that adds a specific style or character to a base model.
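The "low-rank" part is why LoRA files stay small: instead of storing a full weight delta, they store two thin matrices whose product is the delta. A pure-Python sketch of the merge `W' = W + scale * (B @ A)`:

```python
def apply_lora(W, A, B, scale=1.0):
    """Merge a LoRA into a base weight matrix: W' = W + scale * (B @ A).
    A is (rank x in), B is (out x rank); rank << out, in, so the
    adapter stores far fewer numbers than a full fine-tune would."""
    out, rank, n_in = len(B), len(A), len(A[0])
    delta = [[scale * sum(B[i][k] * A[k][j] for k in range(rank))
              for j in range(n_in)] for i in range(out)]
    return [[W[i][j] + delta[i][j] for j in range(n_in)] for i in range(out)]
```

With rank 1 on a 2x2 weight, the adapter holds 4 numbers instead of 4 per layer scaling to millions in real models; `scale` is the user-facing "LoRA strength" slider.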
ControlNet
Adds structural control: pose, depth, edges, scribbles. Essential for consistent character work.
VAE
Variational AutoEncoder — converts model latents to pixels. Different VAEs = different color/contrast.
Sampler
Algorithm that iteratively denoises images. Common: DPM++, Euler a, DDIM.
Checkpoint
A saved model file (.safetensors / .ckpt). Different checkpoints produce different aesthetics.
Embedding
Textual Inversion files that teach the model new concepts via short trigger words.
Styles
Photorealistic
Style that mimics real photography. Use camera/lens specs, lighting, film stock for best results.
Cinematic
Movie-still aesthetic: dramatic lighting, anamorphic lens, color grading, depth of field.
Anime
Japanese animation style. Key prompts: Studio Ghibli, Makoto Shinkai, cel shading, manga ink.
Concept Art
Pre-production art for games/films. Loose brushwork, dramatic compositions, mood-first.
Isometric
3/4 view with no perspective distortion. Popular for game art and infographics.
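"No perspective distortion" means the projection is purely linear. A sketch of the 2:1 screen projection most tile-based games use for their "isometric" look (a convention, not the strict 30° isometric of technical drawing):

```python
def iso_project(x, y, z=0.0):
    """Project a 3D grid point to 2:1 'isometric' screen space.
    No perspective divide, so parallel lines stay parallel and
    objects do not shrink with distance."""
    screen_x = x - y
    screen_y = (x + y) / 2 - z
    return screen_x, screen_y
```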
Lighting
Golden Hour
Soft warm light just after sunrise / before sunset. Long shadows, orange/pink tones.
Rim Lighting
Light source behind the subject creating a glowing outline. Dramatic and cinematic.
Volumetric Light
Visible light beams through atmosphere/dust. God rays, fog, spotlights.
Studio Lighting
Controlled multi-light setup: key, fill, rim. Professional product/portrait look.
Camera
Depth of Field
How much of the scene is in focus. Shallow DoF (f/1.4) = blurry background.
Bokeh
Aesthetic out-of-focus blur, especially circular highlights from lens aperture.
Wide-Angle Lens
24mm or below. Captures more of the scene, exaggerates depth, distorts edges.
Macro
Extreme close-up photography revealing fine detail. Insects, eyes, textures.
Composition
Color Grading
Post-process color adjustment for mood. Teal-and-orange, cool blue, warm sepia.
Rule of Thirds
Compositional grid placing the subject 1/3 from edges for natural balance.
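The grid's four intersections are the conventional anchor points. A trivial sketch that computes them for any frame size:

```python
def thirds_points(width, height):
    """The four intersections of the rule-of-thirds grid --
    the conventional spots for placing a subject."""
    xs = (width / 3, 2 * width / 3)
    ys = (height / 3, 2 * height / 3)
    return [(x, y) for y in ys for x in xs]
```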
Leading Lines
Lines in the image that draw the eye toward the subject — roads, fences, light beams.
Symmetry
Mirror or radial composition. Strong, formal, often architectural or surreal.
Concepts
Hallucination
When AI generates plausible but incorrect content (extra fingers, fake text).
Fine-Tuning
Training a base model on custom data so it learns your style or character.
Token
A word/subword unit the model processes. Stable Diffusion's CLIP text encoder reads 77 tokens at a time (75 usable); many UIs chain multiple chunks to handle longer prompts.
Latent Space
Compressed representation where diffusion happens. Models work here, then decode to pixels.
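The compression is concrete: Stable Diffusion's VAE maps each 8x8 pixel patch to one 4-channel latent value. A sketch of the resulting shape (the 8x/4-channel figures are SD-specific; other models differ):

```python
def latent_shape(width, height, downscale=8, channels=4):
    """Latent tensor shape for a given image size under Stable
    Diffusion's VAE: a 512x512 image becomes a 4x64x64 latent --
    the space where the diffusion loop actually runs."""
    return (channels, height // downscale, width // downscale)
```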
Diffusion
Process of starting from noise and iteratively denoising to form a coherent image.
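The forward (noising) half of this process is a single closed-form formula in DDPM-style models. A scalar sketch, treating one "pixel" at a time:

```python
import math
import random

def noisy_sample(x0: float, alpha_bar: float, rng: random.Random) -> float:
    """DDPM forward process for one scalar value:
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise.
    alpha_bar near 1 keeps the original signal; near 0 is pure
    noise. Generation runs this in reverse, denoising step by step."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1 - alpha_bar) * eps
```

At `alpha_bar=1.0` the sample is untouched; at `alpha_bar=0.0` the original value contributes nothing and only noise remains.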
Transformer
Neural network architecture underlying GPT, DALL-E, and the text encoders in image models.
Multi-modal
A model that handles multiple input/output types (text, image, audio, video).