What is a context window?

Question

Accepted Answer

A context window is the maximum amount of text, measured in tokens, that an AI model can consider at once when generating a response. It covers the input prompt, any retrieved context, the conversation history, and the response itself. Through 2023, most production models had 8K–32K token windows; by mid-2026, frontier models routinely ship 1M–10M token windows, with Gemini and Claude variants at the high end and open-source models catching up rapidly.

For marketing use, the practical effect is that an AI assistant can now hold an entire brand bible, the full last-quarter campaign archive, and the live brief in one conversation without strictly requiring retrieval. That simplifies architecture but raises cost and latency per call. The strategic trade-off is straightforward: a long-context model is easier to set up (paste everything in) but costs more per generation, since input is typically billed per token on every call; a RAG-based approach (retrieval-augmented generation) with a short-context model is cheaper at scale but requires retrieval infrastructure.

By mid-2026, sophisticated marketing AI pipelines use both: long context for in-session iteration ('here's our last 30 days of campaign brief context, expand this one') and RAG for cross-session retrieval where the data is too large for any window. Understanding context-window economics is now table stakes for marketing leaders making AI-tooling decisions.
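The cost side of this trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: the four-characters-per-token ratio is a rough heuristic for English text rather than a real tokenizer, the per-million-token prices are hypothetical placeholders rather than any vendor's published rates, and the function names are invented for this example. It checks whether a prompt plus its response fits in a window, then compares the per-call cost of pasting in a large archive against sending a few retrieved passages.

```python
# Rough heuristic for English text; an assumption, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_window(input_tokens: int, max_output_tokens: int, window_tokens: int) -> bool:
    """The window must hold the input and the response together."""
    return input_tokens + max_output_tokens <= window_tokens


def cost_per_call(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one call in USD, given prices per million tokens."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices, chosen only to make the arithmetic easy to follow.
    PRICE_IN, PRICE_OUT = 3.0, 15.0  # USD per million tokens

    # Long-context approach: the whole archive goes into every call.
    long_ctx = cost_per_call(800_000, 1_000, PRICE_IN, PRICE_OUT)

    # RAG approach: only a few retrieved passages go into each call.
    rag = cost_per_call(8_000, 1_000, PRICE_IN, PRICE_OUT)

    print(fits_in_window(800_000, 1_000, 1_000_000))  # True
    print(f"long context: ${long_ctx:.3f} per call")  # $2.415
    print(f"RAG:          ${rag:.3f} per call")       # $0.039
```

Under these assumed numbers the long-context call costs roughly 60 times as much as the RAG call. That gap is negligible for a handful of in-session iterations but adds up across thousands of generations, and the sketch leaves out the cost of building and running the retrieval infrastructure itself.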
