
Ollama Context Window


Understanding and Configuring the Ollama Context Window

Large language models (LLMs) are powerful tools, but their ability to understand and respond effectively depends on something called the context window. This post explains what the context window is, how it works with Ollama, and how to adjust it for better results. This is particularly important because Ollama’s default context window is smaller than many models require for optimal performance.

What is a Context Window?

Think of the context window as the model’s short-term memory. It’s the amount of text the model considers when generating a response. This includes your prompt, any previous turns in a conversation, and any provided documents. The larger the context window, the more information the model can retain and use to create relevant and coherent answers. A small context window limits the model’s ability to understand complex instructions or maintain consistency over longer interactions.

How Does the Context Window Work with Ollama?

Ollama simplifies running LLMs locally. However, it’s important to understand that each LLM has its own context window size. For example, the Gemma 3 model boasts a context window of 128k tokens, while others may have smaller windows. Ollama’s default context window is surprisingly small at 2048 tokens. This low default can restrict the performance of models that can otherwise work with larger context windows.
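
If you are unsure what a given model supports, Ollama's /api/show endpoint reports model metadata, including the maximum context length the model was trained with. Below is a minimal sketch; note that the exact key inside model_info is architecture-specific (for example, gemma3.context_length), so the suffix lookup is an assumption rather than a guaranteed field name.

    // Ask Ollama for metadata about a local model, including its context length.
    const res = await fetch("http://localhost:11434/api/show", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "gemma3:12b" }),
    });

    const info = await res.json();

    // model_info keys are prefixed with the architecture, e.g. "gemma3.context_length".
    // The exact key name is an assumption here, so we look it up by suffix.
    const ctxKey = Object.keys(info.model_info).find((k) => k.endsWith(".context_length"));
    console.log(`Maximum context length: ${info.model_info[ctxKey]}`);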

Adjusting the Context Window in Ollama

  1. Using the API: You can set the context window size when making API requests. For example (the prompt variable below is a placeholder):

    // Prompt to send to the model (placeholder text for illustration)
    const prompt = "Summarize the following report: ...";

    const response = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model: "gemma3:12b",
            prompt,
            stream: false,
            options: {
                // Max ctx window of the Gemma 3 model is 128k tokens
                num_ctx: 128000,
            },
        }),
    });

    const data = await response.json();
    console.log(data.response);
  2. Understanding Tokenization: It’s crucial to understand that the context window is measured in tokens, not words. Tokens are pieces of words or punctuation. Generally, one token is roughly equivalent to 3/4 of a word, but this varies depending on the model and the text. You can use an online tokenizer to estimate the number of tokens in your text, or use the rough estimate sketched below.
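
As a quick sanity check, you can approximate a prompt’s token count from its word count using the 3/4-words-per-token rule of thumb above. This is a heuristic only; the model’s actual tokenizer may count noticeably differently.

    // Rough token estimate: ~3/4 of a word per token, i.e. tokens ≈ words * 4 / 3.
    // Heuristic only; the model's real tokenizer may differ noticeably.
    function estimateTokens(text) {
        const words = text.trim().split(/\s+/).filter(Boolean).length;
        return Math.ceil((words * 4) / 3);
    }

    const doc = "Paste or load the text you plan to send to the model here...";
    console.log(`Estimated tokens: ${estimateTokens(doc)}`);
    console.log(`Fits in the 2048-token default: ${estimateTokens(doc) < 2048}`);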

Scenarios and Examples

Adjusting the context window is beneficial in scenarios such as summarizing long documents, answering questions over large reports or codebases, and holding long multi-turn conversations where earlier messages need to stay in the model’s memory. In each of these cases, the 2048-token default is quickly exhausted, and raising num_ctx lets the model see the full input.
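
For example, in a long multi-turn conversation you can raise num_ctx on the /api/chat endpoint so earlier turns are not silently dropped. A minimal sketch, with placeholder message contents:

    // Keep a long conversation in the model's memory by raising num_ctx on /api/chat.
    const messages = [
        { role: "user", content: "Here is a long design document: ..." },
        { role: "assistant", content: "A summary of the document..." },
        { role: "user", content: "Given all of the above, what are the main risks?" },
    ];

    const chatResponse = await fetch("http://localhost:11434/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model: "gemma3:12b",
            messages,
            stream: false,
            options: {
                // Larger than the 2048-token default so earlier turns stay in context.
                num_ctx: 32000,
            },
        }),
    });

    const chat = await chatResponse.json();
    console.log(chat.message.content);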

Limitations and Considerations

Increasing the context window is not free. A larger num_ctx means the model holds more tokens in memory, which raises RAM (and VRAM) requirements and can slow down responses, especially on modest hardware. Increase it as far as your use case needs, but keep an eye on memory usage and response times.

Conclusion

Understanding and configuring the context window is essential for maximizing the performance of LLMs with Ollama. While Ollama’s default context window is smaller than ideal for many models, adjusting it allows you to tailor the model’s memory to your specific needs. Remember to consider the RAM requirements and potential performance impact when increasing the context window size. By experimenting with different settings, you can unlock the full potential of your local LLMs.