
Ollama Context Window


Understanding and Configuring the Ollama Context Window

Large language models (LLMs) are powerful tools, but their ability to understand and respond effectively depends on something called the context window. This post explains what the context window is, how it works with Ollama, and how to adjust it for better results. This is particularly important because Ollama’s default context window is smaller than many models require for optimal performance.

What is a Context Window?

Think of the context window as the model’s short-term memory. It’s the amount of text the model considers when generating a response. This includes your prompt, any previous turns in a conversation, and any provided documents. The larger the context window, the more information the model can retain and use to create relevant and coherent answers. A small context window limits the model’s ability to understand complex instructions or maintain consistency over longer interactions.

How Does the Context Window Work with Ollama?

Ollama simplifies running LLMs locally. However, it’s important to understand that each LLM has its own context window size. For example, the Gemma 3 model boasts a context window of 128k tokens, while others may have smaller windows. Ollama’s default context window is surprisingly small at 2048 tokens. This low default can restrict the performance of models that can otherwise work with larger context windows.
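
If you are unsure what a given model supports, Ollama's /api/show endpoint reports model metadata, including the maximum context length the model was trained with. Below is a minimal sketch; note that the exact key inside model_info is architecture-specific (for example, gemma3.context_length), so the suffix lookup is an assumption rather than a guaranteed field name.

    // Ask Ollama for metadata about a local model, including its context length.
    const res = await fetch("http://localhost:11434/api/show", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "gemma3:12b" }),
    });

    const info = await res.json();

    // model_info keys are prefixed with the architecture, e.g. "gemma3.context_length".
    // The exact key name is an assumption here, so we look it up by suffix.
    const ctxKey = Object.keys(info.model_info).find((k) => k.endsWith(".context_length"));
    console.log(`Maximum context length: ${info.model_info[ctxKey]}`);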

Adjusting the Context Window in Ollama

  1. Using the API: You can set the context window size when making API requests. For example (the prompt variable below is a placeholder):

    // Prompt to send to the model (placeholder text for illustration)
    const prompt = "Summarize the following report: ...";

    const response = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model: "gemma3:12b",
            prompt,
            stream: false,
            options: {
                // Max ctx window of the Gemma 3 model is 128k tokens
                num_ctx: 128000,
            },
        }),
    });

    const data = await response.json();
    console.log(data.response);
  2. Understanding Tokenization: It’s crucial to understand that the context window is measured in tokens, not words. Tokens are pieces of words or punctuation. Generally, one token is roughly equivalent to 3/4 of a word, but this varies depending on the model and the text. You can use an online tokenizer to estimate the number of tokens in your text, or use the rough estimate sketched below.
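
As a quick sanity check, you can approximate a prompt’s token count from its word count using the 3/4-words-per-token rule of thumb above. This is a heuristic only; the model’s actual tokenizer may count noticeably differently.

    // Rough token estimate: ~3/4 of a word per token, i.e. tokens ≈ words * 4 / 3.
    // Heuristic only; the model's real tokenizer may differ noticeably.
    function estimateTokens(text) {
        const words = text.trim().split(/\s+/).filter(Boolean).length;
        return Math.ceil((words * 4) / 3);
    }

    const doc = "Paste or load the text you plan to send to the model here...";
    console.log(`Estimated tokens: ${estimateTokens(doc)}`);
    console.log(`Fits in the 2048-token default: ${estimateTokens(doc) < 2048}`);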

Scenarios and Examples

Adjusting the context window is beneficial in scenarios such as summarizing long documents, answering questions over large reports or codebases, and holding long multi-turn conversations where earlier messages need to stay in the model’s memory. In each of these cases, the 2048-token default is quickly exhausted, and raising num_ctx lets the model see the full input.
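
For example, in a long multi-turn conversation you can raise num_ctx on the /api/chat endpoint so earlier turns are not silently dropped. A minimal sketch, with placeholder message contents:

    // Keep a long conversation in the model's memory by raising num_ctx on /api/chat.
    const messages = [
        { role: "user", content: "Here is a long design document: ..." },
        { role: "assistant", content: "A summary of the document..." },
        { role: "user", content: "Given all of the above, what are the main risks?" },
    ];

    const chatResponse = await fetch("http://localhost:11434/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model: "gemma3:12b",
            messages,
            stream: false,
            options: {
                // Larger than the 2048-token default so earlier turns stay in context.
                num_ctx: 32000,
            },
        }),
    });

    const chat = await chatResponse.json();
    console.log(chat.message.content);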

Limitations and Considerations

Increasing the context window is not free. A larger num_ctx means the model holds more tokens in memory, which raises RAM (and VRAM) requirements and can slow down responses, especially on modest hardware. Increase it as far as your use case needs, but keep an eye on memory usage and response times.

Conclusion

Understanding and configuring the context window is essential for maximizing the performance of LLMs with Ollama. While Ollama’s default context window is smaller than ideal for many models, adjusting it allows you to tailor the model’s memory to your specific needs. Remember to consider the RAM requirements and potential performance impact when increasing the context window size. By experimenting with different settings, you can unlock the full potential of your local LLMs.