Is it possible to preload context in OpenAI?

Is it possible to preload context in the OpenAI API only once? I then only want to provide history to get an answer. I can't find this in the docs.
I have a CSV file and want to search its text with multiple queries.


Hi @Alex_Anonym!

If you (1) want to do this purely with the API (so not using ChatGPT or custom GPTs), (2) don't want to use Assistants, since that comes with its own challenges, (3) your CSV is not very large, i.e. hundreds to a few thousand rows with a dozen or so columns, and (4) don't expect a huge number of requests every minute, then:

The easiest approach is to just use GPT-4.1, since it comes with a 1 million token context window: preload the CSV into the system prompt every time and serve it via the standard Chat Completions or Responses API. The repeated prefix has a high likelihood of being cached, so you would be paying (roughly) 50 cents per million tokens on the input.
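A minimal sketch of that pattern, assuming the openai Python SDK; the file name and question are placeholders:

```python
# Sketch: preload the CSV into the system prompt on every request.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("data.csv", "r", encoding="utf-8") as f:
    csv_text = f.read()

# Keep this prefix identical across requests so it can be cached.
system_prompt = (
    "You answer questions about the following CSV data. "
    "Use only this data; say so if the answer is not in it.\n\n"
    + csv_text
)

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("How many rows mention 'refund'?"))
```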

If you want to do this "properly", then since this is a CSV I am assuming a relational structure, in which case it would be better to build a small database, populate it with your data, and then expose a text-to-SQL tool call, as sketched below.
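For example, a tool definition the model could call with generated SQL. This is a sketch in the Chat Completions function-calling format; the table name, columns, and description are assumptions about your data:

```python
# Sketch: a read-only SQL tool exposed to the model via function calling.
# Pass as: client.chat.completions.create(..., tools=tools)
tools = [
    {
        "type": "function",
        "function": {
            "name": "query_csv_db",
            "description": (
                "Run a read-only SQL query against the 'records' table, "
                "which holds the CSV data. "
                "Columns: id, customer, date, text."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "sql": {
                        "type": "string",
                        "description": "A single SELECT statement.",
                    }
                },
                "required": ["sql"],
            },
        },
    }
]
```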


Semantic similarity search is terrible on tabular data like a CSV, and then you don't even have a header to tell the AI what it is looking at.

Every API call is inherently stateless.

As you've likely discovered, if you want to receive inference based on an input, even if that input reuses the same system message or context (such as a document), you have to provide that input again along with your alternate "question" about it.

There are a few mechanisms you can exploit because of the repetitive nature of the task.

1. Cache

OpenAI offers a discount on the part of the input that has recently been used before. This means that if you have an unchanging document and system task, repeated queries against them in a short time can have the input discounted by 50%, or, on the gpt-4.1 series models only, by 75%.

However, this relies on a first "hit" that is not discounted, the cache has minimum matching requirements (an unchanged prefix of at least 1,024 tokens), and the server-side cache is not 100% guaranteed and expires. Alternatively, you can simply submit the job to "batches" or with service tier "flex" and receive a guaranteed 50% discount on the whole job, input and output.
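If you go the flex route, it is just one extra request parameter. A minimal sketch, assuming the openai Python SDK and a model that currently supports flex processing (the model name here is a placeholder; availability varies by model):

```python
# Sketch: request flex processing for the discount.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o4-mini",        # placeholder; check which models offer flex
    service_tier="flex",    # cheaper, slower, best-effort processing
    timeout=900.0,          # flex requests can take much longer
    messages=[
        {"role": "system", "content": "You answer questions about a CSV."},
        {"role": "user", "content": "Summarize the data."},
    ],
)
print(response.choices[0].message.content)
```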

2. Stateful storage with Responses

The Responses endpoint allows you to store a response and reuse its ID, and this can be done multiple times. It doesn't actually change how the AI model works, but it can be a mechanism for forming a "chat" without sending that context over the network again. It doesn't have any cost benefit, and you'd also have to provide some initial input, like "Here is my CSV, just acknowledge you are ready to answer questions about it by answering 'OK'".
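A minimal sketch of that flow, assuming the openai Python SDK; the file name and prompts are placeholders:

```python
# Sketch: seed the conversation once with the CSV, then chain follow-ups
# via previous_response_id instead of re-sending the file.
from openai import OpenAI

client = OpenAI()

with open("data.csv", "r", encoding="utf-8") as f:
    csv_text = f.read()

seed = client.responses.create(
    model="gpt-4.1",
    store=True,  # keep the response server-side so it can be referenced
    input="Here is my CSV. Reply 'OK' and wait for questions.\n\n" + csv_text,
)

followup = client.responses.create(
    model="gpt-4.1",
    store=True,
    previous_response_id=seed.id,  # reuse the stored context
    input="Which customers appear more than once?",
)
print(followup.output_text)
```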

Without full context…

It sounds like you want full observation of the CSV file rather than a search over it. Providing the full CSV to the AI can do that, letting it answer any question across all the knowledge at once. However, you might consider loading the actual knowledge into a database directly, with the fields intact. Then you can present a tool to the AI so it can actually make queries, such as customer: startswith("smith") + date(after: 30 days ago). A sketch of that database side is below.
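A minimal sketch, assuming pandas and SQLite, with placeholder column names (customer, date) matching the example filters:

```python
# Sketch: load the CSV into SQLite with the fields intact, then run the
# kind of query a tool call would receive.
import sqlite3
import pandas as pd

df = pd.read_csv("data.csv")
conn = sqlite3.connect("records.db")
df.to_sql("records", conn, if_exists="replace", index=False)

# Equivalent of: customer startswith "smith" AND date after 30 days ago.
# Assumes ISO-format date strings in the 'date' column.
rows = conn.execute(
    """
    SELECT * FROM records
    WHERE customer LIKE 'smith%'
      AND date >= date('now', '-30 days')
    """
).fetchall()
print(rows)
```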
