Using API with a CSV huge table of contents

Hey guys,

I’ve searched a little and found this:

As I’m working on the same concept, I’m trying to feed the OpenAI API a huge table (CSV) that is about 1,322,168 tokens. There’s no point in sending that many tokens with every request (I think the TPM limit is 60K).

How can I save or store this huge table in the model’s memory? It’s like giving it pre-instructions, or having it available as context for any future response.

Thanks!

There is no “memory” to an API language model AI. You provide the entire context from which it must answer into its limited context length.

For a case like that, you will likely need to provide a database query tool along with some education or even fine-tuning of how to make queries to obtain the desired filtered data.


Thanks @_j I’d like to ask for some clarifications if I may.

For a case like that, you will likely need to provide a database query tool along with some education or even fine-tuning of how to make queries to obtain the desired filtered data.

database query tool = a framework? Like SQL, PostgreSQL, etc.?
education = do you mean a set of pre-instructions on how to use the SQL?
fine-tuning = likewise, a set of pre-instructions, or literally training the model with some samples of obtaining the data through queries?

Thank you so much.

Since you can’t load over a million tokens into an AI model, and it wouldn’t be able to “compute” over them to obtain any realistic results, you instead need some way to let the AI obtain just part of that data as knowledge, where a partial return can answer the desired question.
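As a minimal sketch of that idea: filter the table down to just the rows that answer the question, and place only that slice in the prompt. The CSV data and field names here are made up for illustration; the real table would be read from disk.

```python
import csv
import io

# Hypothetical CSV data standing in for the huge table (assumption:
# the real file would be read from disk instead of a string).
CSV_DATA = """model,year,safety_rating
Civic,2021,5
Accord,2020,5
Model 3,2022,5
F-150,2019,4
"""

def rows_matching(csv_text, field, value):
    """Return only the rows whose `field` equals `value`, so a small
    slice of the table can be placed in the prompt instead of all of it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[field] == value]

# Only this matching slice would be sent to the model as context.
matches = rows_matching(CSV_DATA, "year", "2020")
```
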

That is done through function-calling: offering the AI model a function specification, such as the fields that a database contains and the queries it can issue on those fields.

That can be a straightforward, tailored function meant to return particular data (car year and model, with a table of safety ratings, for example). It can also be more open-ended, like a SQL query, but the more you rely on the AI to do something novel, the more it may make errors that don’t return the data needed to answer from.
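A rough sketch of the tailored-function approach, using the Chat Completions “tools” specification format with a parameterized SQLite query behind it. The function name, field names, and sample rows are all hypothetical:

```python
import sqlite3

# Hypothetical tool specification in the Chat Completions "tools" format;
# the function name and fields are illustrative, not from any real schema.
TOOL_SPEC = {
    "type": "function",
    "function": {
        "name": "get_safety_ratings",
        "description": "Look up the safety rating for a car model and year.",
        "parameters": {
            "type": "object",
            "properties": {
                "model": {"type": "string"},
                "year": {"type": "integer"},
            },
            "required": ["model", "year"],
        },
    },
}

def get_safety_ratings(conn, model, year):
    """Run a parameterized query so the AI never writes raw SQL itself."""
    cur = conn.execute(
        "SELECT rating FROM safety WHERE model = ? AND year = ?",
        (model, year),
    )
    row = cur.fetchone()
    return {"model": model, "year": year, "rating": row[0] if row else None}

# In-memory database with one sample row, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE safety (model TEXT, year INTEGER, rating INTEGER)")
conn.execute("INSERT INTO safety VALUES ('Civic', 2021, 5)")
result = get_safety_ratings(conn, "Civic", 2021)
```

Keeping the query parameterized (the tailored case) trades flexibility for reliability: the model only supplies arguments, so it can’t produce a malformed query.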

https://platform.openai.com/docs/guides/function-calling


Thanks @_j Great answer.
Just to better understand the logic: if that’s how it works under the hood, then how is the Code Interpreter feature of OpenAI different?
Does it also use functions to pass data to the proper programming language or framework?

Code Interpreter allows the AI to write Python code, which is then executed within a Jupyter notebook.

This can allow it to process data with the code it writes. For example “sample and return the first 10 lines of openai_pricing.csv with python code to see the data format, then produce more python that will process that into a markdown table saved as pricing.md” would give multi-step turns for the AI to perform.
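The second step of that example, turning CSV data into a markdown table, is the kind of code the model might write inside the notebook. A minimal sketch, with made-up stand-in data for openai_pricing.csv:

```python
import csv
import io

# Hypothetical stand-in for openai_pricing.csv; the column names and
# values are made up for illustration.
CSV_TEXT = """model,input_per_1k,output_per_1k
gpt-a,0.01,0.03
gpt-b,0.001,0.002
"""

def csv_to_markdown(csv_text):
    """Convert CSV text into a markdown table, the kind of step the
    model might perform before saving the result as pricing.md."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)

markdown = csv_to_markdown(CSV_TEXT)
```
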

However, what can be returned to the AI from Python execution by OpenAI is more limited, currently 32k characters. That also limits understanding.

Function-calling is just having the AI emit an output similar to an API call to your own code. You must handle the tool_call, perform the task that the function specification promised to the AI, and return the data or status of the function execution.
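The handling side can be sketched as a small dispatcher: take the tool_call the model emitted, run the matching local function, and build the "tool" role message that goes back in the conversation. The function name and stubbed data here are hypothetical; the tool_call dict mimics the shape of a Chat Completions tool call.

```python
import json

# Stubbed local implementation; in real use this would query your data.
def get_safety_ratings(model, year):
    return {"model": model, "year": year, "rating": 5}

# Registry mapping function names (as promised in the tool spec) to code.
FUNCTIONS = {"get_safety_ratings": get_safety_ratings}

def handle_tool_call(tool_call):
    """Dispatch one tool_call to local code and build the 'tool' role
    message to append to the conversation before the next model turn."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = FUNCTIONS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Simulated tool call, shaped like one from an assistant response.
fake_call = {
    "id": "call_1",
    "function": {
        "name": "get_safety_ratings",
        "arguments": '{"model": "Civic", "year": 2021}',
    },
}
tool_message = handle_tool_call(fake_call)
```

After appending this tool message to the message list, you call the model again so it can answer from the returned data.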