Adding my own data to ChatGPT

tomerg3 · September 25, 2023, 6:47pm

I would like to “train” or provide ChatGPT access to my ecommerce order history (which I will export to any format needed).
I would like to ask it to analyze the data and generate all sorts of reports.
Due to the size of the data, putting it in prompts, or even using functions within the prompts, is not a viable solution.
I was reading about fine-tuning, but my understanding of the documentation is that it’s not designed to load data, but rather to load specific question/answer prompts (which is not the same as loading the order totals and products purchased from 100 orders).
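For reference, a training record for chat-model fine-tuning (the JSON Lines format documented for gpt-3.5-turbo) is a short example conversation, not a table row, which is why it is a poor fit for bulk order data. The sketch below builds one such record; the system text, order number and amount are invented placeholders.

```python
import json

# One fine-tuning example: a whole demonstration conversation.
# The order number and total are made-up placeholders.
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful ecommerce assistant."},
        {"role": "user", "content": "What was the total of order 1001?"},
        {"role": "assistant", "content": "Order 1001 totaled $54.20."},
    ]
}

# A fine-tuning file is JSON Lines: one JSON object per line.
jsonl_line = json.dumps(example)
print(jsonl_line)
```

The model learns the style and pattern of such answers; it does not gain a queryable copy of the underlying orders.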
A Google search yielded some results about Azure offering these capabilities.
Has anyone done something like this, and if so, what tools / service did you use?
Thanks!

vb · September 25, 2023, 6:50pm

Have you tried using Code Interpreter / Advanced Data Analysis with the GPT-4 model?
This sounds like a perfect use case: upload the orders in any format you want and have it create reports based on your requirements.

tomerg3 · September 25, 2023, 6:55pm

I have not, as I don’t necessarily want to use GPT4.
Also, it is my understanding that the API does not have access to those tools.
I’m curious though, does it allow you to upload files with the data?
Can the data be kept separate? If I load the data from 2 shops, I would like to keep each separate, so I could query orders from shop A and not get any data from shop B.

vb · September 25, 2023, 6:59pm

Yes and yes.
I did suggest it because it’s really straightforward.
Make sure to turn off training when uploading the company information though.

tomerg3 · September 25, 2023, 7:03pm

Thanks. I can’t find anything about it in the API documentation. Is it available through the API? If so, can you link the docs?

vb · September 25, 2023, 7:12pm

No, it’s the web app at chat.openai.com, for Plus subscribers.

tomerg3 · September 25, 2023, 9:42pm

I see. Well, I need a solution for the API, not the web interface.

wclayf · September 25, 2023, 11:06pm

I don’t know of a way to load large amounts of context like you’re talking about. I think it’s probably not possible.

However, one approach would be to describe your database table structure in a prompt, then describe the kind of analysis and reports you need, and ask it to generate either the SQL alone or both the SQL and the code to produce those reports. In theory it wouldn’t need to see the actual data in order to write code/SQL that can analyze it.
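A minimal sketch of that kind of prompt, assuming SQLite and a made-up two-table schema (`orders`, `order_items`; the column names and the helper `build_sql_prompt` are hypothetical). Only the table structure goes into the prompt, never the rows:

```python
# Hypothetical table description; replace with your own columns.
SCHEMA = {
    "orders": ["order_id INTEGER", "shop TEXT", "created_at TEXT", "total REAL"],
    "order_items": ["order_id INTEGER", "product TEXT", "quantity INTEGER", "price REAL"],
}


def build_sql_prompt(schema: dict[str, list[str]], request: str) -> str:
    """Describe the tables in plain text and ask the model for SQL only."""
    lines = ["You write SQLite queries. The database has these tables:"]
    for table, columns in schema.items():
        lines.append(f"- {table}({', '.join(columns)})")
    lines.append("Write one SELECT statement that answers the request below. Return only the SQL.")
    lines.append(f"Request: {request}")
    return "\n".join(lines)


prompt = build_sql_prompt(SCHEMA, "Total revenue per shop for September 2023")
print(prompt)
```

The resulting string would be sent as an ordinary user message; the data itself stays on your side.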

But of course you don’t want to blindly run any generated SQL. You’d just be using ChatGPT as a developer assistant to write the SQL. I know this is not precisely what you were asking for, but I hope it helps.
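One way to treat generated SQL as untrusted is to run it on a connection that cannot write. This sketch assumes SQLite and uses its `PRAGMA query_only`; the toy `orders` table and the helper `run_generated_select` are illustrative only, and other databases would need a read-only user instead.

```python
import sqlite3


def run_generated_select(conn: sqlite3.Connection, sql: str) -> list:
    """Run model-written SQL on a connection that rejects writes."""
    conn.execute("PRAGMA query_only = ON")  # INSERT/UPDATE/DELETE/DROP now raise an error
    return conn.execute(sql).fetchall()     # execute() refuses multi-statement strings


# Toy data standing in for a real orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, shop TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "A", 10.0), (2, "A", 5.5), (3, "B", 20.0)])
conn.commit()

rows = run_generated_select(conn, "SELECT shop, SUM(total) FROM orders GROUP BY shop ORDER BY shop")
print(rows)

# A destructive statement from the model is refused instead of executed.
try:
    run_generated_select(conn, "DROP TABLE orders")
    blocked = False
except sqlite3.Error as exc:
    blocked = True
    print("blocked:", exc)
```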

tomerg3 · September 26, 2023, 12:38am

Thanks, but that’s not really helping me.
I need to load the data rather than run queries on it (it’s not a DB but an API, and the API queries are limited).

wclayf · September 26, 2023, 1:25am

Yeah, that’s why I said it wasn’t a solution to your scenario, but a discussion of other options.

vb · September 26, 2023, 5:09am

You can try the Jupyter notebook integration:

This should allow you to load the data and then chat with the AI about it while creating your regular reports. In general, creating the reports can then be automated further.

Hope this helps!

_j · September 26, 2023, 6:32am

You can let the AI access the API or a middleman service you create, by specifying functions. Functions are a way of making queries in order to better answer questions or to fulfill tasks.

To answer your question, an AI might use a “company_documentation_index” to get categories, then iterate on “company_category_files”, and “company_file_sliding_window_access” or whatever other tools are provided for it to explore and retrieve info. Or just a “search our stuff” function.
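As a sketch of what such a "middleman" function could look like for order data, here is a definition in the same `functions` format used later in this thread. The name `search_orders`, its parameters and the shop identifiers are all hypothetical; restricting `shop` to an enum is one way to keep each request scoped to a single store.

```python
import json

# Hypothetical function the model may ask you to run; you implement it yourself.
functions = [
    {
        "name": "search_orders",
        "description": "Return orders for one shop within a date range.",
        "parameters": {
            "type": "object",
            "properties": {
                "shop": {"type": "string", "enum": ["shop_a", "shop_b"]},
                "start_date": {"type": "string", "description": "ISO date, e.g. 2023-09-01"},
                "end_date": {"type": "string", "description": "ISO date, inclusive"},
            },
            "required": ["shop", "start_date", "end_date"],
        },
    }
]

print(json.dumps(functions, indent=2))
```

This list would be passed as `functions=functions` in the chat completion call; the model only proposes arguments, and your code decides what data to return.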

RainbowDolly · September 26, 2023, 7:07am

It sounds to me like you want to generate complex reports from complex data using only GPT. That is not going to be possible. You will have to do some programming, data preparation, etc., and depending on the nature of the reports, not using AI at all may even be the better solution, but that is hard to tell without knowing a specific use case.

If it’s just about being able to query the data in natural language, I suppose you could load all the data from the API, use GPT to translate a query into code, run that code against your local in-memory copy, then generate the report.
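A minimal sketch of the "load everything once, then query locally" idea, assuming a paginated order API and an in-memory SQLite copy. `fetch_page` is a hypothetical stand-in with fake data, and `generated_sql` plays the role of SQL a model might write for "revenue per shop":

```python
import sqlite3


def fetch_page(page: int) -> list[dict]:
    """Stand-in for your store's order API (hypothetical); returns [] when done."""
    fake_pages = [
        [{"id": 1, "shop": "shop_a", "total": 10.0}, {"id": 2, "shop": "shop_b", "total": 20.0}],
        [{"id": 3, "shop": "shop_a", "total": 5.5}],
    ]
    return fake_pages[page] if page < len(fake_pages) else []


def load_all_orders(conn: sqlite3.Connection) -> int:
    """Pull every page once so later questions never touch the rate-limited API."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, shop TEXT, total REAL)")
    page, count = 0, 0
    while rows := fetch_page(page):
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (:id, :shop, :total)", rows)
        count += len(rows)
        page += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
loaded = load_all_orders(conn)
print(loaded)

# SQL a model might produce for "revenue per shop".
generated_sql = "SELECT shop, SUM(total) FROM orders GROUP BY shop ORDER BY shop"
result = conn.execute(generated_sql).fetchall()
print(result)
```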

wclayf · September 26, 2023, 3:30pm

To clarify this idea: isn’t it true that GPT can only decide which functions it needs to call and reply with the function names and arguments as its response, but it can’t actually “call” any functions itself, right?

tomerg3 · September 26, 2023, 3:38pm

I appreciate all the suggestions to use functions, but I am not interested in using functions, as some of the data I’m trying to query is not searchable through an API.
I’m also not interested in creating my own local DB and running queries against it.
I keep reading that with an Azure subscription you can add your own data, and while I have not tried it yet, the suggestion to use Advanced Data Analysis seems like it could work, but I need it through the API.

tomerg3 · September 26, 2023, 3:39pm

Yes, that is a true statement.
This is in fact a complete statement.

_j · September 26, 2023, 4:10pm

1. What you program:

system: “You are a chat bot. Your pretrained knowledge cuts off at September 2021. You’re gonna need to use the internet to find newer information.”

function_list=[
    {
        "name": "google_for_answers",
        "description": "Search Google for more information, receiving back top result summaries.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                }
            }
        }
    }
]

2. What the user inputs:

What movie won the 2022 Oscar for best picture?

3. What the AI outputs: a “function_call”

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "function_call": {
      "name": "google_for_answers",
      "arguments": "{\n  \"query\": \"2022 Oscar winner for best picture\"\n}"
    }
  },
  "finish_reason": "function_call"
}

4. Do what the AI asked: run the search yourself, then return the results in a new message with the “function” role:

(Your own code that calls whichever Google search API you use goes here; for this demo I simply pasted Google's result text into function_return.)

function_return = """
Oscars 2022 Winners: See the Full List Here (Updated)
Vanity Fair
https://www.vanityfair.com › Hollywood › awards
Mar 28, 2022 — CODA wins best picture, Jessica Chastain takes best actress, and Will Smith nabs best actor at a thoroughly unpredictable Oscars.

94th Academy Awards
Wikipedia
https://en.wikipedia.org › wiki
CODA won three awards, including Best Picture. Other winners included Dune … In July 2022, the broadcast was nominated for three awards at the 74th …
Winners and nominees · Awards · Ceremony information · Critical reviews

Oscars 2022: the full list of winners
The Guardian
https://www.theguardian.com › film › 2022 › mar
Mar 29, 2022 — Best actress. Jessica Chastain (The Eyes of Tammy Faye) – WINNER! Olivia Colman (The Lost Daughter) Penélope Cruz (Parallel Mothers)

2022 Oscars winners full list, CODA wins best picture
Los Angeles Times
https://www.latimes.com › awards › story › 2022-03-27
Mar 27, 2022 — The complete list of 2022 Oscar winners … Will Smith holds his Oscar for best actor for “King Richard” during the show at the 94th Academy …

Oscars 2022: ‘CODA’ wins best picture, but Will Smith slap …
ABC7 Los Angeles
https://abc7.com › oscars-2022-oscar-winners-will-smith-…
Mar 28, 2022 — “CODA” was named best picture after one of the most shocking moments in Oscars history: Will Smith slapped Chris Rock onstage.
""".strip()

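import openai  # pre-1.0 openai-python (ChatCompletion interface)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo", max_tokens=200, temperature=0.2, functions=function_list,
    messages=[
        {"role": "system", "content": ("You are a chat bot. "
                                       "Your pretrained knowledge cuts off at September 2021. "
                                       "You're gonna need to use the internet to find newer information.")},
        {"role": "user", "content": "What movie won the 2022 Oscar for best picture?"},
        # the assistant's function_call from step 3, echoed back
        {"role": "assistant", "content": None, "function_call": {
            "name": "google_for_answers",
            "arguments": '{\n  "query": "2022 Oscar winner for best picture"\n}'}},
        # the search results your own code retrieved
        {"role": "function", "name": "google_for_answers", "content": function_return},
    ])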

5. The AI now decides to answer instead of calling more functions:

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": "The movie \"CODA\" won the 2022 Oscar for Best Picture."
  },
  "finish_reason": "stop"
}

(I performed the function’s actual work, the Google search, manually.)

So you are correct: OpenAI provides no code that makes the function magic happen. You program and provide the working function that can “do_actual_math” or “drop_sql_tables”.
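The "do what the AI wants" step can be automated with a small dispatcher: look up the requested name in a table of your own functions, decode the JSON `arguments` string, run it, and wrap the result as a `function` message. A minimal sketch; the body of `google_for_answers` here is a hypothetical placeholder, not a real search.

```python
import json


# Your own implementation (placeholder); the model only ever sees its name.
def google_for_answers(query: str) -> str:
    return f"(search results for {query!r} would go here)"


AVAILABLE_FUNCTIONS = {"google_for_answers": google_for_answers}


def run_function_call(message: dict) -> dict:
    """Execute the function named in an assistant message and build the reply message."""
    call = message["function_call"]
    func = AVAILABLE_FUNCTIONS[call["name"]]  # unknown names raise KeyError
    kwargs = json.loads(call["arguments"])    # arguments arrive as a JSON string
    return {"role": "function", "name": call["name"], "content": func(**kwargs)}


# The assistant message from step 3.
assistant_message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "google_for_answers",
        "arguments": "{\n  \"query\": \"2022 Oscar winner for best picture\"\n}",
    },
}

reply = run_function_call(assistant_message)
print(reply)
```

Appending `assistant_message` and then `reply` to the message list and calling the API again gives the model the results it asked for, as in step 5.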

wclayf · September 26, 2023, 4:55pm

I think you skipped a step: the part where you take the function_return you got by running your own automated Google search of some kind and embed the results in your next query. Right? It looks like you’re saying that just because you ran your own search, ChatGPT suddenly knows the answer too. lol. I guess you assumed we’d fill in that missing detail.

_j · September 26, 2023, 4:56pm

Yes, I copy-pasted Google. You program an API to fulfill the purpose of the function, such as data retrieval.

wclayf · September 26, 2023, 4:59pm

Right, but I’m saying that once you get the data back (from Google), it becomes your [context], so you’d need to submit another GPT query, something like:

"Using the context below, answer the question [question]:

Context: [context]"

If people don’t already know how function calling results need to be used they’d need to see that final prompt to fully “get it”.
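Filled in programmatically, the template above might look like the sketch below (the helper name `build_context_prompt` is hypothetical). In the function-calling flow, the `function` role message already carries this context, so this is the manual equivalent for a plain prompt. The snippets are taken from the search results quoted earlier in the thread.

```python
def build_context_prompt(question: str, snippets: list[str]) -> str:
    """Fill the 'answer from the context below' template with retrieved text."""
    context = "\n\n".join(s.strip() for s in snippets)
    return f"Using the context below, answer the question: {question}\n\nContext:\n{context}"


prompt = build_context_prompt(
    "What movie won the 2022 Oscar for best picture?",
    ["CODA won three awards, including Best Picture.",
     "CODA wins best picture, Jessica Chastain takes best actress, and Will Smith nabs best actor."],
)
print(prompt)
```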
