Magical Context Memory for Chat API

I use the Chat API to generate SQL queries, which my (local) program runs against a (local) SQLite database. Each call to the API is standalone, i.e., no information from a previous call is included. My understanding is that models used this way are “stateless”.

However, I’ve had cases where the SQL query generated by a call includes data that exists only in my local database! I’m not talking about schema items such as table or column names. I mean specific data values (that were not present in the call sent to the api).

The data values are ones that resulted from a previous SQL query run against the local database by my program, not by the API. It’s possible I included this value in a previous API call, but if that is the explanation, it means the API is not stateless, i.e., it is providing an undocumented context memory or instant “training”.

How is this happening?

The API is stateless.

Perhaps if you have a more concrete example we will be able to explain where your confusion lies.
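To see why the API is stateless, it helps to look at what actually goes over the wire: every Chat Completions request carries its own complete `messages` list, and nothing else. A minimal sketch (assembling the JSON body only, no network call; the helper name is hypothetical):

```python
import json

def build_chat_request(messages, model="gpt-3.5-turbo-1106"):
    """Assemble the JSON body for one Chat Completions call.

    The body contains ONLY what we put into it -- there is no
    server-side memory, so any "history" must be resent explicitly
    by the client on every call.
    """
    return json.dumps({"model": model, "messages": messages})

# First call: one user message, no history.
first = build_chat_request(
    [{"role": "user", "content": "Generate a SQL query."}])

# Second call: stateless, so the earlier exchange is absent
# unless the client appends it itself.
second = build_chat_request(
    [{"role": "user", "content": "Refine that query."}])

assert "Generate a SQL query." not in second  # nothing carried over
```

If a chat client appears to “remember”, it is because the client itself resent the earlier messages in the new request.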

At least partially explained:

Chatting with GPT-3.5 (1106), using my own chat program built on the Chat API:

=================Dialog start=====================================
[You]: what is the schema for imdb public dataset?
[GPT]: The IMDb public dataset schema includes various tables such as title_basics, title_ratings, name_basics, title_crew, title_principals, and more. Each table contains specific fields and attributes related to movies, TV shows, actors, directors, and other related information. The schema is quite extensive and includes details such as title type, primary title, original title, start year, genres, runtime minutes, average rating, number of votes, primary name, birth year, death year, profession, known for titles, and more. If you need more specific details about a particular table or field, feel free to ask!
[You]: what is an example nmconst value from the title_basics table?
[GPT]: An example of an nmconst value from the title_basics table in the IMDb public dataset is “nm0000131”. This value represents a unique identifier for a person (in this case, a specific actor, director, or other individual involved in the film industry) within the IMDb database. Each person in the IMDb dataset is assigned a unique nmconst value that can be used to reference their information across different tables in the dataset.
===========end of dialog==========================

The IMDb public dataset is what my local SQLite DB contains. Obviously GPT knows about the schema of that dataset (independently, from its training) and can produce an example piece of data. The question remains: how does it pull specific values of nmconst that it could only know from previous function (SQL query generation) calls, as opposed to a random value as in my example dialog above?

GPT is definitely hallucinating invalid values into the SQL queries it generates. The IMDb dataset has ID columns as follows:

For names: column name = nmconst, data format = nm#######
For titles: column name = tconst, data format = tt#######

It inserts name or title values that it has no way of knowing from the function requests I send it.

I completely refactored my local database as follows:

names: column name = nxconst, value format = nx#######
titles: column name = txconst, value format = tx########

But the queries generated by GPT sometimes contain name and/or title values formatted in the original way, i.e., values that aren’t in my refactored database at all!
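One defensive option is to validate a generated query before executing it: scan it for embedded ID literals and reject any that do not actually exist in the local database. A minimal sketch against an in-memory stand-in for the refactored DB (the table contents, helper name, and ID regex are illustrative assumptions):

```python
import re
import sqlite3

# Toy stand-in for the refactored local IMDb copy (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_basics (nxconst TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO name_basics VALUES (?)",
                 [("nx0000001",), ("nx0000002",)])

def literal_ids_exist(sql: str) -> bool:
    """Reject a generated query if it embeds an ID literal
    (old nm####### or new nx####### format) that is not
    actually present in the database."""
    for nid in re.findall(r"n[mx]\d{7}", sql):
        row = conn.execute(
            "SELECT 1 FROM name_basics WHERE nxconst = ?", (nid,)
        ).fetchone()
        if row is None:
            return False
    return True

# A hallucinated old-format ID fails the check:
assert not literal_ids_exist(
    "SELECT * FROM name_basics WHERE nxconst = 'nm0000131'")
# An ID that really exists passes:
assert literal_ids_exist(
    "SELECT * FROM name_basics WHERE nxconst = 'nx0000001'")
```

This doesn’t stop the model from hallucinating, but it does stop a hallucinated ID from silently reaching the database.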

Is there any way, using temperature or another parameter setting, to make it stop doing this?
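For reference, temperature is just a per-request field: setting it to 0 makes output nearly deterministic, but it does not stop the model from emitting plausible-looking IDs it absorbed during training. A system message that forbids literal ID values (e.g. forcing parameter placeholders) is generally a more effective lever. A sketch of such a request body (no network call; the prompt wording is an illustrative assumption):

```python
import json

# Hypothetical request body: temperature=0 reduces randomness, and
# the system message instructs the model never to invent literal
# ID values in the generated SQL.
body = {
    "model": "gpt-3.5-turbo-1106",
    "temperature": 0,
    "messages": [
        {"role": "system",
         "content": ("Generate SQLite queries. Never insert literal "
                     "nxconst or txconst values; use parameter "
                     "placeholders (?) instead.")},
        {"role": "user",
         "content": "Find the titles for a given person."},
    ],
}
payload = json.dumps(body)
assert '"temperature": 0' in payload
```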