Assistants persist between sessions? And using the retrieval plugin, ie. getting assistant to make api calls to it

nyck33 · February 11, 2024, 12:46pm

The Assistants API docs are unclear if Assistants persist between sessions, I mean if I create one, am I issued some sort of Id number that I send in subsequent requests say two weeks from now and my files and instructions will be there? If you know the answer, can you post a link to the documentation that says this? I came here after noticing that the retrieval plugin cannot be used by the Assistants API but is that true? The retrieval plugin GitHub repo Readme is claiming that I can use it but I don’t see how.
Can you also explain that?

_j · February 11, 2024, 1:16pm

The creation of an “assistant” lasts indefinitely. The same with files uploaded to the repository, OpenAI doesn’t give any limit to the lifetime.

When files are attached to an assistant to be used with its own knowledge retrieval function, however, you are billed per day, priced per GB, per assistant for the files attached.

Assistants have an ID and a name you use for further interactions.

A conversation history and new user input is contained in a thread. That has its own ID that you must remember in your code, and will likely want to remember or store other metadata also, like a chat title. Inactive ones are purged after 60 days.

The process of a user asking a question is to add the newest message to a thread. One then specifies the assistant ID and the thread ID in a “run” to have an assistant agent do its autonomous answering (and billing). Then you wait and continue checking to see if an answer is ready.

There is no “plugin” required. The operation of the built in assistant retrieval which you enable by attaching files and specifying tools=[{"type": "retrieval"}] is hidden from you and not open source or on github.

nyck33 · February 11, 2024, 2:18pm

It’s almost like a Custom GPT via API is what it sort of sounds like.

Except I don’t like: " [

Managing Threads and Messages

](https://platform.openai.com/docs/assistants/how-it-works/managing-threads-and-messages)

Threads and Messages represent a conversation session between an Assistant and a user. There is no limit to the number of Messages you can store in a Thread. Once the size of the Messages exceeds the context window of the model, the Thread will attempt to include as many messages as possible that fit in the context window and drop the oldest messages." So the messages kept is limited to the token limit of the model? Because with Custom GPTs with an Action that makes calls to the /query, /upsert and /upsert_file endpoints of my ChatGPT Retrieval Plugin running on the cloud acting as the interface for my Supabase pgvector databases, then there basically is no limit to what’s stored is that correct? For example in an idealistic situation my Flutter frontend makes a query to ChatGPT → ChatGPT makes a request to the /query endpoint of my Retrieval Plugin → Retrieval Plugin FastAPI makes a query to my pgvector database → function on pgvector database does its work and returns the top k matches to the FastAPI server → FastAPI server returns these results back to ChatGPT and for this question asked to ChatGPT, these returned results would not count towards the token limit is what I am assuming which makes Actions on Custom GPT’s much more capable than the Assistant’s API in terms of being able to retain a larger “memory”. But there is function calling in Assistant’s API which I can trigger so that the thread blocks until I make the api call to my Retreival Plugin FastAPI, get the top k results and return those to the thread as “results of caling the function” which would seem to mimic the functionality I am looking for, ie. super large memory.
Is the above correct?

jlvanhulst · February 11, 2024, 3:32pm

To add to Jay’s comment - you should not create an Assistnat eveytime. You can update Assistants programmatically OR also in the backend (platform.openai.com) where you can play with them in the Playground as well to get a feeling.

_j · February 11, 2024, 4:46pm

If you use functions (tools), you don’t have to supply large data to the AI. However, it still can act iteratively, calling things again (which is how retrieval works internally) and chat history will grow, keeping context of past thread loading by tools. All without management by you. So the sky’s the limit pretty quickly, over $1 per gpt-4 iteration. It is not for those who don’t want to OpenWalletAI.

nyck33 · February 12, 2024, 1:13am

The same thing could have been done if OpenAI had just made the Actions available via API at a much lower cost probably. This does seem like something that OpenAI came up with to keep the ecosystem of data stores, etc within OpenAI, ie. a greedy move that only hurts them in the long run as now I am going to look elsewhere for open source or other solutions that wll let me instruct my LLM to make API calls out to Supabase.

_j · February 12, 2024, 2:59am

No API model or backend can access the internet. As developer, you must form the bridge between AI tool and API, which thus can run efficiently, with no need to make additional network requests.

nyck33 · February 12, 2024, 3:06am

ChatGPT can search the web though so something like this should be possible. As can Actions. My use-case is to make reading comprehension questions for students for long novels like War and Peace, Crime and Punishment for example for which traditional RAG, ie. stuffing content into the prompt is not going to be enough. So I want the backend, say ChatGPT to be able to iteratively make requests to the database interface to build context.

_j · February 12, 2024, 3:26am

It is certainly possible. The search just has to come out of your API requests and AI responses, not from OpenAI accessing network resources.

I would actually say it is easier to “program” functions than to write using another’s specifications made for generic applications. I can jump right in and write functions for the fun of it. Write code interfacing directly with your in-memory vector database (this is python list/dictionary, not function parameters, not json)

# An empty tool list we can add our tools to on-demand
toolspec=[]
# And add the first
toolspec.extend([{
        "type": "function",
        "function": {
            "name": "get_book_list",
            "description": "Returns a list of all books from which you can search or retrieve pages.",
            "parameters": {
                "type": "object",
                "properties": {
                    "maximum_results": {
                        "type": "number",
                        "description": "Limits number of results. Limited titles are provided randomly",
                    },
                },
                "required": []
            },
        }
    }]
)

toolspec.extend([{
        "type": "function",
        "function": {
            "name": "search_book",
            "description": "Returns semantic search results for long text passages within book, and page number.",
            "parameters": {
                "type": "object",
                "properties": {
                    "book_title": {
                        "type": "number",
                        "description": "the exact book index number obtained from book list",
                    },
                    "query": {
                        "type": "string",
                        "description": "write 20-100 words of text similar to what you are looking for",
                    },
                },
                "required": ["book_title", "query"]
            },
        }
    }]
)

toolspec.extend([{
        "type": "function",
        "function": {
            "name": "get_book_page",
            "description": "Returns entire page number contents of book specified. Multi_tool_use supported.",
            "parameters": {
                "type": "object",
                "properties": {
                    "book_title": {
                        "type": "number",
                        "description": "the exact book index number obtained from book list",
                    },
                    "page": {
                        "type": "number",
                        "description": "page number, use search results or even use randomly",
                    },
                },
                "required": ["book_title", "page"]
            },
        }
    }]
)

The AI can emit parallel tool calls if it wants, for example, retrieving many pages at once from my page function in one API response.

nyck33 · February 12, 2024, 4:39am

Can I confirm that those are tools I would add to a ChatGPT assistant and then I would send it a query like “What happens tothe main protagonists in War and Peace near the end of the book?”, Then the Assistant would return to me the parameters for the second function (I would have all three functions implemented locally on my machine), ie. ChatGPT would use its knowledge of War and Peace to come up with a good query like you have here: "query": { "type": "string", "description": "write 20-100 words of text similar to what you are looking for", }, which I can then use to query my vector database from my local code? I’m not sure if this is correct but if you can correct me and show me a sample workflow if it isn’t. Also I have noticed that ChatGPT does not have knowledge of every single published book which surprised me since many companies are going after them for copyright violations, I thought OpenAI would have scanned every book that is not in Ebook form to use as training data but apparently they have not. So what happens in cases like this when ChatGPT has no idea what book I am talking about?

_j · February 12, 2024, 5:37am

First: ChatGPT is OpenAI’s web chatbot, the one with a $20/mo subscription to Plus, a prebuilt application and not your development. Where custom "GPT"s, and “actions” live.

The API has language models that take input (that can be structured) and produce a response (a response to a user or other, more programmatic, output)

Assistants on the API is a case of an agent, that, similar to GPT, can make iterative calls itself and keep its record of past interactions, instead of every AI output being handled by your code. Assistants also have a retrieval feature that can search knowledge files you uploaded, but it is not application-specific – or coded by you.

While the AI certainly can answer about literature, more of its contemplation will be by hundreds of other references to Huck Finn rather than “reading” and comprehending the text of Twain.

While what I wrote is just a 10-minute example that could suit your application, I provided the initial entry point of knowledge of your database:

get_book_list - Returns a list of all books from which you can search or retrieve pages.

You said you want students tested on reading comprehension of books. With your books on tap in the database, the students won’t be evaluating an AI’s writings.

The AI can answer that type of question on it’s own. You don’t want it reading the whole book token-by-token on your dime…and it likely wouldn’t.

Let’s ask it about using my function though:

Functions are user input driven. So if you instead said, “from a passage about the plight of protagonists near the end of War and Peace, create a 10-paragraph reading comprehension quiz. Your response will be the passage to be read, along with four AI-created multiple-choice questions”, the function calling would spring into action, finding

the book title ID if you have it by function call,
doing some searches to get results and their page numbers by function call, and
perhaps taking in multiple pages by function call until it has a suitable passage.

You maintain a local chat history, to which you add what the AI requested, and send the AI back the results (special messages).

The creation of applicable functions, both in AI specification and code, is up to you. Hopefully I’ve illustrated just the type of thing you can write.

That also needs some set up of the overall mission of the AI in a system message, where its identity, operation, goals, output format, behavior, style, etc, are defined.

yajoce6024 · February 12, 2024, 6:00am

I am unsure of how this means. If you mean the endpoint assistant you could include previous threads into the current history to make the model aware but any data outside of training data must be sent either in the current prompt or as part of conversation history if you want model to be aware.

Stateless means it doesn’t understand what it says. Or no memory means unaware.

yajoce6024 · February 12, 2024, 6:07am

These answers are just copy paste or you have experience with? Cause you kind of sound like a twat and people are just asking for help. The way this company launched there product and publicized is a little confusing. EVERYTHING IS CALLED THE ASSISTANT so depending on when people get involved the official documentation is a little confusing. Even chatgpt can’t understand what reference people are making when saying assistant and gpt.

Making people feel bad when asking for help is not alignment man. Not alignment.

_j · February 12, 2024, 10:18am

Is it less twatty if I pasted text and represented it as my own, or if I wrote bespoke definitions specifically for the creator of this topic? Because the latter is the case; every letter above is me pressing a key on my keyboard. I’ve made this usage clarification before for others though, especially about “ChatGPT”.

I understand your confusion. It is likely in your circles, you don’t often encounter so much articulate text at once that didn’t require great effort, leading to the impression of a paste.

The purpose of including a mini-glossary is to ensure we can communicate successfully about OpenAI products and the company’s nomenclature.

The focus was on leaving all the words “ChatGPT”, “plugins”, “actions” behind when switching to or employing the API, which doesn’t include use of any of those words.

There are really only two uses of the word assistant in OpenAI, although the natural English meaning is so broad that it should not have been used again for the DevDay autonomous AI agent:

assistant is the separator in language model inference that denotes where the user input ends and the AI is trained to then write a response. It was chosen for the ChatML format imposed on users of the chat completions endpoint, and is not seen, except when sending past chat history that replicates what the AI previously wrote. The AI seeing “assistant” is the actual prompt that lets it know that the next token to generate is not text continuation but rather a response to a user.
Assistants is the name of the OpenAI agent feature on API, “agent” being a term already widely used in AI for self-contained multi-step processes.

I hope that with understanding, by absorbing what I wrote instead of being angry at it, there will be less confusion and frustration with the world.

PS what does “not alignment man” mean? Perhaps you meant to write “not cool”, or “not nice”.

Topic		Replies	Views
Assistants API - Must Recreate & Reinstruct Every Time? API plugin-development , api , assistants-api	5	106	August 18, 2024
Are uploaded files permanently available when using the ASSISTANT API? API api	1	610	December 8, 2023
Are Assistant Parameters Ephemeral? API	2	284	February 17, 2024
Understaindg the Assistant process API	1	353	February 7, 2024
Clarification on Assistant Creation: Does Every User Generate an Assistant with OpenAI Retrieval Calls? API api	6	1789	January 25, 2024

Assistants persist between sessions? And using the retrieval plugin, ie. getting assistant to make api calls to it

Managing Threads and Messages

Related topics