Using threads vs chat completions

I am working with the assistants api to build some RAG chatbots, and I have a few questions that I haven’t been able to answer after some searching. Thanks for any help you can provide!

  1. How long are threads saved for? Are they permanent?
  2. Is it possible to access the logprobs for a given token in a thread response, in the same way that it is possible using chat completions?
  3. If I wanted to customize a chat interface further, to return logprobs (for showing confidence scores with each message or even each token) and to prevent threads from persisting, could I build my own thread by temporarily saving the message history and passing it into chat completions using the user and assistant roles? Or is there anything else ‘threads’ are doing beyond simplifying this process and saving the history?
  4. When file citations are returned by an assistant, what exactly do start_index and end_index refer to? Are they the token positions of the cited text?
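Regarding question 3, rolling your own “thread” is mostly bookkeeping. A minimal sketch, assuming the standard chat-completions message format; the helper names `build_request` and `record_turn` are my own, hypothetical:

```python
# Sketch of a self-managed "thread": keep the message history yourself and
# replay it through chat completions, which does support logprobs.

def build_request(history, user_input, system_prompt="You are a helpful assistant."):
    """Assemble a chat-completions message list from the saved history."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)                       # prior user/assistant turns
    messages.append({"role": "user", "content": user_input})
    return messages

def record_turn(history, user_input, assistant_reply):
    """Persist a completed turn so the next request carries the context."""
    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": assistant_reply})
    return history

history = []
messages = build_request(history, "What is in the uploaded report?")
# reply = client.chat.completions.create(
#     model="gpt-4o", messages=messages, logprobs=True
# )
record_turn(history, "What is in the uploaded report?", "(assistant reply here)")
```

The commented-out call marks where the real `client.chat.completions.create` request (with `logprobs=True`) would go.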

Threads are purged after 60 days of inactivity.

The Assistants API does not offer logprobs.
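Chat completions, by contrast, returns a log probability for each generated token when you pass `logprobs=True`. One crude way to turn those into confidence scores, as a sketch (the helper names are mine):

```python
import math

def token_confidence(logprob: float) -> float:
    """Convert a token's log probability to a 0-1 confidence score."""
    return math.exp(logprob)

def message_confidence(logprobs: list[float]) -> float:
    """Geometric-mean probability across a message's tokens --
    one crude way to summarize per-token confidence."""
    return math.exp(sum(logprobs) / len(logprobs))

# A logprob of 0.0 means the model assigned probability 1.0 to that token.
print(token_confidence(0.0))                          # 1.0
print(round(message_confidence([-0.1, -0.2, -0.3]), 3))  # 0.819
```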

Besides capturing the user input message and the observed assistant reply, a thread is also the container for tool calls and past tool responses, such as calls to the file_search tool.

The indexes are no longer used by the newer file_search tool. Although it was never documented, the older retrieval tool let the AI also “browse” documents on demand after a search: the AI could pass a tool method a range of line numbers from the document it was reading to save, and then present some “file objects” in its response. The “index” values are somewhat impenetrable in relation to the original PDF.
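That said, in the current message schema the annotations on an assistant message are documented as carrying start_index and end_index as character offsets into the message text, marking the 【…】 placeholder span, alongside a file_citation with the file ID. A sketch of swapping those spans for footnote markers, using plain dicts to stand in for the typed annotation objects:

```python
def resolve_citations(message_text: str, annotations: list[dict]) -> str:
    """Replace each annotated span with a numbered footnote marker.
    start_index/end_index are character offsets into the message text."""
    # Number annotations in original order, then splice right-to-left so
    # earlier offsets stay valid after each replacement.
    ordered = sorted(enumerate(annotations, start=1),
                     key=lambda item: item[1]["start_index"], reverse=True)
    for n, ann in ordered:
        message_text = (message_text[:ann["start_index"]]
                        + f"[{n}]"
                        + message_text[ann["end_index"]:])
    return message_text

text = "Hello【0†doc.txt】 world"
annotations = [{"start_index": 5, "end_index": 16, "file_id": "file-abc123"}]
print(resolve_citations(text, annotations))   # Hello[1] world
```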


Thanks for your responses!

I don’t know if I fully understand your explanation regarding ‘browsing’ documents after a search, but is there anywhere else to read about file search outside of the docs and API reference? If there is any way to directly pull the text the AI is referencing with those indexes, it would be awesome for my use case.

One other question: does changing the model for an assistant with file search change how the search is conducted, or does it just change how the relevant chunks are synthesized? Since switching an assistant to gpt-4o, I have noticed that the number of references has gone up significantly for the same questions, so I assume some aspect of retrieval is altered?

Here’s the AI retrieval “working”: working to return results from an irrelevant document that is 1,500 tokens long, and somehow bill me over 5,400 input tokens for a single query that could return no relevant knowledge ($0.054 + $0.02):
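The arithmetic behind that bill, assuming gpt-4-turbo-era pricing of $0.01 per 1K input tokens (my assumption; check current rates):

```python
# Rough reconstruction of the billed input cost at $0.01 per 1K input
# tokens (an assumed rate, not taken from the post).
input_tokens = 5400
input_cost = input_tokens * 0.01 / 1000
print(f"${input_cost:.3f}")   # $0.054

# The gap between the document size and the billed tokens is the tool
# instructions, the query, and the injected search results -- context the
# model consumes beyond the 1,500-token document itself.
document_tokens = 1500
overhead_tokens = input_tokens - document_tokens
print(overhead_tokens)        # 3900
```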

The text chunk that the AI was reporting on was elided with an ellipsis at exactly 750 tokens, and the AI never wrote the third instruction.

The AI’s understanding and comprehension affect the search query it writes, and also its ability to synthesize information from the large return. The chunking is already done, with no intelligence involved.

If you are curious about the actual instructions given to the Assistants AI about how to use the vector store search, here they are, reproduced along with how it must operate on the returned results:

## myfiles_browser

You have the tool `myfiles_browser` with these functions:
`msearch(queries: list[str])` Issues multiple queries to a search over the file(s) uploaded in the current conversation and displays the results.
please render in this format: `【{message idx}†{link text}】`

Tool for browsing the files uploaded by the user.

Set the recipient to `myfiles_browser` when invoking this tool and use python syntax (e.g. msearch(['query'])). "Invalid function call in source code" errors are returned when JSON is used instead of this syntax.

Parts of the documents uploaded by users will be automatically included in the conversation. Only use this tool, when the relevant parts don't contain the necessary information to fulfill the user's request.

Think carefully about how the information you find relates to the user's request. Respond as soon as you find information that clearly answers the request.

Issue multiple queries to the msearch command only when the user's question needs to be decomposed to find different facts. In other scenarios, prefer providing a single query. Avoid single word queries that are extremely broad and will return unrelated results.

Here are some examples of how to use the msearch command:
User: What was the GDP of France and Italy in the 1970s? => msearch(["france gdp 1970", "italy gdp 1970"])
User: What does the report say about the GPT4 performance on MMLU? => msearch(["GPT4 MMLU performance"])
User: How can I integrate customer relationship management system with third-party email marketing tools? => msearch(["customer management system marketing integration"])
User: What are the best practices for data security and privacy for our cloud storage services? => msearch(["cloud storage security and privacy"])

Please provide citations for your answers and render them in the following format: `【{message idx}:{search idx}†{link text}】`.

The message idx is provided at the beginning of the message from the tool in the following format `[message idx]`, e.g. [3].
The search index should be extracted from the search results, e.g. # 【13†Paris†4f4915f6-2a0b-4eb5-85d1-352e00c125bb】refers to the 13th search result, which comes from a document titled "Paris" with ID 4f4915f6-2a0b-4eb5-85d1-352e00c125bb.
For this example, a valid citation would be ` `.

All 3 parts of the citation are REQUIRED.

and then how a very simple document chunk is formatted for AI understanding:

[4] # 【0†searchdata.txt†file-h0BXea8E8GiVu5oH10q3ZlmS】
[This is a chunk from a vector store]
Visible: 0% - 100%
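If you want to map those bracketed headers back to their source files programmatically, a small regex over the `【{search idx}†{title}†{file id}】` format shown above does it (a sketch; the pattern name is mine):

```python
import re

# Parser for the search-result headers quoted above, e.g.
# 【13†Paris†4f4915f6-2a0b-4eb5-85d1-352e00c125bb】
CITATION = re.compile(r"【(\d+)†([^†】]+)†([^】]+)】")

sample = "# 【13†Paris†4f4915f6-2a0b-4eb5-85d1-352e00c125bb】"
search_idx, title, file_id = CITATION.search(sample).groups()
print(search_idx, title, file_id)
# 13 Paris 4f4915f6-2a0b-4eb5-85d1-352e00c125bb
```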