I am using the Assistants API with retrieval. Does anyone know how to map the annotations back to the source?
I’m currently trying with a JSON file, but so far details like the quote and the start and end indices aren’t very useful for locating the citation in the original JSON I uploaded.
I’m having the same problem, I think. I understand the Assistants API is still in beta, so perhaps this needs to be a feature request. I see something like this in the annotations response:
"annotations": [
{
"type": "file_citation",
"text": "【11†source】",
"start_index": 613,
"end_index": 624,
"file_citation": {
"file_id": "file-G8tCTIryxew2lZVn3h3GhTpF",
"quote": "nadal-confirms-a-return-to-tennis-at-the-brisbane-international-20231201-p5eoha.html?ref=rss\",\"description\":\"The 22-time grand slam champion has not played a competitive match since bowing out in the second round of this year’s Australian Open"
}
}
Not sure about the OP, but my JSON is an array of objects, so I would like the citation to include either the index of the object in the array or an “id” property of that object, e.g.
"annotations": [
{
"type": "file_citation",
"text": "【11†source】",
"index": 3
}
or,
"annotations": [
{
"type": "file_citation",
"text": "【11†source】",
"id": "asdf"
}
Having the same problem. I asked ChatGPT-4 and it said that the start and end index should relate to the characters extracted from (in my case) a PDF. So before I uploaded the PDF, I went through it page by page and recorded the start and end index for each page by counting the characters on the page. This did not work… later, when I made a request to the assistant, it would always give back indexes that matched the first page.
I also tried a follow-up question: ‘what page did this source come from?’ That didn’t work either; the answers were never quite right.
One more thing: the quotes that came back with the annotations sometimes weren’t exact. I tried using the quotes to find the text within the PDF, and I could sometimes, but not always.
Have you got this to work yet? I was trying to retrieve the exact text using the annotations as well, and I came to the conclusion that the start and end index refer to the response text (not the original file) where the annotations were used to answer.
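Since the indices span the 【11†source】 markers inside the message text itself, you can at least use them to swap the markers for numbered footnotes. A minimal sketch, assuming an openai client and a threadId:

const message = (await openai.beta.threads.messages.list(threadId)).data[0];
const textPart = message.content.find((p) => p.type === "text");
let text = textPart.text.value;
const anns = textPart.text.annotations;
// Walk backwards so the character indices of earlier annotations stay valid while splicing
for (let i = anns.length - 1; i >= 0; i--) {
  text = text.slice(0, anns[i].start_index) + `[${i + 1}]` + text.slice(anns[i].end_index);
}
console.log(text);
anns.forEach((ann, i) => {
  if (ann.type === "file_citation")
    console.log(`[${i + 1}]`, ann.file_citation.file_id, "-", ann.file_citation.quote);
});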
I can’t even get file search via the API to see my files.
Would you mind sharing what you’re doing to make this work?
I have:
- Created an Assistant in Playground with a Vector Store that has 1 File.
- Via the API, created a Thread and attached a Vector Store with 1 File.
- Created a Run and asked “What files can you see?”
I’ve tried various combinations of the tools and tool_resources params, but no matter what I try, the Run never returns any message other than “Can’t find any files”.
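For reference, the shape I’ve been attempting looks roughly like this (a minimal sketch; the vectorStore and assistant here are assumed to come from the Playground setup):

const thread = await openai.beta.threads.create({
  messages: [{ role: "user", content: "What files can you see?" }],
  // Attach the vector store to the thread so file_search can read from it
  tool_resources: {
    file_search: { vector_store_ids: [vectorStore.id] },
  },
});
const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
  tools: [{ type: "file_search" }],
});
console.log((await openai.beta.threads.messages.list(thread.id)).data[0]);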
I didn’t find the docs to be very obvious. This is how I got it to fetch the file snippets used for generation in JS. Can’t link the repo here, but these are the relevant code snippets:
Creating the vector store:
import OpenAI from "openai";
import { createReadStream } from "fs";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const vectorStore = await openai.beta.vectorStores.create({
  name: "Q&A List",
});
const readStream = createReadStream(process.cwd() + '/files/answers.json', 'utf8');
const readStream2 = createReadStream(process.cwd() + '/files/kbase.md', 'utf8');
await openai.beta.vectorStores.fileBatches.uploadAndPoll(vectorStore.id, {files: [readStream, readStream2]})
const assistant = await openai.beta.assistants.create({
model: "gpt-4o",
instructions:
"You are a helpful customer support agent for our car sharing service. You answer questions based on the files provided to you. If you can't answer a question, ask the user for an email address, forward the request to a human, and tell the user that someone will get back to them.",
tool_resources: {
"file_search": {
"vector_store_ids": [vectorStore.id]
}
},
tools: [
{ type: "file_search" },
{
type: "function",
function: {
name: "call_human",
description: "Forward the customer request to a human supervisor to handle the case if I don't know the answer",
strict: true, //Structured Outputs
parameters: {
type: "object",
properties: {
email: {
type: "string",
description: "The email address of the user that a response will later be send to by a human"
},
conversation: {
type: "string",
description: "The part of the conversation with the customer that I can't answer"
}
},
additionalProperties: false,
required: [
"email",
"conversation"
]
}
},
},
],
});
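For context (not in the repo snippet, just the usual pattern): the toolCall handled in the switch below comes out of a requires_action loop, roughly like this, where handleToolCall wraps the switch that follows:

const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});
if (run.status === "requires_action") {
  // Run every requested tool call and hand the outputs back to the run
  const tool_outputs = await Promise.all(
    run.required_action.submit_tool_outputs.tool_calls.map(handleToolCall)
  );
  await openai.beta.threads.runs.submitToolOutputs(thread.id, run.id, { tool_outputs });
}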
Fetching the file search results, which we forward to a human supervisor for review:
switch (toolCall.function.name) {
case "call_human": {
const messages = (await openai.beta.threads.messages.list(newMessage.threadId)).data;
const messagesWithRunSteps = await Promise.all(messages.map(async (msg) => {
if (msg.run_id) {
const run = await openai.beta.threads.runs.retrieve(newMessage.threadId, msg.run_id);
const steps = (await openai.beta.threads.runs.steps.list(newMessage.threadId, run.id, {}, { query: { "include[]": "step_details.tool_calls[*].file_search.results[*].content" } })).data;
return { ...msg, run_obj: { ...run, step_objs: steps } };
}
return msg;
}));
const parameters = JSON.parse(toolCall.function.arguments); // function args arrive as a JSON string
const email = parameters.email
const conversation = parameters.conversation
try {
const gotoHuman = new GoToHuman({apiKey: process.env.GOTOHUMAN_API_KEY, agentId: "com.gotohuman.demos.chatbot", agentRunId: newMessage.threadId, fetch: globalThis.fetch })
await gotoHuman.requestHumanApproval({
taskId: "provideAnswer",
taskName: "Provide response",
taskDesc: conversation,
completedTasks: [{type: "openai_conversation", result: messagesWithRunSteps, taskName: "Customer Conversation"}],
actionValues: [{id: "email", label: "Customer Email", type: "text", text: email}, {id: "answer", label: "Your response", type: "text", text: ""}],
});
return {
tool_call_id: toolCall.id,
output: `Request was successfully forwarded to a human`,
};
} catch (error) {
return {
tool_call_id: toolCall.id,
output: `The request failed!`,
};
}
}
default: {
return {
tool_call_id: toolCall.id,
output: `unknown function: ${toolCall.function.name}`,
};
}
}
This way we can show the cited snippets and their scores, which is pretty helpful in reviews.
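If you just want to pull the snippets and scores out of the steps fetched above, the fields (per the include[] parameter used earlier) look like this:

for (const step of steps) {
  if (step.step_details.type !== "tool_calls") continue;
  for (const call of step.step_details.tool_calls) {
    if (call.type !== "file_search") continue;
    for (const result of call.file_search.results) {
      // Each result carries the source file, a relevance score, and (when included) the text
      console.log(result.file_name, result.score);
      result.content?.forEach((chunk) => console.log(chunk.text));
    }
  }
}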