What do start_index & end_index actually refer to in Assistant Retrieval?

Does anyone know what the start_index & end_index numbers actually refer to in Assistant Retrieval annotations?

I thought they were character-level indexes into the txt file I uploaded to the assistant, but it turns out they don't match…

I can't find anything relevant in the docs either.

I think they refer to character positions in the response message.

i.e., the [source] marker sits in the characters spanning start_index to end_index.

I think they might refer to the actual token position, so you could use something like gpt-3-encoder to determine the exact position.

Tokens seem more likely, actually, but I haven't tested it. Did you happen to verify this?

Based on the answer from @nikunj in "Assistant API Annotations - #3 by TemplarRush", it looks like "They represent the start and end index of the string to be replaced (amet【6:4†source】) in the response text generated (text.value)." So they have nothing to do with where the LLM got the input from :frowning:
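To make that concrete, here is a minimal sketch of what the quoted answer implies: slicing the response text with start_index and end_index should give back the marker string itself. The annotation is a plain dict I built by hand to mirror the fields discussed in this thread (text, start_index, end_index, file_citation), the file ID is a placeholder, and `span_matches_marker` is a hypothetical helper name, so treat this as an illustration and not as the official API shape.

```python
# Hand-built example data; in practice these values would come from the
# message's text.value and its annotations list.
text_value = "The sky is blue【4:4†source】."
marker = "【4:4†source】"

start = text_value.index(marker)
annotation = {
    "text": marker,
    "start_index": start,
    "end_index": start + len(marker),
    "file_citation": {"file_id": "file-abc123"},  # placeholder ID
}


def span_matches_marker(text_value, annotation):
    """Return True if the indices point at the marker inside the response text."""
    span = text_value[annotation["start_index"]:annotation["end_index"]]
    return span == annotation["text"]


print(span_matches_marker(text_value, annotation))
```

If this check holds on your real responses, the indices are character offsets into the response, not offsets into the uploaded file and not token positions.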

There isn’t an official way to do this yet, but the following workaround worked for my use case - # Tip: Making Assistants API return better annotations (besides file names)

IMO: They indicate the placement of the reference marker in the response (e.g., something like 【4:4†source】 at the end of the response text). Practically useless!
The more important one is the file_id, which you gotta map back to the exact filename using a function later on…
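For anyone who wants to do that mapping, here is a rough sketch under a few assumptions: annotations are plain dicts with the fields mentioned in this thread, and `filenames` is a lookup table you build yourself (for example from whatever file listing you already have). `render_citations` and `make_annotation` are hypothetical helper names, not part of any SDK.

```python
def make_annotation(text_value, marker, file_id):
    """Build a demo annotation dict by locating the marker in the text."""
    start = text_value.index(marker)
    return {
        "text": marker,
        "start_index": start,
        "end_index": start + len(marker),
        "file_citation": {"file_id": file_id},
    }


def render_citations(text_value, annotations, filenames):
    """Swap each marker for a footnote number and list the cited filenames."""
    ordered = sorted(annotations, key=lambda a: a["start_index"])
    numbered = list(enumerate(ordered, start=1))

    out = text_value
    # Replace from the end of the string so earlier indices stay valid.
    for n, ann in reversed(numbered):
        out = out[:ann["start_index"]] + f"[{n}]" + out[ann["end_index"]:]

    footnotes = []
    for n, ann in numbered:
        file_id = ann["file_citation"]["file_id"]
        # Fall back to the raw file_id if we have no filename for it.
        footnotes.append(f"[{n}] {filenames.get(file_id, file_id)}")
    return out, footnotes


# Demo with made-up text, markers, and file IDs.
text_value = "Cats purr【4:0†source】. Dogs bark【4:1†source】."
annotations = [
    make_annotation(text_value, "【4:0†source】", "file-cats"),
    make_annotation(text_value, "【4:1†source】", "file-dogs"),
]
filenames = {"file-cats": "cats.txt", "file-dogs": "dogs.txt"}

clean_text, footnotes = render_citations(text_value, annotations, filenames)
print(clean_text)
print("\n".join(footnotes))
```

Replacing from the last marker backwards matters here: each replacement changes the string length, so going front to back would shift the later indices.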