Does the index field on an embedding response correspond to the index of the input text it was generated from?

In an embedding response, each embedding has an index field. I am trying to match each item of the input text array to the embedding that was created from it, but I am unsure how to do this. Do the embeddings come back in the order they were sent in? Do the indexes correspond to the index of the text in the original input array?

Any insight on this would be a massive help. Thank you!

Hi and welcome to the Developer Forum!

Can you post where this index is coming from, a log perhaps? More information would be helpful.

Sorry, I wasn’t very specific. I meant the index field in the response JSON for each embedding object in the “data” field:

{
  "data": [
    {
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ...
        -4.547132266452536e-05,
        -0.024047505110502243
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002",
  "object": "list",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}

Ahh, I see.

This is the response from an ada-002 embedding call. The index field represents the position of the text sent to the model, so if you had sent several text sections for embedding, you would get indexes of 0, 1, 2, 3, and so on. These directly relate to the order in which the texts were put into the API call. So it is just a 1:1 positional, zero-based index of each input string's position in the API call.

Ah, thank you. Do they come back in the order 0, 1, 2, 3, or can they come back muddled up sometimes? As in, I don’t really need to look at the index; I can just parse it as an array and assume it’s in the correct order?

Always in a direct 1:1 positional relationship, never muddled.
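
If you would rather not rely on array order, here is a minimal sketch of pairing by the index field instead. It works on an already-parsed response dict shaped like the JSON above. The function name `pair_inputs_with_embeddings` and the sample data are made up for illustration, and the sample is deliberately shuffled just to show the pairing logic; that is not something the API is known to do.

```python
from typing import Any


def pair_inputs_with_embeddings(
    inputs: list[str], response: dict[str, Any]
) -> list[tuple[str, list[float]]]:
    """Pair each input string with its embedding using the "index" field."""
    # Sort defensively by index so the result follows the input order.
    items = sorted(response["data"], key=lambda item: item["index"])
    if len(items) != len(inputs):
        raise ValueError("number of embeddings does not match number of inputs")
    return [(inputs[item["index"]], item["embedding"]) for item in items]


# Hypothetical response: same shape as the JSON above, with made-up vectors.
inputs = ["first text", "second text"]
response = {
    "data": [
        {"embedding": [0.3, 0.4], "index": 1, "object": "embedding"},
        {"embedding": [0.1, 0.2], "index": 0, "object": "embedding"},
    ]
}
pairs = pair_inputs_with_embeddings(inputs, response)
```

If the order really is always preserved, as described above, the sort is a no-op and a plain `zip(inputs, response["data"])` would give the same pairing.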

Amazing, thanks so much for the help :grinning:
