Closing bracket "]" missing from embeddings generated using OpenAI

While generating embeddings with the OpenAI embedding model (text-embedding-ada-002), we are not getting the closing bracket "]" at the end of the array for each row of sample text. We see only the opening bracket, and the closing bracket does not appear even when we try to add it manually.

We tried with only the first 3 rows containing text and still get the same issue. A sample embedding output for one row, without the closing "]", looks like this:

[-0.017111334949731827, -0.01736113429069519, -0.00905526801943779, -0.006987475324422121, -0.005818270146846771, 0.011449189856648445, … , -0.039857059717178345, -0.04032890498638153, 0.0054435692727565765, 0.00532907759770751, 0.014349651522934437, -0.02120528742671013, 0.016334177926182747, 0.011379800736904144

As seen above, the closing bracket "]" is missing.

Please help: we are unable to find the root cause and are blocked, because without a proper embedding array we cannot apply ML algorithms.

Thanks
Dhruv Shah

Hi Dhruv. Can you share the code snippet or API call that you are using to generate the embeddings?
That would be really helpful to see what the problem might be.

Platform:

Python 3.8.16
openai 0.28.0
numpy 1.24.4

Script:

import openai
openai.api_key = "sk-1234"
print(openai.Embedding.create(
  model="text-embedding-ada-002",
  input="banana banana I eat bananas"
))

Output:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.021899977698922157,
        -0.02625957690179348,
        0.02568594552576542,
        0.015513545833528042,
        0.002731123473495245,
        0.003432228695601225,
...
...
        -0.03316865116357803,
        -0.0028107943944633007,
        -0.009783604182302952,
        -0.005921151954680681
      ]
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}

So see if you can’t replicate this, and then find where your application goes wrong.

import os

import openai
os.environ['OPENAI_API_KEY'] = 'xyz'
openai.organization = "xyz"

openai.api_key = os.getenv("OPENAI_API_KEY")

openai.Model.list()

embedding_model = "text-embedding-ada-002"

embedding_encoding = "cl100k_base"

max_tokens = 8000

Hi,

Below is the code that calls OpenAI to fetch embeddings for the raw text:

top_n = 10
df = df.sort_values("sr.no").tail(top_n * 2) # first cut to first 2k entries, assuming less than half will be filtered out
df.drop("sr.no", axis=1, inplace=True)
encoding = tiktoken.get_encoding(embedding_encoding)
df["n_tokens"] = df.combined.apply(lambda x: len(encoding.encode(x)))
df = df[df.n_tokens <= max_tokens].tail(top_n)
len(df)

df["embedding"] = df.combined.apply(lambda x: get_embedding(x, engine=embedding_model))

The above code generates embeddings which, when saved to Excel, show the closing "]" missing from the embedding array for each row of text.

Hi @udm17,
Do you have any idea what could cause this behavior?

Thanks
Dhruv Shah

I think whatever issue you’re having is between your code and Excel.

The API is returning valid JSON.
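
One plausible cause, offered as a guess rather than a confirmed diagnosis: an Excel cell holds at most 32,767 characters, and a 1536-value embedding written out as text is roughly 30,000 characters or more, so the tail of the array (including the "]") may not survive. The sketch below uses a synthetic vector, not real API output, and a hypothetical `simulate_excel_cell` helper that only mimics a character cap; it does not reproduce what Excel or pandas actually does on export.

```python
import json
import random

EXCEL_CELL_CHAR_LIMIT = 32_767  # documented maximum number of characters in one Excel cell


def simulate_excel_cell(text: str) -> str:
    """Hypothetical helper: keep only the characters that fit in one cell."""
    return text[:EXCEL_CELL_CHAR_LIMIT]


def is_complete_json_array(text: str) -> bool:
    """Return True if the text still parses as a JSON list."""
    try:
        return isinstance(json.loads(text), list)
    except json.JSONDecodeError:
        return False


# Synthetic stand-in for a 1536-dimensional embedding (assumption: not real API output).
rng = random.Random(0)
fake_embedding = [rng.uniform(-0.05, 0.05) for _ in range(1536)]

serialized = json.dumps(fake_embedding)
cell_text = simulate_excel_cell(serialized)

print("serialized length:", len(serialized))
print("fits in one cell:", len(serialized) <= EXCEL_CELL_CHAR_LIMIT)
print("cell text is still a complete array:", is_complete_json_array(cell_text))
```

If the serialized length comes out above the limit for your rows, checking `df["embedding"]` in memory before the export should show complete lists, which would point at the export step rather than the API.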

Plus, the pasted code is from the OpenAI cookbook, in the same basic form as it was in early 2022.

I have resolved this error with the code below.

df['embedding'] = df['embedding'].astype(str)
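
For anyone reading later: after this conversion each cell holds the text form of the list, not a list of floats, so it has to be parsed back before applying ML algorithms. A minimal sketch of that round trip, assuming each embedding is a plain Python list (as in the code earlier in this thread) and using two hypothetical helper names, `embedding_to_text` and `text_to_embedding`:

```python
import ast


def embedding_to_text(embedding):
    """What str conversion does per cell: the list becomes its literal text."""
    return str(embedding)


def text_to_embedding(text):
    """Parse the stored text back into a list of floats."""
    return [float(value) for value in ast.literal_eval(text)]


# First few values from the sample output posted above, used only as test data.
sample = [-0.017111334949731827, -0.01736113429069519, 0.011449189856648445]

as_text = embedding_to_text(sample)
restored = text_to_embedding(as_text)

print("text ends with closing bracket:", as_text.endswith("]"))
print("round trip preserved the values:", restored == sample)
```

`ast.literal_eval` is used here instead of `eval` because it only accepts Python literals, which is safer when the text comes from a file.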
