[Embeddings.create] Improve InvalidRequestError message: "['', 'a'] is not valid under any of the given schemas - 'input'" for large arrays

Hi,
The error message returned from embedding.create does not point to the exact element in the input array that is not valid. This is really annoying when you pass a large array and need to debug which element is not valid.

My suggestion here is to point to the specific element in the input array that is not valid instead of returning the whole array.

E.g.
Running this code in Python:

from openai.embeddings_utils import get_embeddings
get_embeddings(['','a'])

raises the following error:
ERROR: openai.error.InvalidRequestError: ['', 'a'] is not valid under any of the given schemas - 'input'

A better error message in this case would be:

`ERROR: openai.error.InvalidRequestError: [''] is not valid under any of the given schemas - 'input'`
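Until the API reports the offending element itself, a client-side check can point at it before the request is sent. The following is only a sketch: `find_invalid_inputs` is a hypothetical helper, not part of the openai package, and it assumes the input is meant to be a list of text strings and that empty strings are what the API rejects, as reported in this thread.

```python
from typing import List, Sequence, Tuple


def find_invalid_inputs(inputs: Sequence[object]) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for elements likely to be rejected.

    Hypothetical helper: the rules below are assumptions drawn from this
    thread, not the API's actual validation schema.
    """
    problems = []
    for i, item in enumerate(inputs):
        if not isinstance(item, str):
            problems.append((i, f"expected str, got {type(item).__name__}"))
        elif item == "":
            problems.append((i, "empty string"))
    return problems


print(find_invalid_inputs(["", "a"]))  # [(0, 'empty string')]
```

Running the check before calling the embeddings endpoint gives the index of each suspicious element, which is the information the error message currently lacks.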

Thanks,
Roy.


Yes, the API is officially in beta and your feedback is very important. Thank you @roy-pstr

Regarding your embedding prompt, I tried two versions and both worked for me. Here is an incomplete snapshot (not showing the entire 1024-element vector, in the interest of saving space, haha):

Prompt: ‘’,‘a’

Prompt: [‘’,‘a’]

HTH

from openai.embeddings_utils import get_embeddings
get_embeddings(['','a'])

raises the following error:

openai.error.InvalidRequestError: ['', 'a'] is not valid under any of the given schemas - 'input'

version:
openai 0.26.4

It is worth mentioning that the default engine is used here: “text-similarity-davinci-001”

Also, make sure the input you are using is a list of prompts, not a single prompt that is the string form of a list.

Anyway, I’m 100% sure that this raises an error, and my suggestion is to point to the element in the array that causes the error.

Best,
Roy.


I encountered the same error.


I had the same error and couldn’t figure out what was wrong, as I have a large volume of data. Strangely, when I worked without batching (passing only a list of length 1 each time), it worked :man_shrugging:
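When a large batch fails and the error does not say which element is at fault, one way to narrow it down is to split the failing batch in half and retry each half until single elements remain. The sketch below assumes nothing about the openai package: `locate_failing_items` is a hypothetical helper, and `embed_fn` stands for whatever callable sends a batch to the API. `fake_embed` is a stand-in used only to make the example runnable offline.

```python
from typing import Callable, List, Sequence


def locate_failing_items(
    items: Sequence[str],
    embed_fn: Callable[[List[str]], object],
    offset: int = 0,
) -> List[int]:
    """Return original indices of items that make embed_fn raise on their own."""
    if not items:
        return []
    try:
        embed_fn(list(items))
        return []  # the whole batch was accepted
    except Exception:
        # Broad catch for illustration only; real code should catch the
        # specific invalid-request error so that rate limits or network
        # failures are not mistaken for bad input.
        if len(items) == 1:
            return [offset]
        mid = len(items) // 2
        return (
            locate_failing_items(items[:mid], embed_fn, offset)
            + locate_failing_items(items[mid:], embed_fn, offset + mid)
        )


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in for a real embedding call: rejects batches with empty strings."""
    if any(text == "" for text in batch):
        raise ValueError("invalid input")
    return [[0.0] for _ in batch]


print(locate_failing_items(["a", "", "b", "c", ""], fake_embed))  # [1, 4]
```

Each failing element costs a handful of extra requests, so this is meant as a debugging aid rather than something to leave in a production path.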


I am facing this error too.

error_code=None error_message="[] is not valid under any of the given schemas - 'input'" error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False

error_trace

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/llama_index/embeddings/openai.py", line 149, in get_embeddings
    data = openai.Embedding.create(input=list_of_text, model=engine, **kwargs).data
  File "/usr/local/lib/python3.8/site-packages/openai/api_resources/embedding.py", line 33, in create
    response = super().create(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/usr/local/lib/python3.8/site-packages/openai/api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/usr/local/lib/python3.8/site-packages/openai/api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "/usr/local/lib/python3.8/site-packages/openai/api_requestor.py", line 682, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: [] is not valid under any of the given schemas - 'input'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/src/chatbot/query_gpt.py", line 249, in get_answer
    context_answer = self.call_pinecone_index(request)
  File "/app/src/chatbot/query_gpt.py", line 217, in call_pinecone_index
    index_response = custom_index.query(final_query)
  File "/usr/local/lib/python3.8/site-packages/llama_index/indices/query/base.py", line 20, in query
    return self._query(str_or_query_bundle)
  File "/usr/local/lib/python3.8/site-packages/llama_index/query_engine/retriever_query_engine.py", line 145, in _query
    response = self._response_synthesizer.synthesize(
  File "/usr/local/lib/python3.8/site-packages/llama_index/indices/query/response_synthesis.py", line 158, in synthesize
    text = self._optimizer.optimize(query_bundle, text)
  File "/usr/local/lib/python3.8/site-packages/llama_index/optimization/optimizer.py", line 78, in optimize
    text_embeddings = self.embed_model._get_text_embeddings(split_text)
  File "/usr/local/lib/python3.8/site-packages/llama_index/embeddings/openai.py", line 253, in _get_text_embeddings
    return get_embeddings(
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7fd6fc0a5c10 state=finished raised InvalidRequestError>]

Is there any solution?

Simply don’t enter any empty string. Then it will work inshaAllah.
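For anyone who cannot guarantee clean input, the workaround can be applied in code by dropping empty strings before the call and putting placeholders back afterwards so positions still line up. This is a sketch under assumptions: `embed_skipping_empty` is a hypothetical helper, `embed_fn` is any callable that maps a list of strings to a list of vectors in the same order, and only exact empty strings are treated as invalid. It also skips the call entirely when nothing is left, since an empty list (`[]`) is rejected as well, per the error posted earlier in this thread.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def embed_skipping_empty(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[Vector]],
) -> List[Optional[Vector]]:
    """Embed only the non-empty texts; return None at skipped positions."""
    # Remember the original index of every text that will be sent.
    kept = [(i, text) for i, text in enumerate(texts) if text != ""]
    results: List[Optional[Vector]] = [None] * len(texts)
    if not kept:
        return results  # nothing to send; avoids passing [] to the API
    vectors = embed_fn([text for _, text in kept])
    for (i, _), vector in zip(kept, vectors):
        results[i] = vector
    return results
```

With the library version mentioned in this thread, `embed_fn` could presumably be `get_embeddings` from `openai.embeddings_utils`, though that pairing has not been tested here. Callers then decide how to treat the `None` entries, for example by skipping those rows.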

“Simply don’t write code with bugs then you won’t get errors”…

It’s obvious that there is a bug, but I am describing how to work around it. That is worth saying because I ran into this error in a situation where it was not easy to realize that empty strings were the cause. So some people may find it useful.

Have a good day