[Invalid] Error embedding certain Unicode in square brackets (ERR_UNESCAPED_CHARACTERS)

Hey, I’ve hit a problem with createEmbedding that I can’t solve.
Specific characters enclosed in square brackets cause an ERR_UNESCAPED_CHARACTERS error.

Node.js: openai 3.2.1

The error seems to originate within OpenAI as an Axios redirect error:

TypeError [ERR_UNESCAPED_CHARACTERS]: Request path contains unescaped characters
    at new NodeError (node:internal/errors:371:5)
    at new ClientRequest (node:_http_client:154:13)
    at Object.request (node:http:96:10)
    at RedirectableRequest._performRequest (openai-project/node_modules/follow-redirects/index.js:284:24)
    at new RedirectableRequest (openai-project/node_modules/follow-redirects/index.js:66:8)
    at Object.request (openai-project/node_modules/follow-redirects/index.js:523:14)
    at dispatchHttpRequest (openai-project/node_modules/axios/lib/adapters/http.js:242:25)
    at new Promise (<anonymous>)
    at httpAdapter (openai-project/node_modules/axios/lib/adapters/http.js:48:10)
    at dispatchRequest (openai-project/node_modules/axios/lib/core/dispatchRequest.js:58:10) {
  code: 'ERR_UNESCAPED_CHARACTERS'
}

This code reproduces the error every time:

await this.api.createEmbedding(
    {
        model: 'text-embedding-ada-002',
        input: '[☁️]', // U+2601
    },
);

I’ve found these trigger the error:

  • U+2601 ‘:cloud:’
  • U+1F4AB ‘:dizzy:’
  • U+0131 ‘ı’

A cookie emoji, U+1F36A ‘:cookie:’, works fine, and I’ve not been able to ascertain what the difference is, since as far as I’m aware it matches all the same specs as ‘:dizzy:’.

I thought this would be an interesting way to break ChatGPT conversation history, if that goes through the embeddings engine, but it seems conversations containing these characters can be recalled fine.

Since I don’t have anything close to your environment set up, I’ll just give the most useful part of the bot’s answer:

To troubleshoot the issue, you can check the Axios request configuration and the specific URL being used in the request. Make sure that any special characters, including emojis, are properly encoded using encodeURIComponent() or a similar function. Additionally, verify that the URL is correctly formatted and complies with the URL specification.

If you have access to the dispatchRequest.js file, you can look for the line number mentioned in the error (line 58) to see the exact code that’s causing the issue. You may also examine the surrounding code to get more context on how the URL is being handled.

Maybe log the variable right before the line.
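
To make the encoding suggestion concrete, here is a minimal sketch, assuming the text really does end up inside a request path somewhere. `buildPath` and the `/lookup` route are made-up names for illustration; nothing in the posted code shows where the URL is actually built.

```javascript
// Hypothetical helper: appends user-supplied text to a base path as one
// percent-encoded segment. encodeURIComponent is a standard JavaScript global.
function buildPath(basePath, text) {
  return `${basePath}/${encodeURIComponent(text)}`;
}

// '\u2601' is the cloud character from the report, written as an escape so
// the editor's encoding cannot change it.
console.log(buildPath('/lookup', '[\u2601]'));
```

After encoding, the path contains only ASCII characters, so the brackets and the cloud no longer reach the HTTP client raw.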


I can get ada-002 to embed the :cloud:. If I paste in the cloud symbol directly, I get a weird symbol in VSCode, but it will still embed.

But I think the proper way is to call out the symbol directly, for example:

If I use u"\u2601" in Python, I get an embedding with these first 10 values:

print(Embedding[0:10])
[-0.0059571844, -0.0029479726, -0.009403573, -0.028387632, -0.014289077, 0.014479598, -0.012819343, -0.031109361, -0.0017776293, -0.027870504]

This is the only way I think you can be sure you are getting the right embedding.
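
On the Node.js side, a rough equivalent is to write the character as an escape and list the code points actually present in the string before sending it. This is only a sketch; `codePoints` is a helper name I made up. It can matter because a pasted emoji may carry an extra variation selector (U+FE0F) after the base character.

```javascript
// Hypothetical helper: lists the Unicode code points in a string as
// 'U+XXXX' labels. Array.from iterates by code point, not by UTF-16 unit.
function codePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// A cloud written with escapes, followed by a variation selector.
console.log(codePoints('\u2601\uFE0F'));
// The dizzy emoji written with a code point escape.
console.log(codePoints('\u{1F4AB}'));
```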

Also, I’m not sure why you have the extra square brackets around :cloud:. That shouldn’t matter, but maybe try without the brackets?


Logging at every line in the error trace led me to the root cause.

The problem was completely unrelated to OpenAI!
A request was being fired off in the background at the same time, using the embedding input, so it just appeared to be OpenAI related.
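
For anyone landing here later, this is a small sketch of how the same error can be reproduced with Node's http module alone, no OpenAI client involved. The `/lookup` path and the `requestPathError` helper are made up for illustration; my actual background request is not shown here.

```javascript
const http = require('http');

// Hypothetical helper: returns the error code Node raises while building a
// request for the given path, or undefined if it builds without throwing.
// Only call it with a path you expect to be rejected: an accepted path would
// start a real request to localhost.
function requestPathError(path) {
  try {
    // The path check runs while the request object is constructed, so a
    // rejected path throws here before anything is sent.
    http.request({ host: 'localhost', path });
  } catch (err) {
    return err.code;
  }
  return undefined;
}

console.log(requestPathError('/lookup/[\u2601]')); // prints "ERR_UNESCAPED_CHARACTERS"
```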

Thanks for the input, sorry about the waste of time :sweat_smile: