API embedding support for Czech and Slovak languages

I am trying to build an application that will:

  • store embeddings of the created prompts in a pgvector database
  • use the OpenAI embeddings API to create an embedding of a natural-language query
  • search for the stored prompt most relevant to the query
  • use the text-embedding-ada-002 model (the one I am using currently)
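
For illustration, the search step could be sketched roughly as follows. This is not the actual application code: the three-dimensional vectors are made-up stand-ins for real embeddings, the names `cosine_similarity`, `most_relevant`, and `stored_prompts` are hypothetical, and in the real setup the comparison would be done by pgvector inside the database rather than in Python.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_relevant(query_vec, stored):
    """Return the name of the stored prompt whose vector is closest to the query."""
    best_name, _ = max(stored, key=lambda item: cosine_similarity(query_vec, item[1]))
    return best_name

# Toy data: (prompt name, embedding) pairs standing in for rows in the database.
stored_prompts = [
    ("prompt_a", [1.0, 0.0, 0.0]),
    ("prompt_b", [0.0, 1.0, 0.0]),
    ("prompt_c", [0.7, 0.7, 0.0]),
]

# Stand-in for the embedding of the user's query.
query = [0.9, 0.1, 0.0]
best = most_relevant(query, stored_prompts)
print(best)
```

With real embeddings only the vector length changes; the ranking logic stays the same.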

I am new to the OpenAI API and have run into the following problems and questions. Can you help me?

  1. Users will ask their queries in English, Czech, or Slovak.
  • I have found posts saying that the OpenAI embeddings were trained on languages other than English. Is it possible to find out whether Czech and Slovak were included?
  • Is there an option to detect the probable language of a text (whether it is English, Czech, …)?
  • From previous posts I understand that I should create “separate” embeddings and queries for each language. Is my understanding of this approach correct?
  2. The OpenAI API refused text containing UTF-8 Czech characters - HTTP??? Error
  • For instance, the text č Č ě Ě is encoded as: c4 8d 20 c4 8c 20 c4 9b 20 c4 9a
  • Which encoding does the OpenAI API support? Can you provide code that sends the above text correctly?
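
The byte sequence quoted above can be checked with a short sketch. It only verifies the encoding locally and builds a JSON request body; it does not call the API. The `model` and `input` fields follow the embeddings request format as I understand it, and the variable names are just for illustration.

```python
import json

# The bytes quoted in the question, decoded as UTF-8.
raw = bytes.fromhex("c4 8d 20 c4 8c 20 c4 9b 20 c4 9a")
text = raw.decode("utf-8")
print(text)

# Round trip: encoding the text again yields the same bytes,
# so the sequence is valid UTF-8.
assert text.encode("utf-8") == raw

payload = {"model": "text-embedding-ada-002", "input": text}

# Option 1: keep the Czech characters and send the body as UTF-8 bytes.
body_utf8 = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# Option 2: let json escape non-ASCII characters as \uXXXX sequences,
# so the body is plain ASCII whatever the transport assumes.
body_ascii = json.dumps(payload, ensure_ascii=True)
print(body_ascii)
```

Both bodies describe the same JSON document, so if one of them is rejected, the problem is more likely in how the HTTP client encodes the body or sets the Content-Type header than in the characters themselves.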