Need help? OpenAI Japanese Language support


I am using OpenAI’s embedding-based model to train on Japanese context and obtain results based on context-related questions only. However, after the chat completion process, I often receive answers that are either in English or not relevant to the Japanese context. Is there a Japanese-specific model available for fine-tuning with OpenAI, or are there any other methods to overcome this issue?

Thank you.

While this is not a great answer, it at least gives you one possible direction to take if you have not tried this.

Prompt in Japanese: 日本語でのみ返信してください。現在のユーザーの質問に答えるために、上記に挿入された新しい知識を使用してください。 ("Please reply only in Japanese. Use the new knowledge inserted above to answer the current user's question.")

Inject the knowledge augmentation along with a prefix like “knowledge database retrieval for user’s query: (RAG data)”
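A minimal sketch of that injection pattern, assuming a chat-style messages list; the function name, the prefix wording, and the placeholder chunk are illustrative, not a fixed API:

```python
# Hypothetical sketch: label injected RAG text so the model treats it as
# retrieved knowledge rather than part of the user's question, and pin
# the reply language in the same system message.
def build_prompt(rag_chunks, user_question):
    """Assemble a system-message injection from retrieved chunks."""
    knowledge = "\n".join(rag_chunks)
    return [
        {"role": "system", "content": (
            "日本語でのみ返信してください。\n"  # "Reply only in Japanese."
            "knowledge database retrieval for user's query:\n" + knowledge
        )},
        {"role": "user", "content": user_question},
    ]

messages = build_prompt(["(retrieved Japanese passage)"], "質問の例")
```

The point of the prefix is that the model sees the augmentation as background material, so the user's actual question stays the thing being answered.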

You could fine-tune an AI with a new identity, and show it responding to role messages in Japanese with responses that use that information, but that tedious preparation and higher cost should be unnecessary.


I’ve already tried this with a Japanese prompt, but I’m still facing the same issue. I am using the embedding model Davinci 002. Sometimes, the response contains the question.

davinci-002 is not an “embedding model”, it is a base completion model that is not trained on instruction-following.

text-davinci-002, if that’s what you’re actually using to not get nonsense output, is an older instruct model, and it doesn’t have as much training on varied inputs either.

Neither can be used by a chat completion endpoint. A chat model would only accept a messages format.

messages = [
    {"role": "system", "content": "You are ChatJPN, an AI language assistant that likes to speak only Japanese, released 2023. AI knowledge: only before 2022."},
    {"role": "user", "content": "Introduce yourself."},
]

You should start by switching the model to gpt-3.5-turbo-instruct if you are using the completions endpoint for some particular reason; its prompt style is different from "chatting with an AI". Then look at the API Reference documentation for adapting to chat completions, and use gpt-3.5-turbo or gpt-4.
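For reference, a minimal sketch of the request body the chat completions endpoint expects, built with only the standard library; the model name and Japanese system prompt are illustrative, and in practice you would POST this JSON to the endpoint with your API key (or let the SDK do it):

```python
import json

# Sketch of a request body for the chat completions endpoint
# (POST /v1/chat/completions). The system message fixes the reply
# language; swap in your own instructions and model choice.
def chat_request_body(question: str) -> str:
    return json.dumps({
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system",
             "content": "日本語でのみ回答してください。"},  # "Answer only in Japanese."
            {"role": "user", "content": question},
        ],
    }, ensure_ascii=False)

body = chat_request_body("自己紹介してください。")  # "Please introduce yourself."
```

Note the contrast with the completions endpoint: there is no single prompt string, only the list of role-tagged messages.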

We are using GPT-3.5 for embedding. When we provide complex content with less domain specificity, the model often responds with what we instruct it to reply when it doesn't find an answer, such as "Sorry, I don't know the answer." We tried changing the prompt as you suggested, but most of the time the GPT-3.5 model doesn't provide any response. After chat completion using text-davinci-002, it sometimes mixes English into the responses. Even when we explicitly state in the prompt, "Please reply only in Japanese," it gives error responses in Japanese. For Japanese context, it often provides no results.

"Embeddings" is not a model that returns language. It returns a vector of values that can be used for semantic search with a vector database.

An OpenAI model for embeddings is "text-embedding-ada-002"
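To make that distinction concrete, here is a toy illustration of what embedding vectors are used for: ranking stored texts by cosine similarity to a query vector. In a real setup the vectors would come from text-embedding-ada-002; the 3-dimensional vectors below are made up for the example:

```python
import math

# Toy semantic-search sketch. Real embedding vectors have ~1536
# dimensions; these tiny hand-made vectors just show the mechanics.
def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

store = {
    "doc_ja_1": [0.9, 0.1, 0.0],  # pretend embedding of one Japanese passage
    "doc_ja_2": [0.1, 0.9, 0.0],  # pretend embedding of another
}
query_vec = [0.8, 0.2, 0.0]       # pretend embedding of the user's question

# The closest stored text is what you would inject into the prompt.
best = max(store, key=lambda k: cosine(store[k], query_vec))
```

The language generation then happens in a separate step: the best-matching text is passed to a chat model, not returned by the embedding model itself.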

Perhaps you are describing ‘text injection’ for knowledge augmentation to inform the AI?

text-davinci-002 is not a "chat completion" model. It is an older model that has been superseded by at least three better-quality models. The only way you would stumble upon it these days is if you asked ChatGPT to write code using its obsolete knowledge.

gpt-3.5 by itself is not the full name of a model. gpt-3.5-turbo is a chat completion model that must actually be used through the chat completions endpoint. It is the same model used in free ChatGPT. As I stated before, using a model recommended for understanding Japanese would be the first step, then move to gpt-4 if your task still can't be understood.