Need help? OpenAI Japanese Language support

ajay.kumar · October 11, 2023, 12:17pm

"Hello,

I am using OpenAI’s embedding-based model to train on Japanese context and obtain results based on context-related questions only. However, after the chat completion process, I often receive answers that are either in English or not relevant to the Japanese context. Is there a Japanese-specific model available for fine-tuning with OpenAI, or are there any other methods to overcome this issue?

Thank you."

EricGT · October 11, 2023, 12:21pm

While this is not a great answer it at least it gives you one possible direction to take if you have not tried this.

_j · October 11, 2023, 5:07pm

Prompt in Japanese. 日本語でのみ返信してください。現在のユーザーの質問に答えるために、上記に挿入された新しい知識を使用してください。

Inject the knowledge augmentation along with a prefix like “knowledge database retrieval for user’s query: (RAG data)”

You could fine-tune an AI with a new identity, and show it responding to role messages in Japanese with responses that use that information, but that tedious preparation and higher cost should be unnecessary.

ajay.kumar · October 12, 2023, 10:42am

Hi,

I’ve already tried this with a Japanese prompt, but I’m still facing the same issue. I am using the embedding model Davinci 002. Sometimes, the response contains the question.

_j · October 12, 2023, 4:05pm

davinci-002 is not an “embedding model”, it is a base completion model that is not trained on instruction-following.

text-davinci-002, if that’s what you’re actually using to not get nonsense output, is an older instruct model, and it doesn’t have as much training on varied inputs either.

Neither can be used by a chat completion endpoint. A chat model would only accept a messages format.

messages = [{“role”: “system”, “content”:“You are ChatJPN, an AI language assistant that likes to speak only Japanese, released 2023. AI knowledge: only before 2022.”},
{“role”: “user”, “content”:“Introduce yourself.”}]

You should start by changing the AI model to gpt-3.5-turbo-instruct if using “completions” for some particular reason. The prompt style will be different than you “chat with an AI” about. Then look at the API Reference documentation for adapting to chat completions and use gpt-3.5-turbo or gpt-4.

ajay.kumar · October 13, 2023, 4:31am

“Hi,
we are using GPT-3.5 for embedding. When we provide complex content with less domain specificity, the model often responds with what we instruct OpenAI to reply if it doesn’t find an answer, such as “Sorry, I don’t know the answer.” We tried changing the prompt as you suggested, but most of the time, the GPT model 3.5 doesn’t provide any response. After the chat completion using text-davinci 002, it sometimes mixes responses in English. Even when we explicitly state in the prompt, ‘Please reply only in Japanese,’ it gives error responses in Japanese. For Japanese context, it often provides no results.”

_j · October 13, 2023, 6:15am

“embeddings” is not a model that returns language. It returns a vector of values that can be used for semantic search using a vector database.

An OpenAI model for embeddings is "text-embedding-ada-002"

Perhaps you are describing ‘text injection’ for knowledge augmentation to inform the AI?

text-davinci-002 is not a “chat completion” model. It is an older model that is superceded by at least three that are better quality. The only way you would stumble upon that these days is if you asked ChatGPT to write code using its obsolete knowledge.

gpt-3.5 itself is not a full name of a model. gpt-3.5-turbo is a chat completion model that must actually be used through the chat completion endpoint. It is the same as used in free ChatGPT. As I stated before, using a model recommended for understanding Japanese would be the first start, and then moving to gpt-4 if your task still can’t be understood.

Topic		Replies	Views
ChatGpt-3 Provides different output that are not related to queries API plugin-development , api , chatgpt-plugin	13	1740	December 18, 2023
How can we make the answer concise with fine tuning? API fine-tuning , api	8	2945	June 7, 2023
OpenAI Embedding model Answering issue API gpt-4 , gpt-35-turbo , chatgpt , chat-completion , openai	4	1101	December 19, 2023
The completion does not always follow the rule set in prompt when it is required to translate an article API	5	1421	March 18, 2023
When calling the API, the model passed in is gpt-4-0314. When asked what language model it is, it still says that it is gpt-3. Why? API gpt-4	16	2440	June 13, 2023

Need help? OpenAI Japanese Language support

Related topics