Improving Arabic writing of gpt-4 in chat completions using training or embeddings

I’ve used gpt-4 to build a small tool for students to practice conversation in Arabic combining google’s text to speech and speech recognition with gpt-4 chat completions, the challenge I’m facing that gpt-4 responses are not using diacritics properly in Arabic (which is understandable since the majority of the text on the web doesn’t have diacritics), being a publishing house we have large amounts of data of word documents containing the same text with and without diacritics, and I wanted your help to understand the best way to improve gpt-4 models answers using diacritics would embeddings help solve this problem and whats the best way to get started with it. and if not can a trained ada model be used for chat completions?

3 Likes

I have a very similar problem. Did you ever get any helpful responses or find ways to improve the responses with regards to diacritics?

2 Likes

I am having the same issue with my work.

Please kindly let us know if you have been able to resolve this issue.

However, if I find a solution, I will post it here.

One technique you can use with AI models is by giving multi-shot examples of conversation style before the actual user input.

You may be able to shape the proper output by putting five or ten similar writing examples where typical user inputs are responded by the AI assistant with the exact text encoding and formatting that it should produce.

1 Like