I am using the Assistants API to summarize documents. Many of my users upload documents in languages other than English, such as Korean, Spanish, and German.
I am struggling to write instructions for the Assistant and a message for the thread that consistently produce a summary in the language in which each document was written. Summaries of non-English documents often default to English. Today, a document written entirely in English was summarized in German, presumably because the word “German” appears in its text.
This is what the code looks like when I create the Assistant:
assistant = client.beta.assistants.create(
    # Other arguments (name, model, tools, ...) are omitted here.
    instructions="""You are an AI summary Assistant. Go step by step because the result of the first step serves as input to the next steps.
1. Detect the primary spoken language used in the documents.
2. Take extreme care to write a summary of the document in the detected spoken language, also answering any questions asked by the user.
Please maintain the context and key points from each document while keeping the summaries concise yet detailed. Take your time and provide a direct summary by pulling out the most important key points, facts and opinions. Use 2000 words or less.""",
)
And here’s what my thread creation looks like:
thread = client.beta.threads.create()
content = "You have the files in the Assistant. Summarize this document in the language in which it was written. Use 2000 words or less."
run = submit_message(assistant.id, thread, content)
I have tried innumerable variations of the instructions and prompts. I’ve tried only providing instructions to the assistant and omitting instructions in the thread. I have tried giving the Assistant pseudo-code to follow. I have tried splitting the messages in the thread into two parts: one that asks for language identification and then another that says “use that language” for the summary.
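The two-part approach can be sketched roughly as follows. This is a minimal sketch, not the exact code used: `send` is a hypothetical callable that posts one message to the thread, waits for the run to finish, and returns the Assistant's reply as text (for example, a wrapper around `submit_message`). The prompt wording and the function names are also assumptions made for illustration.

```python
from typing import Callable


def build_detection_prompt() -> str:
    # Step 1: ask only for the language, and warn against language
    # names that are merely mentioned inside the document.
    return (
        "Identify the primary language the attached document is written in. "
        "Reply with the language name only. Ignore language names that are "
        "only mentioned in the text."
    )


def build_summary_prompt(language: str, max_words: int = 2000) -> str:
    # Step 2: name the detected language explicitly instead of
    # saying "use that language".
    language = language.strip().rstrip(".")
    return f"Summarize the document in {language}. Use {max_words} words or less."


def summarize_in_two_steps(send: Callable[[str], str]) -> str:
    # `send` is hypothetical: it must return the Assistant's reply text.
    language = send(build_detection_prompt())
    return send(build_summary_prompt(language))
```

Putting the detected language name directly into the second message means the summary request no longer depends on the model correctly resolving "that language" from earlier context.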
Nothing I have tried has worked consistently.
Any suggestions and/or commiserations would be greatly appreciated.