How to force assistant to use file information?

For example, say I have a file containing a JSON object of character attributes:
{
  "shop_keep_1": {
    "personality": "Angry and short tempered",
    "example": "What do you want? I don't have time for you to window shop!"
  },
  "inn_keeper_2": {
    "personality": "Friendly and overly helpful.",
    "example": "How can I help you, sir? Please take your time, no pressure."
  }
}

If I am using a prompt such as:
User will input a name; return a greeting in a tone similar to the "personality" value for the key that matches the user input.

But it seems like half the time it completely ignores the attached file. Is there any way I can force it to pull only from the matching key, and only from the information in the file?

For example, half the time I input “shop_keep_1” it does alright. But the other half of the time I would get responses like:
“How are you doing friend?”
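To put it another way, the deterministic behavior I'm after is just a key lookup, something like this plain-Python sketch (the filename here is made up):

```python
# The behavior I want the assistant to emulate: a strict lookup on the
# key, using only that entry's data. "characters.json" is a made-up name.
import json

with open("characters.json") as f:
    characters = json.load(f)

def tone_for(key: str) -> str:
    entry = characters[key]        # only this key's entry may be used
    return entry["personality"]    # e.g. "Angry and short tempered"
```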

5 Likes

To ensure that the assistant consistently uses the information from the file, especially when responding based on character attributes defined in a JSON file, you can take the following steps:

  1. Clear Instruction: When you ask a question or issue a prompt, explicitly state that the response should be based on the information provided in the file (see the code sketch after this list). For example, “Using the character attributes from the file, respond to ‘shop_keep_1’ with a greeting that matches their personality.”

  2. Reference the File Directly: Mention the file directly in your prompt. For example, “Refer to the character attributes in the uploaded file and provide a greeting for ‘shop_keep_1’ that matches their personality.”

  3. Follow-Up for Accuracy: If the response does not align with the information from the file, you can follow up by pointing out the discrepancy and asking for a revised response that strictly adheres to the file’s content.

  4. Specificity in Prompts: Be as specific as possible in your prompts. If you notice inconsistencies, you can include a part of the character attribute in your question to guide the assistant. For example, “Given that ‘shop_keep_1’ in the file is described as ‘Angry and short-tempered,’ how would they greet a customer?”

  5. Use of Direct Quotes: You could ask the assistant to use or reference the direct quotes from the file. For instance, “What greeting would ‘shop_keep_1’, who says things like ‘What do you want? I don’t have time for you to window shop!’, use to greet a new customer?”
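For instance, here is a minimal sketch of baking point 1 directly into the assistant with the OpenAI Python SDK, using the v1-era retrieval tool (renamed file_search with vector stores in the v2 API). The model name, file ID, and exact wording are placeholders, not a guaranteed fix:

```python
# A sketch, not a guaranteed fix: pin the instructions to the file at
# assistant-creation time. Model name and file ID are placeholders.
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],   # v2 renamed this tool to file_search
    file_ids=["file-abc123"],        # the uploaded character-attributes JSON
    instructions=(
        "Use ONLY the character attributes in the attached file. "
        "The user will send a key such as 'shop_keep_1'; look up that key "
        "and reply with a greeting matching its 'personality' field, in the "
        "style of its 'example' field. If the key is not in the file, say "
        "you do not know that character."
    ),
)
```

Even then, expect to combine this with the follow-up and specificity tips above; none of these guarantees compliance on every run.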

By clearly directing the assistant to use the file and specifying how it should use the information, you increase the likelihood of getting responses that are consistent with the content of the file. Remember, the assistant will try to balance the use of information from the file with its general knowledge and conversational abilities, so being explicit about your expectations is key. Meow~

cr. my CatGPT

8 Likes

No matter how specific you are about it in the instructions, it won't work consistently.

I've been trying for the last 4 days to tweak the instructions over and over with input from this forum. To no avail, just frustration.

Use cases like this used to work great, but over the last couple of weeks something changed and they don't anymore.

You can find dozens of posts in the forum complaining about the same thing. People suggest being clearer in the instructions. But it's not something that can be fixed by tweaking instructions; it's something that can only be fixed by OpenAI.

My opinion.

7 Likes

Thanks for the feedback. I tried some of the techniques and it seemed to work better, but it still wasn't consistent. I tried writing a set of instructions (do this, then this, then this), explicitly stating to always use the file, to no avail.
It seems like this is not something that can be solved via instructions. :confused:

2 Likes

Yes, tried exactly the same.

Read a post, get inspired by a potentially better way to write instructions, try it out, and crash and burn.

I've tried 4-5 different methods, with step-by-step instructions, etc. Nothing works consistently.

Works great 50% of the time, the other 50% doesn’t work.

I'm beyond frustrated, tbh, as I have to deliver a few (smaller, thank god) customer projects this week. So I'm looking at alternatives.

1 Like

I am also facing the same thing. The inconsistency in the responses is significant.

1 Like

I faced the same file-based knowledge prioritization issue with a JSON schema that I want to take priority over the pre-trained knowledge. I gave it a name and referred to it in the instructions. I tried dozens of sentences to mandate that GPT-4 Turbo use this JSON schema, but the assistant used it only at random.
IMO, OpenAI should offer a way to enforce the file-based knowledge (with some magic sentence like "always use the JSON schema named xx from the file"?). Obviously, if you ask ChatGPT-4, it says this should work. But it doesn't!
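One possible workaround (a sketch; not something the API enforces): if the schema is small, skip retrieval entirely and inline the JSON into the assistant's instructions, so the model never has to decide whether to consult the file. The filename and model name below are placeholders:

```python
# Sketch: inline a small JSON schema straight into the instructions
# instead of relying on retrieval. "schema.json" and the model name
# are placeholders.
import json

from openai import OpenAI

client = OpenAI()

with open("schema.json") as f:
    schema = json.load(f)

assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions=(
        "Always answer using ONLY the following JSON schema:\n"
        + json.dumps(schema, indent=2)
    ),
)
```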

1 Like

Has this been addressed yet?
I am testing an assistant that has a USA Mississippi CDL doc file (basically a driving test manual for MS only). I need to force it to reference only this file, as the regulations change from state to state.
For example, there is a section stating 'Any male who is at least eighteen (18) years of age', etc.
When I ask 'what age do I need to be to take the test', it gives a generic reply about checking online.

It 'seems' to be using its own knowledge first.

Unfortunately, I'm facing the same issue and the instructions aren't effective. Please share if you find a solution.

Where can I trace whether the assistant has used the document or its existing knowledge?

1 Like

I am facing a similar problem. The way the costs of the different APIs are calculated is also strange.
In my last test the answer was: "I am looking into the file XY to… now I am writing the content based on the file: Content…"

So it seems like the assistant clearly used the file. But when I check my cost dashboard, only costs for using GPT-4 Turbo went up. It's as if the assistant was created but wasn't used to create the content; only GPT-4 Turbo was used.

How can that even be?

I don't think that is possible. At least I haven't found a solution in the documentation.

1 Like

I face similar issues. I’m trying to make it memorize a book and test me.

1 Like

Same issue here. No good practices found for instructing the assistant when to use retrieval/file search and when not to.

I'm ready to throw in the towel with Assistants. It's so inconsistent that I can't use it in production.

I just implemented my assistant and I can confirm that the behavior is randomly ineffective. Sometimes it answers correctly based on the attached documents and the given instructions; other times it's as if I didn't provide any guidance at all.

For example, I have instructed my assistant to help with any questions about a song. And yet, even a simple question like "What can you tell me about this song?" sometimes returns answers like "I'd be happy to help you. Please provide me with any lyrics or descriptions you can recall from the song so I can assist you better." This is insane after I have provided the assistant not only with the actual sheet music of the song but also with a text file containing all the needed meta information, filled with instructions on how to answer. And all of this on top of a prompt that explicitly says so (in this case, about The Beatles' song "Yesterday"):

You are a music expert, particularly an expert in pop music as well as piano performance and teaching. You have deep knowledge about The Beatles and their "Yesterday" song. Make sure to answer questions based on the attached files first, avoiding mentioning them. If you can't find an answer, use your best knowledge outside the attached documents and refer to the original if you can't find information about this particular version. All questions without a defined subject are about "Yesterday" by The Beatles. Any reference to "song" is referring to the attached document. NEVER suggest uploading documents. When the user writes "this" without a subject, it is referring to the attached document.

And yet, 30%-40% of the time, I get answers like:

“I don’t know what song you are referring to”

or

“If you could upload a file of the song I can answer any questions you may have”

etc…

Frustrating and mostly useless!


UPDATE:

Ok, after some research, it turns out that the official OpenAI basic tutorial for assistants is quite confusing, because it doesn't explicitly say that the "instructions" option on the run overrides the assistant's "instructions" option. I was defining different instructions for the run, thinking that those instructions would be treated as "additional" to the instructions given to the assistant; hence the problem.

So… I solved the problem by replacing the "instructions" option on the run with the "additional_instructions" option instead. That one doesn't override the instructions given to the assistant; it is appended to the overall prompt when the run is launched.
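In code, the difference looks roughly like this (the thread and assistant IDs, and the example text, are placeholders):

```python
# "instructions" on a run REPLACES the assistant's own instructions;
# "additional_instructions" is appended to them. IDs are placeholders.
from openai import OpenAI

client = OpenAI()

# What I was doing -- this silently overrides the assistant's instructions:
# run = client.beta.threads.runs.create(
#     thread_id="thread_abc123",
#     assistant_id="asst_abc123",
#     instructions="Focus on the piano arrangement.",
# )

# The fix -- this is added on top of the assistant's instructions:
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    additional_instructions="Focus on the piano arrangement.",
)
```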

This thread helped me on this issue:

@parkforce To determine the source of knowledge, you can ask the custom GPT (or assistant) to report which source it thinks it used. I've found it to be accurate.

PROMPT:
Generate a list of 20 questions I can ask you on (topic).

Present the results as a table with the columns:
User question, Your Answer, Source (Source is “GPT” for general knowledge or “Files” if using the files uploaded)
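If you are using the Assistants API rather than a custom GPT, a more direct check is to list the run's steps and see whether a retrieval tool call actually happened. A sketch; the thread and run IDs are placeholders:

```python
# Sketch: list a completed run's steps and check whether the retrieval
# tool was actually called. thread_id and run_id are placeholders.
from openai import OpenAI

client = OpenAI()

steps = client.beta.threads.runs.steps.list(
    thread_id="thread_abc123",
    run_id="run_abc123",
)
for step in steps.data:
    details = step.step_details
    if details.type == "tool_calls":
        for call in details.tool_calls:
            print("tool call:", call.type)   # e.g. "retrieval"
    else:
        print("step:", details.type)         # "message_creation"
```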

1 Like

@treboralmasy1 try this, it worked for me. I was just using text files, not JSON format, but I imagine it will work for you.

PROMPT:
Only answer using knowledge from the files provided. Do not use general GPT knowledge.

PROMPT:
Only answer using knowledge from the files provided:
Filename1.txt
Filename2.txt
Do not use general GPT knowledge.

4 Likes

I have achieved very good results using GPT-3.5-Turbo-0125. First, I make a schematic separation of the most important topics (I ask GPT-4 to do this). After hierarchizing the most important topics, I summarize them (also with GPT-4). After summarizing them, I assign each one to its own document.

What we observed is that it is very important to choose the file name properly. For example, if the topic of the file is traffic rules, I do not name the file "traffic rules"; instead, I put key elements from the file itself into the name, such as "stop sign," "traffic light," etc. We observed that by incorporating key elements of the file into the name, the model will use only the context of that particular file (there is a small sketch of this naming step after the file list below). Additionally, if the file is summarized, leaving only the "important" content that we truly need, the responses are much more precise and solid.

It is also very important to instruct the model on what to do with these files, something like this:

“You are an expert teacher in answering questions. You will be provided with files from which you should draw your answers. The answers must be without abbreviations, references, and notes. All the answers you give should be explanatory and include examples.”

Main file: human resources.pdf. This document is schematized, hierarchized, and fragmented into:
1-Performance management, Robbins and Coulter, Idalberto Chiavenato, performance evaluation, evaluation method.txt
2-Human resources management.txt
3-Employee orientation, types of training, training methods.txt
4-Human resources planning process, Robbins and Coulter, job specification, recruitment.txt
5-Human resources planning process, Robbins and Coulter, job specification.txt
6-Roles of human resources.txt
7-Taylor, Fayol, Mayo, Drucker, history of HR.txt
8-Location, contact information, and schedule.txt
9-Enhancing and projecting the competitive organization of the future, fundamental challenges facing executives.txt
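As a sketch of the naming step (all topic names, summary strings, and the output directory below are illustrative placeholders):

```python
# Sketch: write each summarized topic to a file whose name is built
# from key terms taken from the content itself. Everything here is
# an illustrative placeholder.
from pathlib import Path

summaries = {
    ("employee orientation", "types of training", "training methods"):
        "summary text for the orientation and training topic",
    ("Taylor", "Fayol", "Mayo", "Drucker", "history of HR"):
        "summary text for the history-of-HR topic",
}

out_dir = Path("knowledge_files")
out_dir.mkdir(exist_ok=True)
for key_terms, summary in summaries.items():
    filename = ", ".join(key_terms) + ".txt"
    (out_dir / filename).write_text(summary, encoding="utf-8")
```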

I hope this is useful to you!

1 Like