Confused about how to use file_search effectively with my Assistant

rafiul.nakib · September 21, 2024, 12:08am

Hi guys, I am a newbie here, trying to get my head around Assistants and the file_search tool. Please pardon me if I ask something very basic!

I am trying to build an Assistant that acts as a tutor on a specific subject, say, physics. It will use a specific physics book to answer a user’s questions. So far, I have created a vector store, uploaded a PDF copy of the book into it, and instructed the Assistant to use it whenever it answers a question.

The problem is that even a simple question whose answer is barely 50 words uses up thousands of tokens, which is clearly not feasible. To make it cost-efficient, I am thinking of implementing the following:

Segment the book by its chapters and have a separate vector store for each chapter.
Each vector store will have multiple .json files, one for each section of the chapter.
Each .json file will have the following structure (content in Bengali), as an example:

{
    "id": "4.0",
    "content": "কঠিন, তরল ও বায়বীয় পদার্থের এই তিন অবস্থায় থাদের ধর্ম ও বৈশিষ্ট ভিন্ন হয়ে থাকে, ফলে তাদের ব্যবহারও তিন অবস্থায় ভিন্ন হয়। এই অধ্যায়ে পদার্থের বিভিন্ন অবস্থার প্রেক্ষিতে তাদের ধর্ম ও বৈশিষ্ট এবং দৈনন্দিন জীবনের সঙ্গে সংশ্লিষ্ট বিভিন্ন বিষয় আলোচনা করা হয়েছে। বিষয়গুলো হলো:\n১: পদার্থের তিনটি অবস্থা।\n২: কণার গতিতত্ব।\n৩: ব্যাপন, নিঃসরণ।\n৪: মোমবাতির জ্বলন এবং মোমের তিনটি অবস্থা।\n৫: গলন ও স্ফুটন, পাতন ও উর্ধ্বপাতন।",
    "metadata": {
        "title": "সূচনা",
        "tags": ["মূল বিষয়", "মূল বিষয়সমূহ", "মূল বিষয় সমূহ", "বিষয়", "topic", "main topic", "main বিষয়"]
    }
}

What I want the Assistant to do is the following:

Whenever the user asks a question that contains a certain keyword, it should query only those files in the vector stores that carry the same keyword as a tag, then answer the question using only the contents of those files.
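A minimal sketch of that keyword-to-tag matching in plain Python, assuming it runs in your own code before anything is sent to the Assistant. The helper names and the English sample data are hypothetical; only the JSON shape (id, content, metadata.tags) comes from the example above.

```python
import json

def load_sections(json_texts):
    """Parse section documents shaped like the example above."""
    return [json.loads(text) for text in json_texts]

def sections_for_question(question, sections):
    """Return the sections that have at least one tag occurring in the question."""
    q = question.casefold()
    matches = []
    for section in sections:
        tags = section.get("metadata", {}).get("tags", [])
        # Plain substring match; casefold() also works for non-ASCII text.
        if any(tag.casefold() in q for tag in tags):
            matches.append(section)
    return matches

# Hypothetical sample data, for illustration only.
SAMPLE = [
    '{"id": "4.0", "content": "Chapter overview", '
    '"metadata": {"title": "Introduction", "tags": ["topic", "main topic"]}}',
    '{"id": "4.3", "content": "Diffusion and effusion", '
    '"metadata": {"title": "Diffusion", "tags": ["diffusion", "effusion"]}}',
]

sections = load_sections(SAMPLE)
hits = sections_for_question("What is the main topic of this chapter?", sections)
print([s["id"] for s in hits])
```

Only the content of the matching sections would then need to be passed on, which is what keeps the token count down.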

How can I achieve this? Is simple prompt engineering going to be enough? If so, can I specify the rules to use certain files for certain keywords by mentioning the file_id? Or do I need to do function calling?

Any suggestions/guidelines will be much appreciated. If you have done something similar and don’t mind sharing that would be amazing! Thanks for reading!

MrFriday · September 21, 2024, 10:36am

I’d use function calling. You can define a separate function for each chapter and, in each function’s description, include the keywords you want to map to that chapter.
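As a rough sketch of what such definitions could look like, the snippet below builds one function tool per chapter and puts the keywords in the description. The helper name, function names, chapter titles and keywords are all hypothetical; only the general tool shape (type, function, name, description, parameters) is the standard function-calling format.

```python
def chapter_tool(chapter_id, title, keywords):
    """Build one function-tool definition for a chapter (hypothetical helper)."""
    return {
        "type": "function",
        "function": {
            "name": f"use_chapter_{chapter_id}",
            "description": (
                f"Call this when the question is about chapter {chapter_id} ({title}). "
                "Keywords: " + ", ".join(keywords)
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "The user's question, verbatim.",
                    }
                },
                "required": ["question"],
            },
        },
    }

# Hypothetical chapters and keywords, for illustration only.
CHAPTERS = [
    ("4", "States of matter", ["solid", "liquid", "gas", "diffusion"]),
    ("5", "Heat and temperature", ["heat", "temperature", "thermometer"]),
]

tools = [chapter_tool(*chapter) for chapter in CHAPTERS]
print([t["function"]["name"] for t in tools])
```

The resulting list would be passed as the tools of the Assistant or of a Chat Completions request; which function the model calls then tells your code which chapter to use.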

Let me know if you have any specific issue or query regarding Function Calling, or you can follow a video I created for Function Calling in OpenAI Assistant.

rafiul.nakib · September 25, 2024, 3:37am

Could you show me an example of how I might implement function calling in this case? As I understand it, the function does not need to execute anything external: it only takes the keywords extracted from the user input as arguments and outputs the exact files stored in the vector store for the Assistant to use when answering the question. Correct me if I am wrong. Thanks in advance.

MrFriday · September 25, 2024, 6:27pm

Now that I’ve said it, it seems a little difficult. But try this:

Create one Assistant, but don’t attach any vector stores to it.
Use Chat Completion for your initial input. When the user says at the start which topics they want answers on, use function calling to select one value, via an enum parameter, from your vector stores. Let’s say you have three PDF vector stores for three different subjects: Physics, Chemistry and Math. Let function calling choose among these.
Based on the user’s initial message, you will receive one of these options. Say the user asked about ‘Thermodynamics’: the model will see that, of the available enum options, only Physics matches, so the function call will output JSON with the value Physics.
While retrieving the Run, fetch this value from the run’s required_action and, in your code, map Physics to its vector store ID.
Create a thread, pass this vector store ID to it, and run it with the Assistant.
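The steps above can be sketched in Python roughly as follows. This is not the Apex implementation mentioned in the thread: the vector store IDs are placeholders, the function name select_subject and the helper names are hypothetical, and it assumes the Assistant has the file_search tool enabled even though no vector store is attached to it. The client argument is assumed to be an openai.OpenAI instance.

```python
import json

# Placeholder IDs; replace with the IDs of your own vector stores.
VECTOR_STORES = {
    "Physics": "vs_physics_placeholder",
    "Chemistry": "vs_chemistry_placeholder",
    "Math": "vs_math_placeholder",
}

# Function tool whose single parameter is restricted to the known subjects.
SELECT_SUBJECT_TOOL = {
    "type": "function",
    "function": {
        "name": "select_subject",  # hypothetical name
        "description": "Pick the subject whose book should be searched.",
        "parameters": {
            "type": "object",
            "properties": {
                "subject": {"type": "string", "enum": list(VECTOR_STORES)}
            },
            "required": ["subject"],
        },
    },
}

def vector_store_for(arguments_json):
    """Map the tool call's JSON arguments string to a vector store ID."""
    subject = json.loads(arguments_json)["subject"]
    return VECTOR_STORES[subject]

def thread_tool_resources(vector_store_id):
    """Build the tool_resources payload that binds one store to a thread."""
    return {"file_search": {"vector_store_ids": [vector_store_id]}}

def start_thread(client, arguments_json, user_message):
    """Create a thread bound to the selected store (client: openai.OpenAI)."""
    resources = thread_tool_resources(vector_store_for(arguments_json))
    return client.beta.threads.create(
        messages=[{"role": "user", "content": user_message}],
        tool_resources=resources,
    )

# The arguments string comes from the model's tool call, e.g. '{"subject": "Physics"}'.
print(vector_store_for('{"subject": "Physics"}'))
```

Because the store is bound per thread, one lightweight Assistant can serve every subject, and each run only searches the book that matches the question.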

And bingo! I know this sounds messy, but give me one day and I can write a blog post with the full implementation (message me if I don’t). Meanwhile, if function calling is confusing, check the video I shared above.

Made a detailed video for this implementation: Confused about how to use file_search effectively with my Assistant - #7 by MrFriday

MrFriday · September 26, 2024, 5:57pm

I implemented this today in the Apex language. It is possible, and I think it is a cool way to keep your Assistant lightweight.

rafiul.nakib · September 26, 2024, 8:12pm

You are a legend! Any chance you can share what you have done so that I can study and try to understand?

MrFriday · September 28, 2024, 1:56pm

So sorry for this late reply. I created a video for Dynamic Binding Vector Store in OpenAI Assistant. Let me know if it helps you.

rafiul.nakib · September 29, 2024, 9:15am

I get the idea, thanks for the effort, mate! I will try to implement the same concept in Python and see if it works the same.

patienceigir · September 30, 2024, 12:36pm

Hello!
I’m a complete newbie, and I want to get specific information from a file I uploaded. I tried the code below, but it didn’t work: the model says it can’t directly access the file I provided. Could you please help me with how to send a file to OpenAI and ask a question about it?

Here is my Python code:

import openai

# Upload the image to the Files endpoint.
with open("image.png", "rb") as file:
    response = openai.files.create(
        file=file,
        purpose="vision"
    )

# Ask a question through Chat Completions.
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Based on the content of the uploaded image file of the cover of the book, can you tell me who is the author?"
        }
    ],
)

MrFriday · September 30, 2024, 3:30pm

You are using the Chat Completions API. If you want to retrieve information from a file, you have to either provide all of the file’s content in your message or use OpenAI Assistants.
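For an image, "providing the content in your message" could look like the sketch below, which embeds the file as a base64 data URL in the user message. The helper names are hypothetical, and the default model "gpt-4o" is an assumption: any vision-capable model should do. The client argument is assumed to be an openai.OpenAI instance.

```python
import base64

def image_question_messages(image_bytes, question, mime_type="image/png"):
    """Build a messages list that carries the image inline as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                },
            ],
        }
    ]

def ask_about_image(client, path, question, model="gpt-4o"):
    """Send the image and question to Chat Completions (client: openai.OpenAI)."""
    with open(path, "rb") as f:
        messages = image_question_messages(f.read(), question)
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

# Example call (needs an API key and a real file):
# from openai import OpenAI
# print(ask_about_image(OpenAI(), "image.png", "Who is the author of this book?"))
```

With this approach no separate file upload is needed, because the image travels inside the request itself.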
