I’m building a RAG Assistant and have noticed that my token costs are climbing very quickly. A modest tool definition JSON file of about 300 lines alone adds approximately 1,300 tokens per query, whether or not a tool is actually called. If multiple tools are called, those extra tokens are multiplied by the number of tool calls.
Surely we are not expected to pay for tens or even hundreds of thousands of tokens for each and every query in a large app.
Some questions:
- Does OpenAI recommend a different workflow than having all of your functions attached to one assistant?
- If a “root” assistant is used to judge context and then pass the query on to a more specialized assistant that owns a set of functions, can the same thread be maintained? The documentation is ambiguous as to whether threads are meant to be handled by more than one assistant.
- Does OpenAI plan to introduce more efficient prompting behind the scenes that solves this problem?
A couple of things, but first a disclaimer: I’m not a huge fan of tools in their current form.
The model has to see all of your tool definitions to be able to select the appropriate tool and then format the call to it. That means that if you have 10 tools, it needs to see the schema for all 10 in every request, so yes, they’re going to charge you for all those tokens on every request.
10 tools would be way too many to pass to the model even if they didn’t charge you for all the tokens. These models are easily confused, so the more tools you show them, the more likely they are to try to use a screwdriver as a hammer. The fewer the tools the better. If anything, try to make multi-purpose tools.
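A back-of-the-envelope way to see that overhead, using a made-up schema and the common rough rule of about 4 characters per token (a real tokenizer such as tiktoken would give exact counts):

```python
import json

# Hypothetical tool definition; every request carries all such
# definitions, whether or not the tool ends up being called.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Rough rule of thumb: ~4 characters per token for English/JSON.
chars = len(json.dumps(tool))
approx_tokens = chars // 4
print(f"~{approx_tokens} tokens per request for this one tool")
```

Multiply that by every tool attached to the assistant and by every request in the thread, and the costs the original post describes follow directly.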
There are some potential ways of reducing cost:
- You could try to separate tool selection from execution. You could use a vector database and cosine similarity to pick the set of tools most relevant to the query and only include those tools in your call. You’ll probably find that doesn’t work very well, so a better approach would be to give an LLM a list of the available tools and their usage descriptions (minus all of the schema) and let the LLM pick the 3 most likely tools for the query. This should actually work reasonably well.
- Longer term, OpenAI could reduce tool execution costs using something called KV caching. They can basically pre-compute the attention scores for all of the tokens in the tool definitions and cache those results so they don’t have to recompute them on every pass. Less compute means both faster execution and lower costs.
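The first idea can be sketched in a few lines. Everything here is hypothetical — the tool names, descriptions, and schemas are invented, and a naive keyword-overlap score stands in for the embedding similarity or cheap LLM call you would use in practice:

```python
# Hypothetical two-stage tool selection: score tools by keyword overlap
# with the query, then attach only the top-k full schemas to the request.
# In a real system, replace the overlap score with embedding cosine
# similarity or a cheap LLM call over the names + descriptions.

TOOLS = {
    "get_weather": {
        "description": "look up the current weather forecast for a city",
        "schema": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
    "search_docs": {
        "description": "search the product documentation for a topic",
        "schema": {"type": "object", "properties": {"query": {"type": "string"}}},
    },
    "create_invoice": {
        "description": "create and send an invoice to a customer",
        "schema": {"type": "object", "properties": {"customer": {"type": "string"}}},
    },
}

def select_tools(query: str, k: int = 2) -> list[str]:
    """Return the names of the k tools whose descriptions best overlap the query."""
    query_words = set(query.lower().split())
    scored = []
    for name, tool in TOOLS.items():
        desc_words = set(tool["description"].lower().split())
        scored.append((len(query_words & desc_words), name))
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score > 0]

# Only the selected tools' schemas go into the API call, so the
# remaining definitions cost zero tokens on this request.
selected = select_tools("what is the weather forecast for Paris")
print(selected)  # the weather tool should rank first
```

The key point is that the full schemas never reach the model until after selection, which is where the token savings come from.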
Given the way you are using tools, I would recommend the ChatGPT interface rather than the API. Sure, there are differences between the two, but I use the ChatGPT interface for my tools because they are highly complex and I don’t want to be paying for so many tokens.
@stevenic is right on the money (or right on the tokens): each time you run a tool, the AI has to understand your request holistically, not just pick and choose what to understand.
Borrowing from @stevenic’s example: say you want to fix something and you need a tool. You’d like to go into the garage to your big tool cabinet, open a drawer, and take out a screwdriver. Makes sense, but that’s not how the AI works — it requires you to carry the whole tool cabinet everywhere you go in your house just to use the screwdriver.
It’s counterintuitive, but you need to understand how the AI does things to use it more efficiently. Instead of one Assistant with a prompt that can do everything, you need a variety of Assistants that handle more specialized tasks if token costs become a concern… or use ChatGPT, which doesn’t charge you per token.
You’ll probably find that doesn’t work very well, so a better approach would be to give an LLM a list of the available tools and their usage descriptions (minus all of the schema) and let the LLM pick the 3 most likely tools for the query. This should actually work reasonably well.
This is what I was thinking as well. Maybe even go one level higher and categorize the request into different sets of tools to choose from first. I’ll have to experiment and see how this affects speed, and how much is actually saved in token cost.
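That one-level-higher idea could look something like the sketch below. The categories and keyword sets are invented for illustration, and the keyword check stands in for the cheap classification call you would actually make; the tool names are borrowed from the example prompt later in the thread:

```python
# Hypothetical two-level routing: first pick a tool category for the
# request, then show the model only the tools in that category.
# A real system would classify with a cheap LLM call; a keyword
# check stands in for that step here.

CATEGORIES = {
    "communication": {
        "keywords": {"email", "message", "send", "reply"},
        "tools": ["Email_Automation_Tool", "Social_Media_Manager"],
    },
    "data": {
        "keywords": {"chart", "database", "query", "visualize"},
        "tools": ["Database_Query_Tool", "Data_Visualization_Tool"],
    },
    "scheduling": {
        "keywords": {"schedule", "remind", "calendar", "meeting"},
        "tools": ["Scheduler", "Calendar_Manager", "Reminder_Setter"],
    },
}

def route(query: str) -> list[str]:
    """Return the tool names for the best-matching category, or [] if none match."""
    words = set(query.lower().split())
    best, best_hits = None, 0
    for name, cat in CATEGORIES.items():
        hits = len(words & cat["keywords"])
        if hits > best_hits:
            best, best_hits = name, hits
    return CATEGORIES[best]["tools"] if best else []

print(route("schedule a meeting for tomorrow"))
```

With dozens of tools, this keeps both the selection prompt and the final request small, at the cost of one extra routing step per query.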
What I don’t understand is why OpenAI doesn’t do this on its own. I can understand why it might need a description of the available tools for a given query, but it makes no sense to me that it also needs the full schema before it attempts to call the tool.
There are too many variables for them to do this in a general-purpose way. They’re doing well just to get it to pick the right tool, and your prompt can even break that logic.
Also keep in mind that they get paid by the token, so they don’t have a ton of incentive to directly save customers money.
Here’s a quick example prompt I threw together:
<TOOLS>
Web_Scraper - Extracts data from websites.
Translation_Tool - Translates text between different languages.
Image_Recognition_Tool - Identifies objects and scenes in images.
Optical_Character_Recognition - Converts different types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data.
Data_Visualization_Tool - Creates charts, graphs, and other visual representations of data.
Database_Query_Tool - Retrieves and manipulates data from databases.
Scheduler - Manages and schedules tasks and reminders.
Email_Automation_Tool - Sends and manages emails.
File_Management_Tool - Organizes, stores, and retrieves files.
API_Integrator - Connects and interacts with various APIs.
Task_Automation_Tool - Automates repetitive tasks.
Form_Filler - Automatically fills out forms.
Calculator - Performs mathematical calculations.
Unit_Converter - Converts between different units of measurement.
Weather_Forecast_Tool - Provides weather updates and forecasts.
News_Aggregator - Collects and summarizes news from various sources.
Social_Media_Manager - Manages and schedules social media posts.
Document_Generator - Creates documents based on templates.
Code_Executor - Runs and tests code snippets.
Virtual_Assistant_Interface - Interacts with users through text or voice.
Security_Scanner - Scans for security vulnerabilities and threats.
Fitness_Tracker_Integrator - Retrieves and analyzes data from fitness trackers.
Weather_Checker - Provides real-time weather updates.
Calendar_Manager - Manages and syncs calendar events.
Reminder_Setter - Sets and manages reminders.
Contact_Manager - Organizes and retrieves contact information.
Note_Taker - Takes and organizes notes.
Shopping_List_Manager - Creates and manages shopping lists.
Recipe_Finder - Finds and suggests recipes.
Expense_Tracker - Tracks and categorizes expenses.
Currency_Converter - Converts between different currencies.
Stock_Market_Tracker - Provides updates and analysis on stock market trends.
Travel_Planner - Plans and organizes travel itineraries.
Music_Player - Plays and manages music playlists.
Book_Recommendation_Tool - Suggests books based on user preferences.
Health_Advisor - Provides health tips and advice.
Fitness_Tracker - Monitors and logs fitness activities.
Recipe_Suggester - Recommends recipes based on available ingredients.
Language_Learning_Tool - Assists in learning new languages.
Meditation_Guide - Provides guided meditation sessions.
News_Reader - Reads news articles aloud.
Podcast_Player - Plays and manages podcast episodes.
<GOAL>
Create a playlist of the current billboard top 50 pop songs and start playing it.
<INSTRUCTIONS>
Select the tools needed to achieve the current goal. Return the tools as a JSON array of the tool names or return NULL if there are no relevant tools.
This returns:
["Web_Scraper", "Music_Player"]
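That reply can then drive the real call: parse the JSON array and attach only the matching full schemas. A minimal sketch, with made-up schemas and a hand-written string standing in for the actual model response — note the prompt allows a NULL reply, so that case needs handling:

```python
import json

# Hypothetical full tool schemas, keyed by the names used in the prompt.
FULL_SCHEMAS = {
    "Web_Scraper": {
        "type": "function",
        "function": {
            "name": "Web_Scraper",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
            },
        },
    },
    "Music_Player": {
        "type": "function",
        "function": {
            "name": "Music_Player",
            "parameters": {
                "type": "object",
                "properties": {
                    "playlist": {"type": "array", "items": {"type": "string"}}
                },
            },
        },
    },
}

def tools_for_reply(reply: str) -> list[dict]:
    """Turn the selector model's reply into the tools list for the main call."""
    if reply.strip().upper() == "NULL":  # selector found no relevant tools
        return []
    names = json.loads(reply)
    # Skip any hallucinated names that have no schema on our side.
    return [FULL_SCHEMAS[n] for n in names if n in FULL_SCHEMAS]

tools = tools_for_reply('["Web_Scraper", "Music_Player"]')
print([t["function"]["name"] for t in tools])
```

The main request then carries only these two schemas instead of all forty-plus definitions, which is the entire point of the selector pass.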