Hi everyone,
I’d like to share my journey of building an AI-powered virtual employee system and explain why I moved away from OpenAI’s Assistants API to a system based entirely on Chat Completions. I’m also looking for advice from the community on how to enhance my system with dynamic, goal-oriented AI agents, potentially revisiting the Assistants API or exploring other tools. I also have questions about implementing training for AI agents and optimizing token usage.
Note: Every time a user sends a new message, I use Chat Completions to retrieve global variables and move forward through the stages/actions (you’ll see what I mean below). So all of the “AI Agent” intelligence is based on chat completions, where I always provide context via the message history and general instructions for the model to follow.
Chat completions play a crucial role at several points in the system:
- Variable extraction: a prompt containing the user’s message and the expected variables returns a JSON object with the extracted values, or null where none are found.
- Step selection: the conversation history and current variables are analyzed to identify the most relevant step.
- Action ordering: a prompt organizes the actions of a given step according to its general instruction.
- Intent verification: the user’s message is classified to determine the appropriate next move.
- Response generation: natural, coherent replies are produced, tailored to the conversation’s context.
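To make the variable-extraction step concrete, here is a minimal sketch of the non-API side of it: building the extraction prompt and parsing the flat JSON object the model is asked to return. All names and the prompt wording are illustrative, and a real system would use a JSON library such as Jackson rather than this hand-rolled parser.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the variable-extraction step (names are assumptions,
// not from any real API): build the prompt, then parse the model's JSON reply.
public class VariableExtraction {

    // Builds the extraction prompt sent to chat completions.
    static String buildPrompt(String userMessage, List<String> expectedVars) {
        return "Extract these variables from the message: " + String.join(", ", expectedVars)
             + "\nReturn a JSON object mapping each variable to its value, or null if absent."
             + "\nMessage: " + userMessage;
    }

    // Parses a flat JSON object like {"date": "2024-05-01", "time": null}.
    // Deliberately minimal: handles only flat string/null values.
    static Map<String, String> parseFlatJson(String json) {
        Map<String, String> out = new LinkedHashMap<>();
        String body = json.trim().replaceAll("^\\{|\\}$", "");
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":", 2);
            if (kv.length < 2) continue;
            String key = kv[0].trim().replaceAll("^\"|\"$", "");
            String val = kv[1].trim();
            out.put(key, val.equals("null") ? null : val.replaceAll("^\"|\"$", ""));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(buildPrompt("Book a meeting May 1st", List.of("date", "time")));
        Map<String, String> vars = parseFlatJson("{\"date\": \"2024-05-01\", \"time\": null}");
        System.out.println(vars.get("date") + " / " + vars.get("time"));
    }
}
```

The key property is that a missing variable comes back as an explicit null rather than being silently dropped, so the stage logic can decide whether to re-prompt the user.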
My System: Purpose and Design
My system is designed to create intelligent virtual employees that interact with users naturally while completing structured tasks (e.g., scheduling meetings, answering queries). It’s built entirely in Java, with native integrations for:
- Google Calendar
- Google Meet
- Gmail
I’ve avoided no-code platforms entirely, opting for a fully code-based solution. Building a custom system in Java gives me full control over the logic and integrations, enabling optimizations that aren’t possible with no-code platforms or the Assistants API, and ensuring the system behaves exactly as intended.
Why I Moved Away from Assistants API
Initially, I explored the Assistants API for its impressive conversational abilities. However, I encountered several challenges that didn’t align with my goals:
- Unpredictable Behavior: The API’s autonomous decision-making often caused conversation flows to stray from the intended path, disrupting structured tasks.
- Inconsistent Tool Usage: Tools (e.g., for scheduling or calculations) weren’t always triggered at the right moments, leading to errors or missed actions.
- Lack of Standardization: Responses varied too much between interactions, making it hard to guarantee a consistent user experience.
These limitations pushed me to develop a more reliable alternative.
My Solution: A Stage-Based System
To overcome these issues, I created a stage-based system in Java, where:
- Interactions are divided into stages (e.g., collecting info, executing tasks).
- Each stage consists of smaller actions (e.g., extracting variables, calling functions).
- The system only moves forward when all actions in a stage are complete.
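The three rules above can be sketched as a small data model: a stage owns a list of actions, and the conversation index only advances once every action in the current stage has completed. This is a minimal illustration under my own assumed names, not the actual implementation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Minimal sketch of the stage-based flow: the flow advances to the next stage
// only when every action in the current stage has succeeded.
public class StageFlow {

    // An action reports success/failure via its Supplier (e.g. "was the date extracted?").
    record Action(String name, Supplier<Boolean> run) {}

    static class Stage {
        final String name;
        final List<Action> actions;
        final Set<String> done = new HashSet<>();

        Stage(String name, List<Action> actions) {
            this.name = name;
            this.actions = actions;
        }

        // Runs still-pending actions; returns true only when all have succeeded.
        boolean process() {
            for (Action a : actions) {
                if (!done.contains(a.name()) && a.run().get()) done.add(a.name());
            }
            return done.size() == actions.size();
        }
    }

    public static void main(String[] args) {
        boolean[] hasDate = {false}; // simulates whether the user supplied a date yet
        Stage collect = new Stage("collect-info", List.of(
            new Action("extract-date", () -> hasDate[0]),
            new Action("verify-intent", () -> true)));
        Stage execute = new Stage("execute", List.of(
            new Action("schedule-meeting", () -> true)));
        List<Stage> stages = List.of(collect, execute);
        int current = 0;

        if (stages.get(current).process()) current++; // date missing: stays on stage 0
        hasDate[0] = true;                            // user now supplied the date
        if (stages.get(current).process()) current++; // collect-info complete: advances
        System.out.println("current stage index: " + current); // prints 1
    }
}
```

Because completed actions are remembered per stage, a follow-up user message only re-runs what is still missing, which is also what keeps the flow deterministic.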
Benefits of This Approach
- Predictable Flow: Actions occur only when prerequisites are met, ensuring logical progression.
- Reliable Tool Calls: Functions and integrations (like WhatsApp or Gmail) are invoked at precise, predefined points.
- Error Prevention: Data is validated at each stage, fixing ambiguities before advancing.
- Consistent Experience: All users follow the same process, regardless of how they phrase their inputs.
This structure has made my virtual employees efficient and dependable, leveraging the power of native integrations without the unpredictability I faced with the Assistants API.
Seeking Advice: Dynamic AI Agents, Training, and Token Efficiency
While my system excels at control and precision, I’d like to make it more dynamic to handle varied user needs without losing its reliability. I’m reaching out to the community for insights on the following:
1. Creating Dynamic, Goal-Oriented AI Agents
- How can I use the Assistants API (or other tools) to create dynamic, goal-oriented AI agents that adapt to user input while staying on track?
- Are there techniques for integrating such tools into a structured, code-based system like mine?
- What alternative frameworks or approaches balance flexibility with precision?
2. Implementing Training for AI Agents
In the very near future, I’ll need to implement training capabilities based on documents and data specific to each AI agent in my system. My idea is to create a dedicated assistant for each AI agent, responsible solely for generating the final response to the user. This assistant would consider both the processed data from the last user message (including action results and extracted variables) and the training data to craft natural, context-aware replies.
- Is this a good practice, or would it be better to use an assistant from the start of the interaction, not just for the final response?
- What are the best practices for training AI agents in a way that impacts only the final response without disrupting the structured flow?
- I’ve also considered fine-tuning a single model and using it for all chat completions, versus using a fine-tuned model only for the final-response assistant via the API.
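To clarify the final-response idea: the structured pipeline would do all extraction and actions, and one prompt would then hand the results plus the agent-specific training material to the model that writes the reply. Here is a rough sketch of that prompt assembly; every field name and the prompt wording are my own illustrative assumptions.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch of assembling the final-response prompt from the
// pipeline's outputs plus agent-specific training snippets.
public class FinalResponsePrompt {

    static String build(Map<String, String> variables,
                        List<String> actionResults,
                        String trainingSnippets,
                        String userMessage) {
        StringBuilder sb = new StringBuilder();
        sb.append("You write the final reply to the user.\n");
        sb.append("Extracted variables: ").append(variables).append('\n');
        sb.append("Action results:\n");
        for (String r : actionResults) sb.append("- ").append(r).append('\n');
        sb.append("Relevant agent knowledge:\n").append(trainingSnippets).append('\n');
        sb.append("User message: ").append(userMessage);
        return sb.toString();
    }

    public static void main(String[] args) {
        String prompt = build(
            Map.of("date", "2024-05-01"),
            List.of("meeting created in Google Calendar"),
            "Company policy: meetings run 30 minutes by default.",
            "Book me a meeting for May 1st.");
        System.out.println(prompt);
    }
}
```

The appeal of this design is isolation: the training data can only influence tone and content of the reply, never the stage logic, so a bad retrieval can’t derail the flow.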
3. Token Efficiency Considerations
Currently, each chat completion call in my system uses about 500 input tokens and 100 output tokens with the 4o-mini model. Even with 3 to 5 calls per new message, this appears to use fewer tokens overall than the Assistants API does when executing runs in threads.
- Has anyone else compared token usage between custom systems and the Assistants API?
- Are there strategies to further optimize token usage in either approach?
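For reference, the figures above work out as follows (a back-of-the-envelope check only; no pricing is assumed):

```java
// Back-of-the-envelope token math for the numbers quoted above:
// ~500 input + ~100 output tokens per call, 3 to 5 calls per user message.
public class TokenEstimate {
    static final int INPUT_PER_CALL = 500;
    static final int OUTPUT_PER_CALL = 100;

    static int tokensPerMessage(int calls) {
        return calls * (INPUT_PER_CALL + OUTPUT_PER_CALL);
    }

    public static void main(String[] args) {
        System.out.println("tokens per user message: "
            + tokensPerMessage(3) + " to " + tokensPerMessage(5)); // 1800 to 3000
    }
}
```

So roughly 1,800 to 3,000 tokens per user message, which is the number worth comparing against a thread run in the Assistants API.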
If you’ve tackled similar challenges or have expertise in building controlled yet adaptable AI agents, I’d love to hear your thoughts!
Thanks for reading, and I look forward to your suggestions!
INB4: I’ve tried almost every prompt strategy to improve the instructions in the Assistants API, so prompting isn’t the main cause of the misbehavior or of the model failing to use tools as intended.