I’m building a multi-agent architecture where several agents interact with each other using the OpenAI Chat Completions API. I want to store the chat completions, including:
- Each agent’s messages
- Any function/tool calls the agents decide to make
- Internal reasoning steps or decisions made by the agents
The goal is to have a complete, replayable log of the entire interaction stored in chat completions so that I can run evals on it — not just the visible messages, but also the structured steps the agents take (e.g., simulated tool calls, data lookups, delegation to other agents, etc.).
However, I don’t want to use the built-in function/tool calling feature (i.e., tool_calls
) because my tool execution and control logic is handled outside of the API call, in my own framework.
My question:
What’s the best way to structure and store these internal steps if I have to add them to chat completions and store them so that I can run Evals on them?
Any advice or best practices for logging full interaction histories in multi-agent chat systems would be appreciated!