Storing chat completions in a multi agent architecture

I’m building a multi-agent architecture where several agents interact with each other using the OpenAI Chat Completions API. I want to store the chat completions, including:

  • Each agent’s messages
  • Any function/tool calls the agents decide to make
  • Internal reasoning steps or decisions made by the agents

The goal is to have a complete, replayable log of the entire interaction stored in chat completions so that I can run evals on it — not just the visible messages, but also the structured steps the agents take (e.g., simulated tool calls, data lookups, delegation to other agents, etc.).

However, I don’t want to use the built-in function/tool calling feature (i.e., tool_calls) because my tool execution and control logic is handled outside of the API call, in my own framework.

My question:
What’s the best way to structure and store these internal steps if I have to add them to chat completions and store them so that I can run Evals on them?

Any advice or best practices for logging full interaction histories in multi-agent chat systems would be appreciated!