NOTE: I renamed the title and updated the post below to be more precise, so ignore the first several replies up until the last one I deleted. Unfortunately I couldn’t delete this post and start a new one.
This concerns the documentation on function calling:
https://platform.openai.com/docs/guides/function-calling
which I found to be poorly written. For the Realtime API, there is no indication of the entire flow from the moment an audio session begins until an audio response is returned, including the point at which a callback function is required during the processing.
There is no indication of where the code that intercepts the callback from the Realtime API actually resides. Judging from the sample code for callback functions, which is written for Node, I initially assumed that the interception of the Realtime API response occurred on the developer's backend, since that is where most Node apps run. But this doesn't make sense, because that code has no access to the stream between the client and the Realtime API.

So the Realtime API would presumably respond back to the client when it needs to call a function. The client would then call the developer's backend API, get the result, and send it back to the Realtime API. While this makes sense, it still doesn't indicate where the actual registration of the callbacks occurs. I can only assume that it takes place on the client. But that would mean the callbacks have to be registered before the audio conversation even starts. Or is the client code supposed to send a list of all the callbacks it supports with every prompt? That sounds like a bad idea.
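To make my assumption concrete, here is a rough sketch of how I currently think the flow is supposed to work, written for Node with the raw WebSocket interface (the `ws` package). The event names are my best reading of the Realtime docs, and the function name and backend URL are made up for illustration, so treat this as a guess rather than a verified implementation:

```js
// Rough sketch of my assumed flow: register tools once at session start,
// then handle function calls on the client and send the results back.
// Event names are from my reading of the Realtime docs; details may be off.
import WebSocket from "ws";

const ws = new WebSocket(
  "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
  {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1",
    },
  }
);

ws.on("open", () => {
  // 1. Register the "callbacks" (tools) once when the session starts,
  //    via session.update -- not with every prompt.
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        tools: [
          {
            type: "function",
            name: "get_order_status", // hypothetical function name
            description: "Look up the status of an order by its id",
            parameters: {
              type: "object",
              properties: { order_id: { type: "string" } },
              required: ["order_id"],
            },
          },
        ],
      },
    })
  );
});

ws.on("message", async (raw) => {
  const event = JSON.parse(raw.toString());

  // 2. When the model decides to call a function, the call shows up in the
  //    response output as an item of type "function_call".
  if (event.type === "response.done") {
    for (const item of event.response.output ?? []) {
      if (item.type === "function_call") {
        const args = JSON.parse(item.arguments);

        // 3. The client calls the developer's own backend (hypothetical URL)
        //    to get the real result.
        const result = await fetch(
          `https://my-backend.example.com/orders/${args.order_id}`
        ).then((r) => r.json());

        // 4. Send the result back to the Realtime API as a
        //    function_call_output item, then request a new response so the
        //    model can speak the answer.
        ws.send(
          JSON.stringify({
            type: "conversation.item.create",
            item: {
              type: "function_call_output",
              call_id: item.call_id,
              output: JSON.stringify(result),
            },
          })
        );
        ws.send(JSON.stringify({ type: "response.create" }));
      }
    }
  }
});
```

If that sketch is roughly right, then the "registration" is just the tool list sent in `session.update` at the start of the session, and the docs should say so explicitly instead of leaving it to be inferred from a Node snippet.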