Hello people,
Hoping this is the right place to ask this question. Recently I've become fascinated by the idea of connecting a 3D face to GPT via the Realtime API. Just as GPT sends delta events for transcripts, I want to do the same for a custom function that takes the 52 standard ARKit blendshape parameters, each ranging from 0 to 1, which control facial expressions. These blendshape parameters are then passed to a 3D face model; a rough sketch of the tool definition I have in mind is below the links.
Example is at threejs → examples → webgpu_morphtargets_face.html
Description: arkit-face-blendshapes[dot]com
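For context, this is roughly the kind of tool registration I mean. It's only a sketch: the tool name `set_face_blendshapes` and the keyframe layout are placeholders I made up, not anything the API prescribes.

```ts
// Sketch of a Realtime API session.update that registers a blendshape tool.
// The tool name "set_face_blendshapes" and the keyframe schema are assumptions,
// adapt them to whatever your setup already uses.

const ARKIT_BLENDSHAPES = [
  "browDownLeft", "browDownRight", "browInnerUp",
  "eyeBlinkLeft", "eyeBlinkRight", "jawOpen",
  "mouthSmileLeft", "mouthSmileRight", "mouthFunnel",
  // ...the rest of the 52 standard ARKit coefficient names
] as const;

const sessionUpdate = {
  type: "session.update",
  session: {
    tools: [
      {
        type: "function",
        name: "set_face_blendshapes", // hypothetical name
        description:
          "Emit a sequence of ARKit blendshape keyframes (values 0-1) that " +
          "match the emotion and mouth shapes of the spoken reply.",
        parameters: {
          type: "object",
          properties: {
            keyframes: {
              type: "array",
              items: {
                type: "object",
                properties: {
                  t: { type: "number", description: "Time offset in seconds" },
                  weights: {
                    type: "object",
                    description: "Blendshape name -> value in [0, 1]",
                    additionalProperties: { type: "number", minimum: 0, maximum: 1 },
                  },
                },
                required: ["t", "weights"],
              },
            },
          },
          required: ["keyframes"],
        },
      },
    ],
  },
};

// Sent over the Realtime WebSocket / data channel once it is open:
// ws.send(JSON.stringify(sessionUpdate));
```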
The delta events received while the function is being invoked do contain the data, but each delta is just a chunk of JSON-formatted text (and can include other characters too), so mapping it to blendshape values is challenging, unlike transcripts, where every delta contains only plain text. The final response.function_call_arguments.done event does contain the complete array of expressions that should play as the AI speaks, but that event only arrives once the AI has finished speaking. So I'm not sure what the right approach is, or whether it's technically feasible at all.
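The best idea I have so far is to parse the argument stream incrementally instead of waiting for the done event. Below is a minimal sketch of that, assuming the arguments are shaped like the keyframe schema above ({"keyframes":[{"t":0,"weights":{...}}, ...]}) and that blendshape names and values never contain braces, so naive brace matching can pull out each keyframe object as soon as it closes. applyBlendshapes() is a placeholder for your own three.js morph-target update.

```ts
// Play blendshape keyframes while the function-call arguments are still
// streaming, rather than waiting for response.function_call_arguments.done.
// Assumes the argument JSON follows the keyframe schema sketched earlier.

type Weights = Record<string, number>;

// Placeholder: apply the weights to your three.js morph targets.
declare function applyBlendshapes(weights: Weights, t: number): void;

let depth = 0;      // brace nesting depth, persisted across deltas
let frameText = ""; // text of the keyframe object currently being assembled

export function onRealtimeEvent(event: { type: string; delta?: string }): void {
  if (event.type === "response.function_call_arguments.delta" && event.delta) {
    consumeDelta(event.delta);
  } else if (event.type === "response.function_call_arguments.done") {
    depth = 0;       // reset state for the next response
    frameText = "";
  }
}

function consumeDelta(chunk: string): void {
  for (const ch of chunk) {
    if (ch === "{") {
      depth++;
      if (depth >= 2) frameText += ch;   // depth 2 = inside a keyframe object
    } else if (ch === "}") {
      if (depth >= 2) frameText += ch;
      if (depth === 2) {
        // A keyframe object just closed: play it immediately.
        try {
          const frame = JSON.parse(frameText);
          if (frame.weights) applyBlendshapes(frame.weights, frame.t ?? 0);
        } catch {
          // malformed fragment; skip it and keep streaming
        }
        frameText = "";
      }
      depth--;
    } else if (depth >= 2) {
      frameText += ch;
    }
  }
}
```

This only covers the parsing side, of course; the keyframe times would still need to be aligned with the audio playback clock on the client.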
Check my recent post on X for a working demo that uses the params from the done event, without voice output.
Also, GPT is currently not producing great expressions or lip sync for the voice it generates. I'd appreciate any help or recommendations on how to improve this.
Thanks.