How can I pass a system prompt and audio user input to get a text output back?

@Foxalabs, to verify, this won’t support structured outputs, right? I’m trying it and it’s failing.