Multimodal queries with voice

roderik · May 15, 2024, 4:10pm

Hi,

I was very impressed by the demos of the GPT-4 Omni model and had a query regarding its capabilities.

I am currently exploring the use of the Speech-to-Text transcribe functionality as input for GPT-3.5-turbo completions. At present, this requires two separate API requests:

Transcription
Prompt Completion

I was wondering if it is possible, or if there are plans to implement, a way to combine these steps into a single API request, similar to the examples provided in the Vision API documentation: https://platform.openai.com/docs/guides/vision

Thank you for your time and assistance.

Best regards,
Roderik Steenbergen

Topic		Replies	Views
Need an API That Combines Audio Transcription and Translation Feedback api-realtime	1	92	May 17, 2025
What will be the final/full released capabilities of GPT-4o in the API? API gpt-4 , chatgpt , api	0	2012	May 27, 2024
GPT4o Realtime Prompt Engineering API	1	435	January 9, 2025
Text to Speech included in Chat Completions API (combine 2 API calls to 1) API	0	725	November 7, 2023
We need to send multiple prompts in one request for the chat endpoint API	4	4215	December 17, 2023

Multimodal queries with voice

Related topics