Can I use only the Realtime API's audio input (without output speech generation)?

Since output speech generation is a bit too expensive (see "How to decrease cost spend in Openai Realtime Console demo?"), can I use just the audio input side of the API to recognize speech in streaming mode? Are there any docs or samples showing this kind of usage?