I have created an AI audio recorder that summarizes the conversation between two people. I had someone try it for a little longer than expected and they got no result back. I’ve used it for short conversations and got great results.
How could I make it handle, say, an hour-long conversation? Would I want gpt-4-32k for that?
You have gpt-3.5-turbo-16k. And you’d sure want to use it.
Price for 16k of input or 16k of output:
Whisper is going to need parts, segments of audio. Check the maximum audio length and file size limits, and be conservative with your audio splitter.
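A minimal sketch of the "be conservative" idea, in plain Python with no audio library: it only works out where to cut, not how to cut. The function names, the 25 MB limit and the 64 kbps bitrate are illustrative assumptions, so check the current limits before relying on them.

```python
def max_seconds_for_size(limit_bytes, bitrate_kbps, safety=0.8):
    """Longest segment (seconds) that stays under a file size limit.

    Assumes a constant-bitrate encoding; `safety` leaves headroom
    so container overhead does not push a segment over the limit.
    """
    bytes_per_second = bitrate_kbps * 1000 / 8
    return limit_bytes * safety / bytes_per_second


def plan_segments(total_seconds, max_segment_seconds):
    """Return (start, end) times in seconds covering the whole recording."""
    if max_segment_seconds <= 0:
        raise ValueError("max_segment_seconds must be positive")
    segments = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_segment_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments


# Assumed numbers: a 25 MB upload limit and 64 kbps audio.
limit = max_seconds_for_size(25 * 1024 * 1024, bitrate_kbps=64)
# Cut a one-hour recording into 10-minute parts, or shorter if the limit demands it.
parts = plan_segments(3600, min(600, limit))
```

The resulting (start, end) pairs can be handed to whatever splitter you already use (ffmpeg, pydub, and so on).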
When you get the final transcript back from reassembling the conversation chunks, you might well have more tokens than the model accepts as input. That is another case where you can chunk (with overlap), ask the AI for a summary of each chunk, and then put a half-dozen summaries in for a final summary. (To the AI, the word “summary” means a quite short passage, so you might instead want it to write “an article” based on the transcript summaries taken as a whole.)
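Roughly, the chunk-then-combine approach could look like the sketch below. It splits on words rather than real tokens, and `summarize_chunk` and `write_final` are hypothetical callables standing in for whatever chat API calls you make; the second one is where you would ask for "an article" instead of a "summary".

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into word chunks; each chunk repeats the last `overlap` words of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


def summarize_long(text, summarize_chunk, write_final, chunk_size=3000, overlap=200):
    """Summarize each chunk, then combine the partial summaries in one final pass."""
    chunks = chunk_words(text, chunk_size, overlap)
    if len(chunks) <= 1:
        return write_final(text)  # short enough to handle in a single call
    partials = [summarize_chunk(chunk) for chunk in chunks]
    return write_final("\n\n".join(partials))
```

The default sizes are placeholders; pick a chunk size that leaves room for the prompt and the reply in the model's context window.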
The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of up to arbitrary length. This is possible through Transformers pipeline method. Chunking is enabled by setting chunk_length_s=30 when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference.
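For reference, a sketch of what that looks like with the Hugging Face `transformers` package, assuming it is installed along with a backend such as PyTorch and ffmpeg for decoding files. The helper names, the `openai/whisper-small` checkpoint and the batch size are my own choices for illustration, not something from the quoted text.

```python
def build_transcriber(model_name="openai/whisper-small", chunk_length_s=30):
    """Create an ASR pipeline with chunking enabled for long audio."""
    # Imported here so the module loads even where transformers is not installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_name,
        chunk_length_s=chunk_length_s,  # enables the chunking algorithm
    )


def transcribe_file(transcriber, path, batch_size=8):
    """Run the pipeline on one audio file and return the transcript text."""
    return transcriber(path, batch_size=batch_size)["text"]
```

Usage would be something like `transcribe_file(build_transcriber(), "meeting.mp3")`, where the file name is just a placeholder. Note this runs the model locally rather than through the hosted Whisper API.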
So wouldn't it actually be 18,000 words an hour, because it's two people?
Maybe not, considering you can only push 120-150 words per minute.
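To put rough numbers on that: the words-per-minute figures are the ones above, while the 0.75 words per token ratio is only a common rule of thumb for English, and the 16,000-token window and 2,000-token reply budget are round assumptions rather than exact model specs.

```python
import math


def words_per_hour(words_per_minute):
    """Total words spoken in an hour at a given combined speaking rate."""
    return words_per_minute * 60


def estimate_tokens(words, words_per_token=0.75):
    """Rough token count; the 0.75 words-per-token ratio is a rule of thumb, not a measurement."""
    return math.ceil(words / words_per_token)


def fits_in_context(prompt_tokens, context_tokens=16000, reserved_for_reply=2000):
    """True if the transcript plus room for the reply fits in the context window."""
    return prompt_tokens + reserved_for_reply <= context_tokens


low = estimate_tokens(words_per_hour(120))   # slower conversation
high = estimate_tokens(words_per_hour(150))  # faster conversation
print(low, high, fits_in_context(high))
```

Two people taking turns share the same minutes, so the total stays near one speaker's rate rather than doubling. For anything close to the limit, count real tokens with a tokenizer instead of estimating.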
You can time-slice the audio during recording in the front end: say, every 5 minutes, you send the audio data to the backend, then only process it when the user asks for a summary. So let us assume it recorded 1 hour of data; in the backend you now have 12 files of 5 minutes of audio each. Send them one by one to the Whisper API, using part of the previous Whisper result as the prompt to make a seamless transcription. Then, after every file is transcribed, send the full transcript to the chat API for a summary.
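The backend loop for that could be sketched like this. `transcribe` is a hypothetical callable you supply that takes a file path and a prompt string and returns text; the 200-character tail is an arbitrary choice.

```python
def transcribe_in_order(audio_paths, transcribe, prompt_chars=200):
    """Transcribe files in recording order, passing the tail of each result as the next prompt."""
    pieces = []
    prompt = ""  # the first slice has no previous context
    for path in audio_paths:
        text = transcribe(path, prompt).strip()
        pieces.append(text)
        # Carry only the end of the last transcript forward; the prompt field is length-limited.
        prompt = text[-prompt_chars:]
    return " ".join(pieces)


# One possible `transcribe`, assuming the openai Python SDK (v1+) and an API key are available:
#
#   from openai import OpenAI
#   client = OpenAI()
#
#   def transcribe(path, prompt):
#       with open(path, "rb") as f:
#           result = client.audio.transcriptions.create(
#               model="whisper-1", file=f, prompt=prompt
#           )
#       return result.text
```

The joined transcript can then go to the chat API in one call, or through a chunked summary pass if it is too long for the model.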