I spent all weekend building with the new OpenAI APIs announced at DevDay. While they are absolutely incredible, we still have some way to go, so here are some of the limitations I’ve run into.
- GPT-Vision can’t be used with the new Assistants API, which means you can’t leverage Threads and Messages to keep history. I did try to hack around it by creating Messages from the Chat Completions API and sending them over into a Thread, but Messages only support being added with the role “user”, creating an inconsistent conversation experience.
- There’s a size limit for images, which is fine if you compress before sending, but it seems that making your images smaller causes the AI to misread or dismiss small text in your images.
- It sometimes makes things up or misunderstands images, but that was to be expected.
- Rate limits make it hard to experiment with long-form, real-time processing of video.
- More of a question: is there a way to update GPT-Vision’s knowledge? Would this be via context injection or fine-tuning?
- The new Run mechanism requires you to poll for the run status before getting your output, which is never fun and creates indeterminate delays, since every user needs to implement their own polling. A webhook would be nice here.
- When using Text to Speech and then Speech to Text along with Assistant Runs, there’s a significant processing delay that creates a broken/laggy experience for the end user. This could use simplification and optimization from OpenAI so it doesn’t require so much overhead from developers.
- Messages don’t support GPT-Vision, probably the biggest bummer for me personally.
- No streaming support, though I hear this is coming.
- A combined “add Message and Run” call would be nice.
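On the Messages limitation above: the workaround I tried amounts to flattening existing chat-completion history into user-role messages before pushing it into a Thread. Here’s a minimal sketch of that idea; the helper name and the bracket-prefix convention are my own, not part of the API:

```python
def to_thread_messages(history):
    """Flatten chat-completion history into messages a Thread will accept.

    The Assistants API currently only lets you add Messages with
    role="user", so assistant/system turns are prefixed with their
    original role and forced to "user" -- a lossy workaround that is
    exactly what makes the conversation feel inconsistent.
    """
    flattened = []
    for turn in history:
        if turn["role"] == "user":
            flattened.append({"role": "user", "content": turn["content"]})
        else:
            flattened.append(
                {"role": "user", "content": f'[{turn["role"]}] {turn["content"]}'}
            )
    return flattened
```

Each dict in the returned list could then be passed to a Messages-create call on the Thread.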
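On the image size limit: before compressing, I downscale so the longest side stays under a bound while preserving aspect ratio. A small sketch of the dimension math (the 2048px default is my own conservative assumption, not a documented limit):

```python
def downscale_dims(width, height, max_side=2048):
    """Return (w, h) scaled so the longest side is at most max_side.

    max_side=2048 is an assumed safe bound, not an official figure.
    Aspect ratio is preserved. Note that shrinking too aggressively
    is what seems to make small text unreadable to the model.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return round(width * scale), round(height * scale)
```

You’d feed the result to whatever image library you use for the actual resize (e.g. Pillow) before encoding and sending.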
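On polling Runs: since there’s no webhook, every developer ends up writing roughly the same loop. A generic sketch, shown against a zero-arg status callable rather than a live client (the terminal state names match what I’ve observed from the Runs API, but treat them as assumptions):

```python
import time

# Assumed terminal states for a Run; not an exhaustive official list.
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}

def poll_run(get_status, interval=1.0, timeout=120.0):
    """Poll get_status() until the run reaches a terminal state.

    get_status is any zero-arg callable returning the run's status
    string. Raises TimeoutError if no terminal state arrives within
    `timeout` seconds. The sleep between checks is exactly the delay
    the end user ends up feeling.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError("run did not reach a terminal state in time")
```

In practice you’d wrap the retrieve call, e.g. `poll_run(lambda: client.beta.threads.runs.retrieve(thread_id=t, run_id=r).status)`.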
Hope someone at OpenAI finds this useful, and I’m happy to test GPT-Vision updates as they come!
Thank you for all the amazing work you’re doing, OpenAI is changing the world!