I created and tested an Assistant built on OpenAI GPT-4 Turbo, with both Code Interpreter and Retrieval enabled.
The INSTRUCTIONS I gave the Assistant were pretty simple: it should help people generate flowcharts with Python and Graphviz.
In the test, I gave it a short JSON string with clearly defined nodes and links (taken from a ComfyUI workflow).
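For reference, the conversion itself is small enough to sketch by hand. The following is a minimal Python sketch, not the script from the session. It assumes the ComfyUI workflow export keeps nodes in a "nodes" list (each with "id", "type" and optional "widgets_values") and links as six-element arrays [link_id, origin_id, origin_slot, target_id, target_slot, type]; check that against your own file. The helper names dot_escape and workflow_to_dot are hypothetical. It emits plain DOT text with every node, and its full content, inside the box, so it runs without the graphviz package.

```python
import json


def dot_escape(text) -> str:
    """Escape a value for use inside a double-quoted DOT label."""
    return (
        str(text)
        .replace("\\", "\\\\")  # literal backslashes first
        .replace('"', '\\"')    # embedded double quotes
        .replace("\n", "\\n")   # real newlines become DOT line breaks
    )


def workflow_to_dot(workflow_json: str) -> str:
    """Build Graphviz DOT source from a ComfyUI-style workflow (assumed layout)."""
    data = json.loads(workflow_json)
    lines = ["digraph workflow {", "  rankdir=LR;", "  node [shape=box];"]

    for node in data.get("nodes", []):
        # Assumption: widgets_values is usually a list, occasionally a dict.
        values = node.get("widgets_values") or []
        if isinstance(values, dict):
            values = list(values.values())
        # Node type on the first line, then one line per widget value,
        # so the complete node content ends up in the box.
        parts = [node.get("type", "node")] + list(values)
        label = "\\n".join(dot_escape(p) for p in parts)
        lines.append(f'  n{node["id"]} [label="{label}"];')

    for link in data.get("links", []):
        # Assumed link layout: [link_id, origin_id, origin_slot, target_id, target_slot, type]
        _link_id, src, _src_slot, dst, _dst_slot, link_type = link[:6]
        lines.append(f'  n{src} -> n{dst} [label="{dot_escape(link_type)}"];')

    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical two-node workflow, just to show the shape of the output.
    example = json.dumps({
        "nodes": [
            {"id": 1, "type": "CheckpointLoaderSimple", "widgets_values": ["model.safetensors"]},
            {"id": 2, "type": "KSampler", "widgets_values": [42, "randomize", 20]},
        ],
        "links": [[1, 1, 0, 2, 0, "MODEL"]],
    })
    print(workflow_to_dot(example))
```

The printed text can be saved to a .dot file and rendered with the Graphviz command-line tool (for example dot -Tpng workflow.dot -o workflow.png), or passed to graphviz.Source from the graphviz Python package. Generating the DOT text directly keeps labels fully under your control, which is exactly the part the Assistant kept truncating.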
The results of the test were disappointing:
The Assistant was not able to generate a graph with more than one node.
It was not able to write the complete content of each node in its box, no matter what I told it (even examples were useless).
Every five minutes or so, the Assistant forgot all the previous input data, and even the Python scripts it had generated were lost in the frequent context resets. I had to send it the input data again and again.
At some point during the session, the AI stopped posting a working link to the output files it generated while trying to create the flowchart. Originally, the link opened the correct page of the Files section for that file ID, showing the buttons to delete or download the file (I was using the Playground to test the Assistant from my iPad Pro). The download button never worked in Safari, but having the file ID was enough for me to download the file from my Python CLI. Halfway through, however, the Assistant stopped giving me a working link. The link was broken, and the browser console showed the error: “TypeError: undefined is not an object (evaluating ‘response.ok’)”. When questioned, the Assistant was not able to come up with a fix.
So I asked the Assistant simply to encode the output file as a base64 string and post it in its reply as code. It failed even at this basic task, so I gave up.
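For context, the base64 workaround is only a few lines of standard-library Python. This is a minimal sketch, assuming the Assistant runs the encoding half inside Code Interpreter and the user runs the decoding half locally; the function names and the flowchart.png filename are hypothetical. Note that base64 output is about a third larger than the original file (4 output characters per 3 input bytes), so even a modest PNG becomes a long string to paste into a chat reply.

```python
import base64
import os
import tempfile
from pathlib import Path


def file_to_base64(path: str) -> str:
    """Read a file and return its contents as an ASCII base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def base64_to_file(encoded: str, path: str) -> None:
    """Decode a base64 string (e.g. copied from a chat reply) back into a file."""
    # Drop whitespace and line breaks that creep in when copying text from a reply.
    cleaned = "".join(encoded.split())
    # validate=True rejects stray non-base64 characters instead of silently skipping them.
    Path(path).write_bytes(base64.b64decode(cleaned, validate=True))


if __name__ == "__main__":
    # Round trip with stand-in bytes in place of a real rendered flowchart.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "flowchart.png")
        Path(src).write_bytes(b"\x89PNG stand-in bytes")
        encoded = file_to_base64(src)
        dst = os.path.join(tmp, "restored.png")
        base64_to_file(encoded, dst)
        print(Path(dst).read_bytes() == Path(src).read_bytes())  # True
```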
Now tell me: I’m supposed to let people use this Assistant and even earn money from it? It’s a joke.
Considering how much all this costs and how much I have already paid to use this service, I was really expecting something decent. But this is incredibly useless. How is anyone supposed to build a service on this?
Here is the transcript of the whole session (minus the Code Interpreter logs, because there is currently no way to retrieve that part of a thread with the OpenAI API beta):
I am curious to see what would happen if you uploaded the data to the Assistant as a JSON file (it does support JSON). I think that as your thread gets longer, it starts discarding earlier content. If you upload the file, that would not be the case. I have a lot of Assistants running and I like a lot about them, but there is certainly room for improvement, as documented here, for example.
If you read the transcript, you will see that at some point I asked the Assistant to save the JSON content as a file somewhere in its environment, so it could be retrieved later as a backup. The Assistant did it, but after a few minutes a context reset deleted everything again. It is clearly impossible to do any work like this. I don’t know what the developers at OpenAI are thinking, but it looks like a big joke to me…
I would start by actually uploading it, and then asking your questions. I do like the idea of it ‘saving’ the file from your text prompt.
Oh, and remember that it is clearly in BETA at the moment.
Well, honestly, it doesn’t even look like an ALPHA to me. My impression is that they really ‘oversold’ the idea of using GPT as an Assistant. If it is about being a smart chatbot and having a nice conversation about any topic, it’s fine. But selling it as an Assistant service capable of doing real jobs? I don’t think they are there yet. I think we all overestimated GPT. It is still very far from being truly able to complete tasks for us. Even when it comes to coding, the best it can do is suggest small completions or code snippets as we write (as Copilot does). I still haven’t seen a single complete program written by it without heavy human intervention. Not even small scripts. It is just not mature yet. OpenAI was riding the wave of enthusiasm and took it too far with this Assistant thing. It was way too early for something like that. Maybe in 3-4 years GPT will be mature enough for such things. But right now they are selling the AI equivalent of snake oil… GPT as a panacea for every problem…
Yeah, they are a mess at the moment. I have seen them work amazingly well via the API, especially right after they were released, but it feels like my new favourite term:
Sorry, but I couldn’t disagree with this more. It’s very much possible to complete meaningful tasks that have real-world applications, whether with the regular API or with the Assistant. It might take a few iterations to get it right, and you need to break down tasks, optimize instructions, etc.
If I have to meticulously guide the Assistant by explaining and optimizing each step needed to render the unique structure of every input file, it might just be quicker for me to do it myself… not to mention that, obviously, few customers are likely to have the ability or willingness to engage in such an activity.