I’ve been exploring some ideas for a very high-level autonomous assistant capable of running computer commands and calling external APIs (all while using long-term memories).
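For context, the kind of loop I have in mind looks roughly like this: the model picks an action, the action is routed to a tool (a shell command, an external API, or the memory store), and the result is fed back. This is just a minimal sketch under my own assumptions, with the model call stubbed out and all names hypothetical:

```python
# Minimal sketch of the assistant loop: route model-chosen "actions"
# to tools and keep a naive long-term memory. The model itself is
# stubbed out; in a real setup its reply would choose the action.
import subprocess

long_term_memory: list[str] = []  # naive in-process memory store (assumption)

def run_command(cmd: str) -> str:
    """Run a shell command and capture its stdout."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout.strip()

def dispatch(action: str, argument: str) -> str:
    """Route an action name to the matching tool."""
    if action == "command":
        return run_command(argument)
    if action == "remember":
        long_term_memory.append(argument)
        return "stored"
    return f"unknown action: {action}"

# Example turn: run a command, then store the result as a memory.
output = dispatch("command", "echo hello")
dispatch("remember", f"echo output was: {output}")
```

In practice the `dispatch` step is exactly where I see 3.5 and GPT-4 diverge: the weaker model more often picks the wrong action or drops part of the instruction that told it which tool to use.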
After many experiments run via the playground with different prompts (system messages), I can definitely see a significant difference in performance between GPT-4 and GPT-3.5-turbo.
With 3.5, instructions often get “lost” or are poorly understood. GPT-4 seems to take everything in and handle it with near-100% accuracy (I’ve only noticed a few gaps here and there).
The gaps in 3.5-turbo make me reluctant to develop this further on GPT-3.5. Unfortunately, the cost of running GPT-4 is still prohibitive for larger projects.
Fingers crossed that token costs come down in the near future.