Messages in an array vis-a-vis assistants API

Continuing the discussion from GPT-3.5-turbo how to remember previous messages like Chat-GPT website:

Well, the assistants API is still “beta” but it will be nice to know the difference between using that and the messages history in an array, from the point of view of quality of responses.

The pricing of assistant API vs. the messages in an array is a matter of testing and figuring out, I guess, based on the requests and responses.

Having said that, it will nice to know the difference in the quality of the responses between the two ways

Any thoughts/comments anyone, please?

Based on how fast OpenAi is able to iterate, I would say that once Assistant has streaming it will be more reliable to just use the Assistant API.

One reason not to use assistants is token counting, if it is true that MS is losing money on copilot by charging a fixed amount, the safest thing to do is to have token accounting in your system, and transfer that to clients somehow, or have some fallback mechanics.

The second reason not to use the Assistant API is that it has very little use for RAG systems because the file limit is 20, so you can’t really use it for large collections of documents, imagine being an insurance company and wanting to use your customer’s record in the API, it would be great that have the whole set integrated in one action. Still, I think that that limitation will be removed soon.

Thanks @rbritom

So currently Assistant API doesn’t have streaming? Hmm

Streaming apart, I think the best way is to manually test on the playground with the Assistant API and compare it with the other option using message arrays :frowning:

Sounds like a painful part of manual testing (as far as the playground is concerned) :frowning: