Hey there!
So, file uploading with Assistants is basically just RAG (retrieval-augmented generation) that OpenAI handles for you. Otherwise, to retrieve and use particular documents or knowledge files, you'd have to build the vector database yourself, along with the embeddings.
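To make "build it yourself" concrete, here's a toy sketch of the retrieval half of RAG: embed your documents, embed the query, and rank by cosine similarity. The `embed()` function here is a stand-in (a character-frequency vector), not a real embedding model — in practice you'd call an embeddings API or a local model instead.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedding: character-frequency vector over a-z.
    # Replace with a real embedding model/API in practice.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and return the top k.
    qv = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "spawning actors in Unreal",
    "baking bread at home",
    "Unreal blueprint tips",
]
print(retrieve("Unreal actor spawning", docs))
```

The Assistants file-upload path does the equivalent of all of this (chunking, embedding, storing, ranking) behind one API surface, which is the whole appeal.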
There are lots of docs in the Cookbook we could cite for this, but your use case might be a touch more unique because you're using Unreal. The big question, I guess, would be: do you want this managed with C++ or Python? I don't know how Unreal integrates with the API, necessarily. Admittedly, I grew up using Unity, so I haven't had enough time to completely migrate over to Unreal yet.
That being said, it can certainly be done. I'm currently building something that's mainly in Rust but uses Python almost exclusively to make the API calls themselves. The other benefit is that a language like Rust gives you genuine multithreading; Python has the GIL, but its threads still overlap fine for I/O-bound work like API calls.
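Here's a quick sketch of that I/O-bound threading point. `fake_api_call` is a placeholder that just sleeps to simulate network latency — with real HTTP calls the overlap works the same way, because the GIL is released while waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_api_call(prompt: str) -> str:
    # Placeholder for a real API request; sleep simulates latency.
    time.sleep(0.05)
    return f"response to: {prompt}"

prompts = [f"prompt {i}" for i in range(4)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # The four 50 ms "calls" run concurrently, not back to back.
    results = list(pool.map(fake_api_call, prompts))
elapsed = time.perf_counter() - start
```

Four sequential calls would take ~200 ms; threaded, the wall time stays close to a single call's latency.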
In terms of context length, GPT-4-turbo-preview currently has the largest context window at 128k tokens; the other GPT-4 models top out around 32k. The advantage of Assistants is essentially ease of use: it's a bundle of pre-built pieces, ready to go, that you would otherwise have to construct yourself. Keep in mind, though, especially since you're running it inside a game engine, that you'd need to decide how to manage Assistant threads. Would there be one thread per player, or would each thread represent its own chat instance in the game?
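One way to frame that thread-management decision is a registry keyed by (player, conversation), so each in-game chat instance gets its own thread. This is a sketch of the bookkeeping only: `create_thread()` below is a placeholder, not the actual Assistants API call.

```python
import itertools

_ids = itertools.count(1)

def create_thread() -> str:
    # Placeholder: in a real integration this would call the
    # Assistants API to create a thread and return its ID.
    return f"thread_{next(_ids)}"

class ThreadRegistry:
    """Maps (player_id, conversation_id) -> Assistant thread ID."""

    def __init__(self) -> None:
        self._threads: dict[tuple[str, str], str] = {}

    def get_or_create(self, player_id: str, conversation_id: str) -> str:
        key = (player_id, conversation_id)
        if key not in self._threads:
            self._threads[key] = create_thread()
        return self._threads[key]

reg = ThreadRegistry()
t1 = reg.get_or_create("player1", "npc_blacksmith")
t2 = reg.get_or_create("player1", "npc_blacksmith")  # same thread reused
t3 = reg.get_or_create("player1", "npc_innkeeper")   # distinct conversation
```

Keying on the player alone instead would collapse every NPC conversation into one shared history, so the granularity of the key is really the design decision here.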
Honestly, I’d recommend you stick with chat completion, but that’s just me. You can mold it better to your use case.
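With chat completion, "molding it to your use case" mostly means you control the full message list yourself. A minimal sketch of shaping a request for an in-game NPC — the model name, persona text, and `build_payload` helper are all just illustrative, and the actual network call is left out so this runs standalone:

```python
def build_payload(npc_persona: str, history: list[dict], player_line: str) -> dict:
    # You own the whole message list: system persona first,
    # then prior turns, then the player's new line.
    messages = [{"role": "system", "content": npc_persona}]
    messages += history
    messages.append({"role": "user", "content": player_line})
    return {"model": "gpt-4-turbo-preview", "messages": messages}

payload = build_payload(
    "You are Brynn, a gruff blacksmith. Stay in character.",
    [
        {"role": "user", "content": "Got any swords?"},
        {"role": "assistant", "content": "Aye, a few. What's your coin?"},
    ],
    "How much for the steel longsword?",
)
# In a real integration you'd pass this to the chat completions
# endpoint, e.g. client.chat.completions.create(**payload).
```

Since you manage the history yourself, you can trim, summarize, or inject game state between turns — exactly the kind of control Assistants abstracts away.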