It sounds like you are trying to build something similar to webGPT and the new bing.
Finetuning wouldn’t work since the data is real-time, and embeddings would get expensive everyday, still the model wouldn’t be able to carry knowledge of breaking-news level events.
Not to mention curating that kind of data isn’t going to be easy.
However, I may have a rough idea of how you can hack a prototype. Here’s the outline:
You’ll need access to a search API.
Every user message will have to result firstly into a search.
The results semantically closest to the user query will have to be “opened” and the contents retrieved.
Then the embeddings can be obtained for the content and the completions endpoint be used to generate a response.
This is still very hypothetical, but quite intriguing.