Thank you for sharing your insights and for the suggestions on how to make the project work. I appreciate your advice on the low-token solution and backend iterative procedure.
I was also wondering if we could use ada embeddings to search the database for the closest-matching question/answer pair and include that answer as context in the prompt sent to the GPT-3.5 API. That way, we could avoid sending every question we want to match against to the model with each request.
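To make the idea concrete, here is a minimal sketch of that retrieval step. It assumes the embeddings for the stored questions have already been computed once and cached (e.g. with the text-embedding-ada-002 endpoint); the vectors and Q/A pairs below are toy placeholders, not real embeddings.

```python
import numpy as np

# Hypothetical stored Q/A pairs with precomputed question embeddings.
# In practice the vectors would come from the ada embedding endpoint,
# computed once per pair and cached in the database.
qa_pairs = [
    {"q": "How do I reset my password?", "a": "Use the 'Forgot password' link."},
    {"q": "What payment methods are accepted?", "a": "Card and bank transfer."},
]
stored_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy 2-d vectors for illustration

def best_match(query_vec, vecs, pairs):
    """Return the stored pair whose embedding has the highest cosine similarity."""
    sims = vecs @ query_vec / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(query_vec))
    return pairs[int(np.argmax(sims))]

# Embed the incoming user question the same way, then look up the nearest pair.
match = best_match(np.array([0.9, 0.1]), stored_vecs, qa_pairs)

# Only the single matched pair goes into the GPT-3.5 prompt, not all 500+.
prompt = f"Context:\nQ: {match['q']}\nA: {match['a']}\n\nUser question: ..."
```

The key point is that the full database never touches the chat prompt: each request pays only for one embedding call on the new question plus the one retrieved pair included as context.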
Do you think this could be a viable solution? I would love to hear your thoughts. Also, I am not sure whether searching a large database of 500+ pairings via embeddings would increase token usage - in theory it shouldn't, since only the prompt and completion consume tokens, right?