Better approach to building a chatbot on Llama 2

I’m trying to build a custom chatbot over enterprise data for information retrieval. This is the approach I’m currently following; can anyone suggest a better one?

I embed all the documents with the pre-trained “all-mpnet-base-v2” model and store the resulting vectors. When a user query comes in, I retrieve the most relevant documents by embedding similarity and send the top results to Llama 2 to generate the final response.
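To make the pipeline above concrete, here is a minimal sketch of the retrieve-then-prompt flow. It is an illustration under assumptions, not my actual code: `build_index`, `retrieve`, `build_prompt`, and `toy_embed` are hypothetical names, the prompt wording is made up, and `toy_embed` is a keyword-count stand-in for the real embedding model so the snippet runs without any downloads.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: List[str], embed: Callable[[str], Vector]) -> List[Tuple[Vector, str]]:
    """Embed every document once, up front, and keep (vector, text) pairs."""
    return [(embed(doc), doc) for doc in docs]


def retrieve(query: str, index: List[Tuple[Vector, str]],
             embed: Callable[[str], Vector], k: int = 3) -> List[str]:
    """Return the k documents whose embeddings are closest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_prompt(query: str, passages: List[str]) -> str:
    """Pack the retrieved passages and the question into one prompt string."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


def toy_embed(text: str) -> List[float]:
    """Stand-in embedder (assumption): counts a few keywords per text."""
    vocab = ["vacation", "vpn", "password", "expense"]
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


docs = [
    "employees get 20 days of vacation per year",
    "reset your vpn password from the IT portal",
    "submit each expense report by month end",
]
index = build_index(docs, toy_embed)
question = "how do I reset my vpn password"
passages = retrieve(question, index, toy_embed, k=1)
prompt = build_prompt(question, passages)  # this string is what goes to the LLM
print(prompt)
```

In the real setup, `embed` would presumably be the `encode` method of a `SentenceTransformer("all-mpnet-base-v2")` instance from the sentence-transformers package, and `prompt` would be passed to Llama 2 for generation. Long documents would likely need to be split into smaller chunks before indexing so that the retrieved passages fit in the model's context window.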

  • Another approach could be fine-tuning Llama 2 on the full set of documents. For that, I’m not sure how I should represent my data, and I only have one T4 GPU.
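On the single-T4 constraint, a rough back-of-the-envelope estimate shows why it matters. This is a sketch under stated assumptions: the 7B variant of Llama 2 (the post does not say which size), 16 GB of memory on the T4, and common rule-of-thumb byte counts per parameter. Activations, the KV cache, and framework overhead are ignored, so real usage would be higher.

```python
GB = 1e9  # decimal gigabytes, good enough for a rough estimate


def estimate_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for n_params parameters at a given per-parameter cost."""
    return n_params * bytes_per_param / GB


N_PARAMS = 7e9   # assumption: the 7B Llama 2 variant
T4_GB = 16.0     # nominal T4 memory

# Rule-of-thumb per-parameter costs (assumptions, not measurements):
#   fp16 weights only (inference)                         -> 2 bytes
#   mixed-precision Adam training: fp16 weights + fp16
#   grads + fp32 master weights + two fp32 Adam states    -> 16 bytes
#   4-bit quantized weights (frozen base for adapters)    -> 0.5 bytes
fp16_weights = estimate_gb(N_PARAMS, 2)
full_finetune = estimate_gb(N_PARAMS, 16)
four_bit_weights = estimate_gb(N_PARAMS, 0.5)

for label, gb in [
    ("fp16 weights", fp16_weights),
    ("full fine-tuning with Adam", full_finetune),
    ("4-bit weights", four_bit_weights),
]:
    verdict = "fits" if gb < T4_GB else "does not fit"
    print(f"{label}: ~{gb:.1f} GB ({verdict} in {T4_GB:.0f} GB, before activations)")
```

Under these assumptions, full fine-tuning of all weights is far beyond one T4, while a 4-bit quantized base model leaves headroom, which is why parameter-efficient methods on a quantized base (for example LoRA-style adapters) are the usual suggestion for a single small GPU.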

Any other approaches would be much appreciated. Please correct me if I’m doing anything wrong. Thanks!