Hi, I plan to use the ChatGPT API assistant together with a vector store to create an AI agent that could answer various questions about a friend of mine (career, family, personality, etc.). I have an interview with him that is around 70 pages long. The data is currently structured as Q&A pairs: closely related questions about the same subject are grouped together, with the subject as a heading above them, something like:
sport:
Do you have an exercise routine? Tell us about it
Yes, I go to the gym twice a week, also…
Did you ever win a prize in a sport competition? Tell us about it
I once won a silver medal in a local competition back in my hometown at the age of 15, it was a…
…
music:
What is your taste when it comes to music?
…
My question is: should I keep the data as is before embedding it in a vector DB, or should I first remove the questions and create a more summary-like structure?
*Note: this data serves as the general knowledge base for the model, not for fine-tuning (if I ever do that).
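
To make the "keep as is" option concrete, here is a minimal sketch of how the current layout could be split into one chunk per Q&A pair, with the subject attached to each chunk, before anything is embedded. It assumes that subject lines end with a colon and that every question is on a single line followed by a single-line answer; the real 70-page file may not be that regular. The names `Chunk` and `parse_interview` are made up for illustration, and no embedding or vector store call is shown.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    subject: str
    question: str
    answer: str

    def as_text(self) -> str:
        # Text that would be sent to the embedding step (hypothetical layout).
        return f"Subject: {self.subject}\nQ: {self.question}\nA: {self.answer}"


def parse_interview(raw: str) -> list[Chunk]:
    """Split 'subject:' headed Q&A text into one Chunk per question/answer pair.

    Assumption: a line ending in ':' is a subject heading, and the remaining
    lines alternate question, answer, question, answer.
    """
    chunks: list[Chunk] = []
    subject = ""
    pending_question = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # New subject heading; drop any question left without an answer.
            subject = line.rstrip(":").strip()
            pending_question = None
        elif pending_question is None:
            pending_question = line
        else:
            chunks.append(Chunk(subject, pending_question, line))
            pending_question = None
    return chunks


sample = """
sport :
Do you have an exercise routine? Tell us about it
Yes, I go to the gym twice a week, also…
Did you ever win a prize in a sport competition? Tell us about it
I once won a silver medal in a local competition back in my hometown at the age of 15, it was a…
music:
What is your taste when it comes to music?
…
"""

chunks = parse_interview(sample)
for chunk in chunks:
    print(chunk.as_text())
    print("---")
```

With this shape, the alternative I am asking about (dropping the questions) would only change `as_text`, for example to return a summary of the answer instead of the Q/A text, while the subject stays attached to each chunk either way.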