Question answering using embeddings-based search

I am using the code in `openai-cookbook/Question_answering_using_embeddings.ipynb` (in the openai/openai-cookbook repo on GitHub) as a reference.

My use case is simple: I want to read N documents and run Q&A over them. I know gpt-3.5-turbo has a maximum context length of 4,096 tokens (~5 pages).

I have a few questions:

  1. I see in output [7] that the text was split into separate rows. Was that done for accuracy? In other words, does splitting the text into smaller pieces (e.g. sentences) make the answers more accurate, or can I embed text up to 4,096 tokens at once?

  2. How do I handle the case where I have more data than the maximum context length? Any ideas?

  3. I have noticed that the questions I ask can't require much inference. For example, if I have a text about a painter who painted a painting of blue flowers and I ask, "Suggest a name for the painting", I get no answer. What can I do to improve this?
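For question 2, my current idea is to split each document into chunks, embed every chunk separately, and at query time put only the most relevant chunks into the prompt. A minimal sketch of the chunking step (the function name is my own, and word counts here are only a rough proxy for tokens — roughly 0.75 words per token — OpenAI's tiktoken library would give exact counts):

```python
# Hypothetical sketch, not from the cookbook: split a long document
# into word-based chunks before embedding each one.
def chunk_text(text: str, max_words: int = 300) -> list[str]:
    words = text.split()
    # Slice the word list into consecutive windows of max_words words.
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk would then get its own embedding, and at query time I'd rank the chunks by cosine similarity against the question's embedding and stuff only the top-ranked ones into the prompt. Is that the intended approach?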

Thank you!
