Understand user question and intent to query data

Hi, we want to allow users to ask questions about the large amount of data we have in our backend. For instance, we have customers, orders, products, etc. Our data is sensitive and cannot leave the company.

Users want to ask questions like
How many customers ordered product “XYZ” in the last month?

Ideally we would get back from OpenAI something like
Intention: COUNTING
Filter: PRODUCT=XYZ, TIMEFRAME=lastmonth

We would use the result to query the backend and show it to the user.
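To make the desired round trip concrete, here is a minimal sketch of how a reply in the format above could be turned into something the backend can act on. The `ParsedIntent` class and `parse_intent` function are hypothetical names made up for illustration, and the sketch assumes the model's reply follows exactly the two-line `Intention:` / `Filter:` layout shown above. It does not call any API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ParsedIntent:
    """Hypothetical container for the structured reply."""
    intention: str
    filters: Dict[str, str] = field(default_factory=dict)


def parse_intent(reply: str) -> ParsedIntent:
    """Parse a reply made of 'Intention: ...' and 'Filter: K=V, K=V' lines."""
    intention = ""
    filters: Dict[str, str] = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not in "Key: value" form
        key = key.strip().lower()
        if key == "intention":
            intention = value.strip().upper()
        elif key == "filter":
            # Each filter is a NAME=value pair, separated by commas.
            for pair in value.split(","):
                name, eq, val = pair.partition("=")
                if eq:
                    filters[name.strip().upper()] = val.strip()
    if not intention:
        raise ValueError("reply did not contain an 'Intention:' line")
    return ParsedIntent(intention, filters)


if __name__ == "__main__":
    reply = "Intention: COUNTING\nFilter: PRODUCT=XYZ, TIMEFRAME=lastmonth"
    print(parse_intent(reply))
```

The backend would then map `intention` and `filters` onto its own query mechanism. Rejecting replies without an `Intention:` line is one simple way to avoid acting on malformed output.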

Question: How can we instruct OpenAI to “understand” such questions and know how our data is structured?

P.S.: We cannot simply convert this into a SQL query because of how the data is stored.

I would investigate the embeddings path.

  • Imagine vectorizing every element of your data.
  • Vectorize each user’s query as it occurs.
  • Perform a semantic similarity match and collect the most closely related data.
  • Build a prompt that includes this highly relevant data as context (for example, as few-shot examples) and use GPT completions to generate the answer for the user.
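The steps above can be sketched roughly as follows. This is only an illustration under loud assumptions: `toy_embed` is a stand-in bag-of-words vectorizer, not a real embedding model, and `top_k` and `build_prompt` are hypothetical helper names. In practice you would swap `toy_embed` for an actual embedding model and store the vectors in an index rather than recomputing them per query.

```python
import math
from collections import Counter
from typing import Dict, List


def toy_embed(text: str) -> Dict[str, float]:
    """ASSUMPTION: stand-in for a real embedding model (word counts only)."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, records: List[str], k: int = 2) -> List[str]:
    """Return the k records most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, toy_embed(r)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Assemble a prompt that places the retrieved records before the question."""
    lines = ["Answer using only the data below.", "", "Data:"]
    lines += [f"- {record}" for record in context]
    lines += ["", f"Question: {query}", "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    records = [
        "customer alice ordered product xyz in may",
        "product abc is out of stock",
        "customer bob ordered product xyz in june",
    ]
    question = "which customer ordered product xyz"
    print(build_prompt(question, top_k(question, records)))
```

The resulting prompt string is what would be sent to the completion model. Restricting the prompt to retrieved records is also where the hallucination defenses can be added, since the model is asked to answer from supplied data only.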

This is a very pixelated view of the approach. A lot of software engineering goes along with this brief outline. However, it works, and it is cost-effective at scale. This approach also creates opportunities for more innovation and defenses against hallucinations.
