Structured Data & Semantic Search: SQL, text-to-SQL, or vector search?

I am trying to understand use cases for vector search on structured business/enterprise data. An interesting extract from the post below:

By building a simple string of key field values in a table row and using that to determine its word vectors within a model such as OpenAI’s text-embedding-ada-002 LLM

Assume there is a relational table of doctors with the usual columns: name, age, address, discipline, education, work experience, visiting hours, and many more.

And a user query is: "Find all orthopedists who have more than 10 years of experience, speak French, and studied in Europe …"

How does the application handle the above query?

  1. In the early days, the application would have just a few input boxes and check boxes for basic filters, and would then compose and execute a SQL statement: SELECT * FROM doctors WHERE …
  2. Today we can do text-to-SQL using specifically trained LLMs: pass in the above English query and hope the model returns correct SQL.
  3. My real question is whether vector search fits here. First, generate a vector embedding for each table row after text'ifying its columns as key-value pairs. Next, generate an embedding for the search query and run a vector similarity search.
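To make option 3 concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the column names and rows are made up, and `toy_embed` is a hashed bag-of-words stand-in so the example runs offline. A real system would call an actual embedding model there instead.

```python
import hashlib
import math
import re

DIM = 64  # size of the toy vector; arbitrary choice for this sketch


def textify(row):
    """Serialize one table row as 'key: value' pairs in a single string."""
    return "; ".join(f"{key}: {value}" for key, value in row.items())


def toy_embed(text):
    """Hashed bag-of-words vector, L2-normalized.

    Stand-in for a real embedding model; it only captures word overlap.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two vectors that are already normalized."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical rows; a real table would have many more columns.
doctors = [
    {"name": "A. Martin", "discipline": "orthopedics",
     "experience_years": 14, "languages": "French, English",
     "education": "Lyon, France"},
    {"name": "B. Singh", "discipline": "cardiology",
     "experience_years": 22, "languages": "English, Hindi",
     "education": "Delhi, India"},
    {"name": "C. Dubois", "discipline": "orthopedics",
     "experience_years": 3, "languages": "French",
     "education": "Paris, France"},
]

# Step 1: embed every row once, up front.
index = [(row, toy_embed(textify(row))) for row in doctors]


def search(query, k=2):
    """Step 2: embed the query and return the k most similar rows."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [row for row, _ in ranked[:k]]


for row in search("orthopedics French Europe more than 10 years experience"):
    print(row["name"], row["experience_years"])
```

Note that nothing in the similarity score enforces "more than 10 years": a row with 3 years of experience can still rank highly if its text overlaps with the query. That is the part I am unsure about for structured data.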

The doctors table is just for illustration. I am trying to understand use cases for vector search on structured enterprise data: transactions, invoices, orders, inventory, customers, etc. I am comfortable with vector search on documents, images, social media, and so on.




I think you have already found the answer, so my reply is more of a confirmation. Vector search is a fairly basic retrieval method that is mainly useful for comparing semantics. In other words, if you aim to retrieve numerical data from a table, the proper application of this technology is to use an LLM to help the user retrieve the data and then present the results, rather than to rely on similarity search for the retrieval itself.

Language models are not good with numbers. You should always give the model additional tools when working with this type of data. This is why the code interpreter/advanced data analysis tool is so powerful: it helps the model find the meaning in the numbers and then write text that presents it to the user.
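As a rough illustration of handing the numeric part to a tool, here is a minimal sketch using Python's built-in `sqlite3`. The table layout, the sample rows, and the `find_doctors` helper are all hypothetical; the assumption is that the filter values have already been extracted from the user's question (by a form, or by the model calling this function as a tool), so the database performs the exact comparison and the model only presents the result.

```python
import sqlite3

# In-memory table with made-up columns and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doctors ("
    "name TEXT, discipline TEXT, experience_years INTEGER, "
    "languages TEXT, education_region TEXT)"
)
conn.executemany(
    "INSERT INTO doctors VALUES (?, ?, ?, ?, ?)",
    [
        ("A. Martin", "orthopedics", 14, "French, English", "Europe"),
        ("B. Singh", "cardiology", 22, "English, Hindi", "Asia"),
        ("C. Dubois", "orthopedics", 3, "French", "Europe"),
    ],
)


def find_doctors(discipline, min_years, language, region):
    """Hypothetical tool: exact filtering is done by SQL, not by the model."""
    sql = (
        "SELECT name FROM doctors "
        "WHERE discipline = ? "
        "AND experience_years > ? "   # exact numeric comparison
        "AND languages LIKE ? "       # simple substring match on languages
        "AND education_region = ? "
        "ORDER BY name"
    )
    params = (discipline, min_years, f"%{language}%", region)
    return [row[0] for row in conn.execute(sql, params).fetchall()]


print(find_doctors("orthopedics", 10, "French", "Europe"))  # ['A. Martin']
```

The parameters are bound with `?` placeholders rather than formatted into the SQL string, which matters if the values come from user input or model output.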

Hope this helps!