Using GPT-3 on database schemas

Is it possible to give GPT-3 your database schema, so that it knows about your database, and then ask it questions about it?


I’ve seen people using Codex to generate SQL queries, so I assume so.

That’s generally possible with a combination of the /search and /completions (with Codex) endpoints:

  1. Use /search to find, given a user query, the most relevant tables from the schema.
  2. Use /completions, where inside a comment you show the most likely relevant tables, then append the user query and ask Codex to complete the SQL statement. Step 2 is shown in this example.
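The two steps above can be sketched roughly as follows. This is a minimal illustration, not the exact example referenced: the table names, question, and commented-out API call are all made up, and the /search and Codex endpoints have since been deprecated.

```python
# Sketch of step 2: render the tables retrieved in step 1 as a comment
# block, append the user's question, and leave a SQL opening for the
# model to complete. All names here are illustrative.

def build_sql_prompt(tables: dict, question: str) -> str:
    """Render relevant tables as commented DDL-style lines plus the question."""
    lines = ["### Postgres SQL tables, with their properties:"]
    for name, columns in tables.items():
        lines.append(f"# {name}({', '.join(columns)})")
    lines.append(f"### {question}")
    lines.append("SELECT")
    return "\n".join(lines)

tables = {
    "orders": ["id", "customer_id", "total", "created_at"],
    "customers": ["id", "name", "email"],
}
prompt = build_sql_prompt(tables, "Total revenue per customer, highest first")
print(prompt)

# The prompt would then be sent to the completions endpoint, e.g. (old SDK):
# response = openai.Completion.create(model="code-davinci-002",
#                                     prompt=prompt, max_tokens=150, stop=[";"])
```

Ending the prompt with `SELECT` nudges the model into SQL-completion mode rather than continuing the comments.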

Can we use the Answers API instead, since it inherently does both search and completion for us?

@omkar did you come up with a solution to this?

This is doable if you specify the DB details within the context of a programming language that supports SQL queries, like Python, Ruby, or PHP.
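One way to read this suggestion: embed the schema inside a snippet of a language the model knows well (Python here), ending mid-query so the model completes the SQL string. Everything below is an illustrative construction, not a fixed recipe.

```python
# Build a Python-flavored prompt that carries the schema as comments and
# stops just inside a cursor.execute call, inviting the model to finish
# the SQL. The schema and question are invented for this example.
schema_lines = [
    "CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);",
    "CREATE TABLE departments (id INTEGER, name TEXT, budget REAL);",
]
commented_schema = "\n".join("# " + line for line in schema_lines)

prompt = (
    "import sqlite3\n\n"
    "# Database schema:\n"
    f"{commented_schema}\n\n"
    'conn = sqlite3.connect("company.db")\n'
    "cursor = conn.cursor()\n\n"
    "# Find the average salary per department\n"
    'cursor.execute("""'
)
print(prompt)
```

Because the prompt breaks off inside a triple-quoted string passed to `cursor.execute`, the most natural completion for a code model is the matching SQL query.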



@boris - Now that the /search endpoint is deprecated, how could this be achieved? Embeddings are the suggested alternative, but in order to find the most relevant tables for a user’s query, what sort of data would need to be embedded?


Great question @kvnam. I’m looking to understand the same thing; any help would be much appreciated.

Hey, I’m assuming you haven’t tried using the hybrid index?

You wouldn’t use embeddings for a keyword query; you would use a typical relational database, ideally with some sort of NLP entity extraction. However, it’s also ideal to use both and compare the results together for maximum performance. Keep in mind that you can use GPT for entity extraction, though it’d probably be more efficient to train your own.

“Hello, my day is fine. I’m curious, what is the capital of London?” → OpenAI( Embed the sentence for semantic relevance ) → [Optional] Extract the query “Capital of London” → Rank using a Search Engine → Compare both results’ score

In this case, they both should return the correct answer, assuming you have it, which would be stored as:

Dense Vectors: (Wikipedia page of London)
Sparse Vectors: (Important factual keywords extracted from the page)
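The dense/sparse comparison above can be sketched as a toy ranking function. Here, hand-written vectors stand in for OpenAI embeddings and keyword overlap stands in for the sparse/BM25 side; the documents, vectors, and 50/50 weighting are all invented for illustration.

```python
# Toy hybrid ranking: combine a dense (semantic) score with a sparse
# (keyword) score and pick the best document. Real systems would use
# actual embeddings and BM25 weights instead of these stand-ins.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def keyword_score(query_terms, doc_terms):
    """Fraction of query terms found in the document's keyword set."""
    return len(set(query_terms) & set(doc_terms)) / len(set(query_terms))

docs = {
    "london": {"dense": [0.9, 0.1, 0.2], "keywords": {"london", "capital", "england"}},
    "paris":  {"dense": [0.2, 0.8, 0.1], "keywords": {"paris", "capital", "france"}},
}
query_dense = [0.85, 0.15, 0.25]        # pretend embedding of the question
query_terms = ["capital", "london"]     # pretend extracted entities

ranked = sorted(
    docs,
    key=lambda d: 0.5 * cosine(query_dense, docs[d]["dense"])
                + 0.5 * keyword_score(query_terms, docs[d]["keywords"]),
    reverse=True,
)
print(ranked[0])  # both signals agree on the London page here
```

When the dense and sparse rankings agree, as here, confidence in the top result is high; disagreement is the interesting case, where comparing the two scores pays off.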

Thanks @RonaldGRuckus, I am trying out the hybrid index. I work with JavaScript, so translating some of the code and finding packages has been slow. I understand the concept of hybrid indexes, and am currently testing a BM25 package to see if it will suffice to provide the right set of results, which I can then send as prompts to OpenAI for the final answer.
Since they are discussing conversion to SQL queries, I was curious to know if there is a way to convert the question to the exact entities and then a SQL query, if I provide my schema to a model. But I think the hybrid index approach might be simpler to try first.

JavaScript as in Node.js? It’s all so new. Based on the docs on Pinecone’s website, they do intend to provide a helper class so we don’t need to prepare the sparse method ourselves. I believe they’re having a seminar about it soon. You can also find a little more helpful information here:


Yes @RonaldGRuckus, I meant Node.js, and you are right, it does seem like a more intensive approach to use SQL queries. Most JavaScript packages for BM25 directly run the search and return the results for the query. I asked ChatGPT how to convert these to embeddings and it suggested:

Once the BM25 object is initialized, you can generate embeddings by calling the `search` method with each query as input. The `search` method will return a list of document IDs sorted by their relevance scores to the query. You can treat these relevance scores as embedding vectors for each document.

I’m still trying to work this one out, but before that I’m checking whether a plain BM25 run is enough for me to generate the prompt context documents.
And thank you for sharing that link; it definitely explains the use of sparse and dense embeddings in more layman’s terms. I just need to figure out the “Generate Vectors” step for sparse vectors with Node.

If you are generating BM25 embeddings you should at some point see something like this:

    {
      "indices": [2, 4, 6],
      "values":  [0.1, 0.3, 0.5]
    }

This is the sparse vector data that you would feed alongside the semantic embeddings generated by OpenAI.
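To show where those `indices`/`values` pairs come from, here is a minimal BM25 sparse encoder. It is a sketch in Python (the poster works in Node, but the arithmetic is language-agnostic): `k1` and `b` are the common defaults, and the corpus is a toy example.

```python
# Minimal BM25 sparse-vector encoder: each document becomes a dict of
# vocabulary indices and the BM25 weight of each term it contains.
import math
from collections import Counter

def bm25_sparse_vectors(corpus, k1=1.2, b=0.75):
    tokenized = [doc.lower().split() for doc in corpus]
    # Stable vocabulary: term -> index, sorted alphabetically.
    vocab = {t: i for i, t in enumerate(sorted({t for d in tokenized for t in d}))}
    n = len(tokenized)
    df = Counter(term for doc in tokenized for term in set(doc))
    avgdl = sum(len(d) for d in tokenized) / n
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        indices, values = [], []
        for term, freq in sorted(tf.items(), key=lambda kv: vocab[kv[0]]):
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score = idf * freq * (k1 + 1) / (
                freq + k1 * (1 - b + b * len(doc) / avgdl)
            )
            indices.append(vocab[term])
            values.append(round(score, 4))
        vectors.append({"indices": indices, "values": values})
    return vectors

vectors = bm25_sparse_vectors([
    "total revenue per customer",
    "average salary per department",
])
print(vectors[0])
```

Only terms that actually appear in a document get an index/value pair, which is what makes the representation sparse.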


@kvnam were you ever able to generate the sparse vectors in node?

I made this work with a Postgres DB: you ask it questions in English, it generates a query, runs it against the DB, and returns the results. It works surprisingly well, but runs into issues periodically.
If there’s interest, I can look into publishing it on GitHub.
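The loop described here can be sketched as follows. This is not the published project: `sqlite3` stands in for Postgres so the snippet is self-contained, and the model call is stubbed with a canned query where the real version would prompt a model with the schema plus the English question.

```python
# Sketch of the question -> SQL -> execute -> results loop, with an
# in-memory sqlite3 DB standing in for Postgres and a stubbed model call.
import sqlite3

def generate_sql(question: str, schema: str) -> str:
    """Placeholder for the model call; returns a canned query here."""
    return "SELECT name, total FROM orders ORDER BY total DESC LIMIT 1"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (name TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 120.0), ("bob", 75.5)])

schema = "orders(name TEXT, total REAL)"
sql = generate_sql("Who placed the largest order?", schema)
rows = conn.execute(sql).fetchall()
print(rows)  # [('alice', 120.0)]
```

The periodic issues mentioned above are one reason a real version should validate the generated SQL (e.g. reject anything that isn’t a read-only `SELECT`) before executing it.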

Hey @dror, that’s great! I would love to see the GitHub link; I’m interested in how you did this. Thanks!

OK, so this is more of a proof of concept than a real product, but I went ahead and published.

It works surprisingly well.