Suddenly, [Database] Rows Can Now Have Meaning

We all know about ChatGPT. It is profoundly expanding what is possible when building smart systems. Pervasive and near-free access to LLMs (large language models) inches us closer to AGI (artificial general intelligence), and these models can be applied to apps and data in several ways.

Airtable, of course, can readily enjoy the benefits that services such as OpenAI provide developers. Integrating the power of LLMs for text and code completion is almost trivial. These are magical capabilities, but they aren’t the only capabilities.

Most AI experts and analysts agree that AI will become pervasive in all solutions. The ones that create the greatest customer value will blend application data, user context, and LLMs to create extremely relevant and powerful outcomes.

Data Records That Have Meaning

Airtable search is not a pleasant experience at all. The findability of discrete records in a table is terrible. Locating key data across multiple tables and bases is almost impossible. I have explored this challenge with several clients, and I’m thrilled to say all that work is now obsolete. This paper needs to be burned.

Imagine if we could quickly capture the meaning of a row in a sheet or a record in a database.

LLM embeddings make this possible. Embeddings are vectors, a fancy term for long arrays of numbers. It’s possible to get a vector for an Airtable record. The vector is a formidable representation of meaning because it places your data at a specific point in the space of meanings the model has learned, so records with similar meaning end up close together.

By building a simple string of key field values in a table row and passing it to an embedding model such as OpenAI’s text-embedding-ada-002, you capture a mathematical representation of that record’s meaning. But to transform this approach into a solution, you need one more piece of machinery: a vector datastore.
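As a rough illustration of that step, here is a minimal Python sketch. It assumes a record shaped like the Airtable REST API returns it (an "id" plus a "fields" dictionary); the field names, the record id, and the helper names record_to_text and embed_with_openai are made up for the example. The embedding helper assumes the openai Python package (v1 or later) and an OPENAI_API_KEY in the environment, and it is not called here so the example runs offline.

```python
from typing import Sequence


def record_to_text(record: dict, key_fields: Sequence[str]) -> str:
    """Join the chosen field values into one 'Field: value' string."""
    fields = record.get("fields", {})
    parts = []
    for name in key_fields:
        value = fields.get(name)
        if value is None or value == "":
            continue  # skip empty cells so they add no noise
        if isinstance(value, (list, tuple)):
            # multi-select style fields arrive as lists
            value = ", ".join(str(v) for v in value)
        parts.append(f"{name}: {value}")
    return "\n".join(parts)


def embed_with_openai(text: str, model: str = "text-embedding-ada-002") -> list:
    """Request one embedding vector (assumes openai>=1.0 and OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so record_to_text works offline

    client = OpenAI()
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding


# Hypothetical record; the id and field names are placeholders.
record = {
    "id": "recEXAMPLE",
    "fields": {
        "Name": "Acme renewal",
        "Status": "At risk",
        "Tags": ["enterprise", "Q3"],
        "Notes": "",
    },
}

text = record_to_text(record, ["Name", "Status", "Tags", "Notes"])
print(text)
# vector = embed_with_openai(text)  # uncomment with a valid API key
```

Which fields go into the string, and in what order, is a design choice; this sketch simply labels each value with its field name so the model sees some context for it.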

Vector databases (like Pinecone and Weaviate) have been around for a while. Still, you’ll soon hear a lot more about them because they are necessary to store the natural language essence of any information.

Opinion: If Airtable were on its game, it would already have a vector data store baked into its architecture, but sadly, I predict it will try to solve the search and findability crisis with a Lucene-like architecture that I said should now be burned.

I’m using Airtable data, vectors, and LLMs. It’s a bold and profoundly powerful experience when users can employ natural language to locate their own information, or to discover related information without being forced to describe deeply limiting relationships through linked records.
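To make the lookup idea concrete, here is a small Python sketch of what a vector datastore does at query time: rank stored record vectors by cosine similarity to the query vector. It is an in-memory stand-in, not Pinecone or Weaviate, and the three-dimensional vectors and record ids are toy values chosen for readability (real embeddings have many more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_matches(query_vector, index, k=3):
    """Return the k (record_id, score) pairs most similar to the query."""
    scored = [
        (record_id, cosine_similarity(query_vector, vector))
        for record_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: record id -> embedding (hypothetical values).
index = {
    "recA": [0.9, 0.1, 0.0],
    "recB": [0.1, 0.9, 0.1],
    "recC": [0.7, 0.3, 0.1],
}

# In practice this would be the embedding of the user's natural language query.
query_vector = [1.0, 0.0, 0.0]

results = top_matches(query_vector, index, k=2)
for record_id, score in results:
    print(record_id, round(score, 3))
```

A real vector database does the same ranking with approximate nearest-neighbor indexes so it stays fast across a very large number of records.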

Very cool! I can totally see the usefulness of embedding rows in a database. Especially if it’s not obvious which “column” you need to search on, but you have a general query.

Indeed. This is one of the key advantages: you get to create the embedding from selected and “boosted” or weighted columns. It’s very powerful, and the “training” is far simpler.

I have wondered, though, when I use private data to generate a vector, is that data captured and used by OpenAI? Or, is it safe to say that embeddings help to insulate customer data from OpenAI’s general model training set?

The general consensus is that the data you send to the API is not private.

But out of curiosity, how are you boosting or weighting the columns?

I use vector filters to do this, but it may also be possible to call out specific values in a prompt resulting in an embedding that is given more sway. I have not tested this approach yet, but it stands to reason that you can prompt-engineer your way to embeddings that emphasize certain terms.

My approach so far is more traditional. With the vector in hand, I also attach metadata to the Pinecone vector itself. This makes it possible to order vector results in a manner that emphasizes certain terms or even tokens in longer strings.
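For anyone wanting to try something similar, here is one possible way to re-order results using metadata, sketched in Python. It is an assumption about how this could work, not a description of my actual pipeline: the match dictionaries mimic the id, score, and metadata shape that vector query results commonly have, and the rerank function, the "status" field, and the boost values are invented for the example.

```python
def rerank(matches, boosts):
    """Re-order matches by similarity score plus metadata-based bonuses.

    matches: list of dicts with "id", "score", and "metadata" keys.
    boosts:  dict mapping (metadata_field, value) -> bonus added to the score.
    """

    def boosted_score(match):
        metadata = match.get("metadata", {})
        bonus = sum(
            amount
            for (field, value), amount in boosts.items()
            if metadata.get(field) == value
        )
        return match["score"] + bonus

    return sorted(matches, key=boosted_score, reverse=True)


# Hypothetical query results from a vector search.
matches = [
    {"id": "recA", "score": 0.82, "metadata": {"status": "Closed"}},
    {"id": "recB", "score": 0.80, "metadata": {"status": "At risk"}},
    {"id": "recC", "score": 0.60, "metadata": {"status": "At risk"}},
]

# Nudge "At risk" records upward without overriding similarity entirely.
ordered = rerank(matches, {("status", "At risk"): 0.05})
print([match["id"] for match in ordered])
```

Keeping the bonus small relative to the similarity scores means metadata breaks near-ties rather than pushing weak matches to the top.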
