Hi @bill.french
This is not really necessary nor true and it is basically “tech hype”, in my view, to be honest.
Storing serialized data in databases is a technology “as old as the hills”, and even PHP forums going back two decades store large arrays as serialized data (arrays which are more complex than one-dimensional embedding vectors).
In other words, it is trivial for any experienced webdev to store embedding vectors in a DB as a serialized object, query the DB, and perform the linear algebra fun and games with these vectors.
In fact, I do this very thing with OpenAI embedding vectors on a daily basis using a DB. Here is an example from the models in one of my Rails projects, showing that the vector is serialized automatically (a long-established, built-in feature for serializing arrays, handled here by the Rails model layer on top of the DB):
class Embedding < ApplicationRecord
  serialize :vector, Array
end
The simple DB model above stores both the text (chunks) and the serialized embedding vectors, and it is very fast. When we want even more speed we simply use Redis, which is blazing fast because it works in memory.
create_table "embeddings", force: :cascade do |t|
  t.string "openai_id"
  t.string "model"
  t.string "prompt"
  t.string "vector"
  t.datetime "created_at", precision: 6, null: false
  t.datetime "updated_at", precision: 6, null: false
end
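To make the “query the DB and do the linear algebra” step concrete, here is a minimal sketch in plain Ruby. It is not my production code. The rows are plain hashes standing in for `Embedding` records, the vectors are serialized as JSON strings purely for illustration (your Rails version and coder settings may store them differently), and the `cosine_similarity` and `nearest` helpers are hypothetical names made up for this example.

```ruby
require "json"

# Cosine similarity between two equal-length numeric arrays.
def cosine_similarity(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Rank rows by similarity to the query vector and keep the top k.
# Each row is assumed to be a hash with :prompt (the text chunk) and
# :vector (the embedding serialized as a JSON string).
def nearest(rows, query_vector, k = 2)
  scored = rows.map do |row|
    vector = JSON.parse(row[:vector]) # deserialize the stored string
    [row[:prompt], cosine_similarity(vector, query_vector)]
  end
  scored.max_by(k) { |_prompt, score| score }
end

# Toy 3-dimensional data; real OpenAI vectors are much longer.
rows = [
  { prompt: "cats",    vector: [1.0, 0.0, 0.0].to_json },
  { prompt: "cars",    vector: [0.0, 1.0, 0.0].to_json },
  { prompt: "kittens", vector: [0.9, 0.1, 0.0].to_json }
]

nearest(rows, [1.0, 0.0, 0.0]).each do |prompt, score|
  puts "#{prompt}: #{score.round(4)}"
end
```

This is a brute-force linear scan over every row, so whether it is fast enough depends on how many rows you have. The point is only that nothing in it needs a dedicated vector service.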
To be honest, I don’t use Pinecone or Weaviate, but I have looked at them before and, as I recall, these DBs are basically network services, which means they require network calls. They seem good at marketing their services as “must haves” for vectors, and good on them for marketing. However, having a database “on the same network as your app”, which requires no external network calls, is actually more reliable from a network system engineering perspective. It is also less costly, since MySQL, PostgreSQL, etc. are basically free but these “vector DB services” are not. Furthermore, most large organizations already have very competent DB admin teams.
Databases are basically “free” for many developers. MySQL, PostgreSQL, etc. are free to download, and any experienced app developer can easily set up these DBs to work fast with vectors, especially OpenAI vectors, which are one-dimensional arrays.
Well, I agree that there is a lot to worry about in tech, and this is just one of 100s or even 1000s of things we system engineers and software developers can worry about “in the future”.
If we apply the “worry about the future” logic, then we can also be “afraid” that vector DB services might go out of business, go offline, be hacked, raise their prices, etc. to infinity. So, as a systems engineer as well as a developer, my objective is to apply “future concerns” equally across the software engineering spectrum without being caught up in the “tech hype of the year” cycle.
We also know, BTW, that OpenAI uses PostgreSQL and not these third-party vector DB services, and OpenAI recently announced they are scaling up their PostgreSQL infrastructure to help them with exponential growth.
Just keeping things objective from an independent, system engineering perspective.
Hope this helps.
