The claim that OpenAI uses PG, ergo vector DBs are not useful, is about as credible as all the startups that put Fortune 100 company logos on their websites because somewhere, one mid-level dev with a corporate card once signed up for a trial, forgot to turn it off, and got billed for a month, ergo “Apple uses our product”.
Databases are tools.
While I share your skepticism of “hype” and think VCs have rushed into raising enormous rounds for vector DB startups at insane valuations without truly understanding the specialized nature of the product and segment, I think your post here might do the opposite: discount the very utility of such a specialized tool.
Can you fasten/remove a Torx bolt by jamming just the right size Phillips driver onto it, and will it work in a pinch, for a single little project, or a few times? SURE. Will it come back to bite you later if you mistake it for a Torx driver? Yes. Without a doubt. Is it the right tool for the job for a professional, at scale, wanting to give their customer the best work? No. Objectively: no.
Postgres is amazing. What a wonderful general-purpose data store it is! It even has some incredible plugins. But the very fact that its wire protocol has been put in front of entirely reimplemented engines for things like time series, active-active, sharding, and horizontal scale tells us something very important: it is not the silver bullet you are making it out to be.
Your commentary is neither objective nor grounded in “system engineering”: I am sure OpenAI uses Postgres. I am sure they use it for its strengths (like transactional data, HRIS applications, or the myriad other things any business does). If it underpins their actual technology as a primary vector store, I would guess that it is only with some very, very advanced, proprietary pg_* plugins, storage layers, etc. that basically turn it into a CockroachDB-style implementation, where it’s just the PG wire protocol talking to an enormously different storage engine (read: NOT at all Postgres).
I love me some PG just as much as the next guy, and think this “Vector DBs are the greatest thing since sliced bread and will solve all my problems and make everything else obsolete” is just as crazy as “Vector DBs are just a fad, meh, Postgres FTW”.
If you can actually substantiate that OpenAI is using vanilla-ish (or close to it) PG (and its actual storage, query, etc. engine) for actual OpenAI vector or embedding use, I encourage you to do so, but I suspect that’s not possible because 1) that information is largely proprietary and 2) we know that PG as a datastore is not built for that at even a fraction of a percent of OpenAI’s scale. I am sure vanilla-ish PG abounds in their ERP, CRM, etc. systems, but it is a specious argument to conflate that with the actual service delivery stack in order to discredit Vector DBs.
EDIT: Also, let’s not confuse pg_openai and attendant end user functions/stored procedures/UDFs with what it takes to run OpenAI’s service delivery fabric.