Short video showing our studio's work, including our use of the OpenAI API and our subsequent move to local LLMs, specifically Qwen3

Here is the video. It is 6 minutes 43 seconds long and covers a lot of territory.

I think local .gguf LLM models could pose an Innovator's Dilemma situation for OpenAI: they are not far off the performance of the latest ChatGPT models available through the API, you don't have to send your private data to a cloud LLM, and you don't pay for each inference request.

In the video, I compare the frontier US AI companies (Meta excluded) to IBM in the 1970s, selling expensive mainframes and minicomputers while nascent startups like Apple and Commodore delivered "good enough" results with their early personal computers, like the Apple II and Commodore 64.

OpenAI doesn't need to fully pivot to local, but they should release free local .gguf LLMs that can at least compete with the best of the open-weight models coming out of China.

As a developer, I have no desire to keep paying for API requests in my LLM application integrations, and I want to guarantee privacy for my customers, who don't want their IP and private corporate data sent to OpenAI or Google DeepMind, where it could be harvested and used in future training runs for next-generation models.
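In practice, the switch away from a paid cloud API can be small. Local servers such as llama.cpp's built-in server (and Ollama) expose an OpenAI-compatible chat endpoint, so the request shape stays the same while the prompt never leaves the machine. A minimal sketch, assuming a llama.cpp-style server on localhost port 8080 and a model named "qwen3" (both the port and model name are assumptions, not from the video):

```python
import json
import urllib.request

# Assumed local endpoint: llama.cpp's server exposes an OpenAI-compatible
# chat API at /v1/chat/completions; the port is configurable at launch.
LOCAL_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "qwen3") -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request aimed at a local server.

    No API key, no per-request billing, and no cloud round trip: the
    prompt and any private data stay on your own hardware.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        LOCAL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires a running local server):
# with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

Because the wire format matches OpenAI's, existing client code can often be repointed at the local URL with the model name swapped, rather than rewritten.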