To scale Gen AI, should we focus on compute or memory?

I wanted to discuss this with others in this forum.
Most of the focus in Generative AI right now is on compute, but I believe we should focus more on memory architecture. The reason our brain is efficient is that the cortex is not a computer; it is a memory system. So we need to develop a robust memory system, and I think the best tool for building one is a graph database. What I cannot decide is whether we should use RDF or a property graph for the memory architecture.
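To make the trade-off concrete, here is a minimal sketch in Python of the same fact stored both ways, using rdflib for the RDF side and networkx as a stand-in property graph; the entities and attributes are purely illustrative, not a proposed schema.

```python
# RDF: knowledge is a set of (subject, predicate, object) triples.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
rdf_g = Graph()
rdf_g.add((EX.cortex, EX.partOf, EX.brain))
rdf_g.add((EX.cortex, EX.function, Literal("memory")))

# Query: everything we know about the cortex.
for s, p, o in rdf_g.triples((EX.cortex, None, None)):
    print(p, o)

# Property graph: nodes and edges carry arbitrary key/value attributes,
# so metadata like confidence or recency can live on the edge itself.
import networkx as nx

pg = nx.MultiDiGraph()
pg.add_node("cortex", kind="region")
pg.add_node("brain", kind="organ")
pg.add_edge("cortex", "brain", relation="part_of", confidence=0.9)

for _, target, attrs in pg.edges("cortex", data=True):
    print(target, attrs)
```

The practical difference shows up on that last edge: in plain RDF you need reification or RDF-star to attach something like a confidence score to a relation, while a property graph supports it natively. That kind of per-edge metadata matters a lot for a memory system that has to decay, forget, and re-weight what it stores.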

2 Likes

Great point about focusing on memory architecture and the analogy to the human brain’s efficiency. Thanks for bringing up this topic for discussion.

I sit in the camp that the future of generative AI, and the solution to its scalability issues, hinges on a blend of approaches, and that there won’t be one solution to rule them all. I tend to have a brevity issue with my forum posts, so I will just bullet some thoughts below to help spur additional discussion.

  • In Sam’s recent conversation with Lex Fridman, he posited that compute may act as the currency of the future but emphasized smarter allocation of compute based on task complexity. This is an interesting and nuanced approach that could assist with scalability.

  • Innovations such as decentralized agent swarms, GraphRAG and languages like Mojo could offer a more scalable path forward.

  • Techniques like FSDP and QLoRA, as explored in a recent deep dive (Answer.AI - Enabling 70B Finetuning on Consumer GPUs), further highlight the potential for balancing compute and memory effectively; a minimal sketch follows this list.
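For anyone who wants to see what the QLoRA half of that looks like in code, here is a minimal sketch using the Hugging Face transformers/peft/bitsandbytes stack; the model id and LoRA hyperparameters are illustrative, and the FSDP sharding from the Answer.AI write-up is omitted for brevity.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base weights to 4-bit NF4 (the "Q" in QLoRA),
# while keeping compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# The model id is illustrative; any causal LM on the Hub loads the same way.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Train only small low-rank adapters (the "LoRA" half); r and
# target_modules here are typical choices, not prescriptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of the weights train
```

The point for this thread: the base model sits in memory at roughly 4 bits per weight while gradients flow only through the small adapters, which is exactly the kind of compute/memory balancing act the Answer.AI post explores.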

I think it is really this fusion of ideas, and others as of yet undiscovered, that may well define our strides toward scalable, efficient AI.

3 Likes

According to The Bitter Lesson, more compute unfortunately always wins.

Interesting thought. I do think, however, that this is an oversimplification of how our brain functions. Going back to the Bitter Lesson:

The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity.

1 Like

Scaling a pretrained transformer LLM and calling it “AI” is like saying, in 1880, that what we need is ever-bigger steam engines.

What we actually needed was an advancement in technology, like the internal combustion engine.

And now, what we need is something closer to actual artificial intelligence, which these pretrained models are not, at all. They are a really useful tool for managing data, but one that has no intelligence whatsoever once training is over.

It’s good to keep expanding what these can do, but it isn’t AI, and it’s no more a long-term solution for AI than moving from steam engines to diesel-electric locomotives was for human transportation.

The focus on this flashy, but unintelligent, kind of model has distracted us from REAL progress in actual intelligence. Sometimes it seems the lack of intelligence in question is human as well as artificial.

So what we really need is to work on dynamic learning instead of pretrained models. And even among pretrained models, we need to work on advancing the technology, for example by solving the need to tokenize more than language, so that a transformer can “understand” information in other formats better than an LLM can kluge “music” and “images” post-training.

Part of the problem with solving that last one, really, is that the tokenizer was sort of a cheat: it let us bypass the bottleneck of getting a model to understand the concept of words in the first place. But in doing so, we stuck ourselves in a position where we don’t understand how to make that tokenizer encode other things, like images or music, in a compatible format.
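To ground that mechanically: the model never sees words, only integer IDs drawn from a fixed vocabulary learned over text bytes. A quick sketch with OpenAI’s tiktoken library (any BPE tokenizer would behave the same) shows why that mapping is text-specific and has no natural slot for audio samples or pixels.

```python
import tiktoken  # OpenAI's BPE tokenizer library

enc = tiktoken.get_encoding("cl100k_base")

# Text maps cleanly onto subword IDs the model was trained on...
ids = enc.encode("The cortex is a memory system.")
print(ids)              # a short list of integer token IDs
print(enc.decode(ids))  # ...and round-trips back to the original string

# ...but the vocabulary is a closed set of byte sequences learned from text.
# A waveform sample or a pixel patch has no principled entry in it, which is
# why images and audio get bolted on via separate learned codecs or patch
# embeddings instead of reusing the text tokenizer.
print(enc.n_vocab)      # ~100k entries, all derived from text
```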

I find it frustrating that, instead of a dozen fascinating problems like that, the focus HAS been mainly to throw resources at the existing tech, and to tweak it.

Think of how NASA dead-ended space flight technology by endlessly re-engineering the existing Nazi rocket tech for fifty years. “It’s not rocket science” was actually true of NASA engineers: they were not scientists at all, not inventing new technology (plasma drives aside), and that stagnated our progress in aerospace.

The same thing is happening, here. This isn’t AI science, it’s pretrained model engineering.

Google engineers published the science part with the paper “Attention Is All You Need”, and OpenAI engineered and capitalized on this research.

So without engineering you wouldn’t have anything tangible, and without science you wouldn’t have anything to engineer.

The value of transformer-based LLMs to the public is immense. Even as other architectures unfold, it’s important to actually create something useful, which is what we have with “pretrained model engineering”.

So we need researchers who formulate new ideas and approaches, engineers who design and build these things and get them to work, and marketers who get people to fork over their money.

It’s not all science and research, although research is important, and AI research is on fire right now, with many hypotheses being generated and tested. It’s hard to keep up with it all.

1 Like