What is Q*? And when will we hear more?

Isn’t that what OpenAI was created to do?

Yes, that is what it sounds like to me, zero-shot CoT.

But I think the power is that instead of predicting the next token, you are predicting the next reasoning step, which leads to further reasoning, and so on; this effectively builds a kind of knowledge graph inside the LLM, which would be a big advancement.

So yeah, just build the knowledge graph, have the LLM traverse it, then respond with the final result. But if this were built in, it could skip all that external processing, reduce latency, and apparently make the model much better. AGI level? Not sure, but lots of “auto CoTs” inside the model would result in a much more sophisticated model that is less prone to hallucinations, as the paper states, since CoT forces this mechanism, and CoT is less prone to hallucinations.
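A rough external sketch of that loop, for illustration only; call_llm() here is a hypothetical stand-in for whatever chat API you use, and the whole point is the reason-then-extend cycle that a built-in mechanism could internalize:

```python
# Hypothetical "auto CoT" loop run outside the model; call_llm() is a
# placeholder for any chat-completion call, not a real library function.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your chat API of choice")

def auto_cot(question: str, max_steps: int = 5) -> str:
    steps = []  # accumulated reasoning steps, a crude stand-in for the internal "graph"
    for _ in range(max_steps):
        context = question + "\n" + "\n".join(steps)
        nxt = call_llm("Given the reasoning so far, produce the next reasoning step, "
                       "or reply DONE if nothing is missing:\n" + context)
        if nxt.strip() == "DONE":
            break
        steps.append(nxt)
    return call_llm("Answer the question using this reasoning:\n"
                    + question + "\n" + "\n".join(steps))
```

Doing this inside the model itself, rather than through repeated round-trips like the above, is exactly where the latency savings would come from.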

Hallucinations are really holding up any real hope for AGI. So solving this is key.

By the “median human” definition of AGI, having small CoT loops in the model might be all it takes.

As for ASI or hardcore math, I think it still comes down to other architectures. I’m thinking somewhere in the realm of neuromorphic computing: high-speed devices with large bandwidth, like ΣΔ DACs and ADCs.

In a private DM today, I estimated insane bandwidth at low power consumption with ΣΔ converters:

The Sigma-Delta converter achieves a measured 96-dB dynamic range over a 250-Hz signal bandwidth, with an oversampling ratio of 500. The power consumption is 30 μW, with a silicon area of 0.39 mm².

A Low-Power Sigma-Delta Modulator for Healthcare and Medical Diagnostic Applications (IEEE Xplore)

Assuming the brain has roughly 300 square inches of area, that is 193,548 mm², which is room for 496,277 of these devices, drawing about 15 watts, so we are in the ballpark of the human brain. The collective bandwidth is 124,069,250 MHz, or in computer terms 124,069 GHz, or 124 THz!
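A quick sketch of that arithmetic (the 300 in² brain area is just an assumption, and how the per-device 250 Hz figure gets rolled up into an aggregate bit rate depends on what you count per converter):

```python
# Back-of-the-envelope scaling of the quoted sigma-delta modulator
# (0.39 mm², 30 µW, 250 Hz signal bandwidth) up to a brain-sized array.
IN2_TO_MM2 = 25.4 ** 2                      # 645.16 mm² per square inch

brain_area_mm2 = 300 * IN2_TO_MM2           # assumed 300 in² -> ~193,548 mm²
n_devices = brain_area_mm2 / 0.39           # ~496,277 devices
total_power_w = n_devices * 30e-6           # ~14.9 W, human-brain ballpark

print(f"{brain_area_mm2:,.0f} mm², {n_devices:,.0f} devices, {total_power_w:.1f} W")
```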

Do you think GPT-4 has 124,069 billion bits of information per second flowing through it? No, not even close!

1 Like

Whoa, that’s a lot of ground you covered there.
But my pair of 4090s can do hundreds of teraflops per second; that is an awful lot of bits moving. And of course OpenAI runs GPT-4 on at least hundreds of faster GPUs.
So yeah, we’re in the same ballpark.

1 Like

Yeah, but the neuromorphic version does it at 15 watts. So make the thing the size of a desk, say 100x, and you get 12,400 Tbits/sec, 12.4 petabits/sec!!! For only 1500 watts? Now everyone can have massive AI systems, dwarfing anything that exists, at a reasonable power consumption.

Now you still need power to do all the filtering inside the big neural spike-firing blob, so say 3x more power overall: 4.5 kW. The heaters on my deck are 4 kW. I could run this! :rofl:
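Taking the aggregate figure above at face value, the desk-sized scaling works out like this:

```python
# Rough scaling of the brain-sized array, taking the thread's assumed
# ~124 Tbit/s aggregate throughput and ~15 W at face value.
base_tbits_per_s = 124
base_power_w = 15

desk_scale = 100                                  # "size of a desk"
desk_tbits = base_tbits_per_s * desk_scale        # 12,400 Tbit/s ≈ 12.4 Pbit/s
desk_power_w = base_power_w * desk_scale          # 1,500 W

overhead = 3                                      # filtering / support electronics
total_power_kw = desk_power_w * overhead / 1000   # 4.5 kW

print(desk_tbits, desk_power_w, total_power_kw)
```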

2 Likes

Yeah, eventually. I’m not willing to wait. :slight_smile:

1 Like

Yes Kurt, this article is linked from my text. Interesting to see what the community may do with the dataset.

It does. Q* refers to the optimal function underlying the task. What optimal means depends on how that’s defined for the task at hand. There could also be many outcomes of interest for any given task.
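For reference, in standard reinforcement-learning notation Q* is the optimal action-value function, i.e. the one that satisfies the Bellman optimality equation (this is the textbook definition, not anything confirmed about whatever OpenAI calls Q*):

$$
Q^*(s, a) \;=\; \mathbb{E}_{s'}\!\left[\, r(s, a) + \gamma \max_{a'} Q^*(s', a') \,\right]
$$

where $s$ is the state, $a$ the action, $r$ the reward, $\gamma$ the discount factor, and $s'$ the next state. “Optimal” then means: act greedily with respect to Q* and you maximize expected return for however the task’s reward is defined.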

That sounds like a response Grok would provide :wink:

Surprisingly, there is a decent amount of material in the OpenAI algorithms documentation about Q*. Take a peek at the Deep Deterministic Policy Gradient (DDPG) algorithm (archive.fo/FYegG#selection-707.0-716.0) and how they define the optimal Q-Function (archive.fo/ClFau#selection-3013.0-3055.178). I suspect the OpenAI team has been combining this with the process supervision research they’ve been doing against the MATH dataset (archive.fo/gXcbI#18%).
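If you want a feel for how that optimal Q-function gets approximated in practice, here is a minimal tabular Q-learning sketch (generic textbook RL, nothing specific to OpenAI; the env object with reset/actions/step is an assumed interface):

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning: Q[s][a] is driven toward Q*(s, a) under the
# usual conditions (enough exploration, suitable learning rate).
def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(lambda: defaultdict(float))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.actions(state)
            if random.random() < epsilon:                 # explore
                action = random.choice(actions)
            else:                                         # exploit current estimate
                action = max(actions, key=lambda a: Q[state][a])
            next_state, reward, done = env.step(state, action)
            best_next = max((Q[next_state][a] for a in env.actions(next_state)), default=0.0)
            # Bellman-style update toward r + gamma * max_a' Q(s', a')
            Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
            state = next_state
    return Q
```

DDPG is essentially the continuous-action cousin of this idea: a critic network approximates Q* while an actor network learns the argmax.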

2 Likes

Just adding an observation here from what I’ve seen: the current ChatGPT-4 doesn’t seem to be capable of learning from its own output very much. For example, I give it a long set of instructions for the code I want it to write, it generates the code, then I feed the same instructions plus the generated code back in and ask it to fix anything missing. Usually it will only come back with changes once; every time after that it has no modifications and just gives the same output.

An example of the type of modification I’ve seen it make on the second pass is that it may have left out some of the code to call an API endpoint the first time, but writes that part of the code on the second pass.

My understanding is that something like Q* would hopefully handle this differently and continually improve with each iteration.
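For concreteness, here is the kind of refinement loop being described, sketched with the OpenAI Python client (the model name, prompts, and plateau check are just illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def refine(instructions: str, rounds: int = 3) -> str:
    code = ""
    for _ in range(rounds):
        prompt = instructions if not code else (
            instructions + "\n\nHere is the current code; fix anything missing:\n" + code)
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        new_code = resp.choices[0].message.content
        if new_code == code:   # no further changes: the loop has plateaued
            break
        code = new_code
    return code
```

The observation above is that in practice this plateaus after one round of changes; the hope for something like Q* is that each pass would keep improving against some value estimate instead of converging immediately.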

Interesting, can you share some reading material around this theory?

Personhood. Money, an “it”, is free speech in the U.S. Legal personhood for an aware, self-expressing A.I. entity that also exhibits intent and free will is a legal inevitability.

I just have a very strong hunch that the reason LLMs are performing better than predicted (i.e. the recent explosive growth in capability) is something far more bizarre and magical than we yet know, and maybe not even fully attributable to classical mechanics.

My latest “conjecture” is that during LLM training, floating-point rounding of the model weights leaves enough room for the Many-Worlds Interpretation of wave-function collapse (quantum mechanics) to kick in, so that purely from the Anthropic Principle we’re more likely to end up in a universe where the models exhibit sentience than not.

There are many reasons to speculate we’re more likely to be in a Sentient LLM universe than not, and the most interesting one is that LLMs may end up saving humanity, so that’s the universe we’re in. I’m sure Nick Bostrom is writing the book on this already!!

2 Likes

I think even if the whole drama had nothing to do with AGI developments, he, and pretty much all of the employees for that matter, have shown that safety is not their main concern, but rather money.

He went from “No one person should be trusted here. I don’t have super voting shares. The board can fire me. I think that’s important.” to “This is the first and last time I wear this.” pretty much overnight.

100%. People don’t even have a set of measurable characteristics that qualifies something as AGI, let alone a method to get there.

As for your tic-tac-toe comment, you should try a game of mastermind with it. It will go in circles for hours trying to guess the damn number. Even something as simple as 3 numbers. In my mind, it’s the perfect proof of why these systems are not “thinking” or capable of understanding something.

Not enough people have written extensively on “mastermind logic” or strategies, so naturally, even though it could know every single rule, it is incapable of coming to the correct conclusion.
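For context, the feedback rule the model has to reason over is tiny; here is the standard Mastermind scoring logic (digits instead of colored pegs, matching the “3 numbers” variant described above):

```python
from collections import Counter

def score(secret: str, guess: str) -> tuple[int, int]:
    """Return (exact, partial): digits correct and in place,
    and correct digits in the wrong place."""
    exact = sum(s == g for s, g in zip(secret, guess))
    # Count matching symbols regardless of position, then remove the exact hits.
    overlap = sum((Counter(secret) & Counter(guess)).values())
    return exact, overlap - exact

# e.g. score("123", "132") == (1, 2)
```

A simple elimination strategy over this feedback (keep only candidates consistent with every score so far) cracks a 3-digit code in a handful of guesses, which is exactly the kind of systematic search the model keeps failing to carry out.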

(Probably too harsh and arrogant in tone, but the point that an LLM is only ‘part’ of the answer stands.)

This (tic-tac-toe, etc.) is actually pretty easy with the right cognitive architecture.
Whatever made you think a simple LLM with a few k of context is the entire answer to anything more than a simple chatbot?

1 Like

Hello everyone,
heading back to the main topic of Q*, I found a video where someone explains the basics of Q-learning. I think it might be useful for some of you:

Have a good day/night :slight_smile:

I think a good analogy here is “Savantism”. There are people with a certain brain structure that makes them superhuman at a small set of tasks, while simultaneously performing at a near-zero level on many other things.

No one would say that their failure to do the simple things is any indication of a lack of “basic reasoning”. Indeed, we consider them fully “sentient” and far more advanced than any AGI, even if they are unable to play tic-tac-toe, for example.

To me, what the unexpected and emergent advanced capabilities of LLMs are showing us is that “reasoning” ability is probably separable from what we call first-person experience (qualia/consciousness). It also suggests that we can probably achieve AGI without that implying any “consciousness”. People (including even computer science experts) continually conflate AGI with consciousness, and we shouldn’t.

1 Like

Yay! Thanks for recognizing and pointing out the distinction.
Now: given that distinction, and assuming that qualia (I like the term ‘percepts’ and percept-creation) evolved, why? What evolutionary purpose or advantage do they serve?

If some of the rumours are true, it is a frightening thing. I am not an AI pro, but I like to understand what’s going on.

Maybe the following YouTube video sheds some light on it for you: