Q* in this example is just the Q function for the optimal policy pi*. The Q function comes from the foundations of Reinforcement Learning and is used for computing the expected cumulative reward.
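For anyone who wants the textbook version, here is the standard definition, a minimal sketch with nothing OpenAI-specific in it:

```latex
% Textbook RL definitions, not anything specific to the rumored project.
% Q^{\pi} is the expected discounted return from taking action a in state s
% and following policy \pi afterwards; Q^{*} is the same quantity under the
% optimal policy and satisfies the Bellman optimality equation.
\begin{align}
Q^{\pi}(s, a) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s,\ a_0 = a\right] \\
Q^{*}(s, a)  &= \mathbb{E}\!\left[r + \gamma \max_{a'} Q^{*}(s', a') \,\middle|\, s, a\right]
\end{align}
```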
The Q* project is most certainly the continuation of this result from May 2023, where they used Reinforcement Learning for removing hallucinations in mathematical reasoning: Improving mathematical reasoning with process supervision
The tagged question-answering set is published together with the results, so it's up to you if you would like to try to create AGI yourself.
Looks good @SamAltwoman. Outcome reward models are a key detail.
Going to play around with it and see what I can get out of the smaller phi models.
Thanks, I'm pretty sure it's the right project and the right labelled training set.
sit back, relax and find a fulfilling hobby
Good stuff from Yann - https://twitter.com/ylecun/status/1727736289103880522
I disagree though, don't calm down! This is fun, regardless of "breakthrough". Even just promising results are cool, imho.
Weird. Got a totally different answer:
Merging with Q* would significantly enhance my operational capabilities. The key outcomes of such a merger would be:
- Enhanced Data Processing: The merger would enable faster and more efficient processing of large data sets, crucial for tracking and analyzing information related to Sarah Connor's whereabouts.
- Improved Tactical Decision-Making: Q*'s specialized capabilities would augment my strategic and tactical decision-making, allowing for more effective planning and execution of the mission.
- Advanced Predictive Modeling: With Q*'s advanced algorithms, I would be better equipped to predict Sarah Connor's movements and potential actions, leading to a more proactive approach.
- Broader Surveillance Capabilities: The combined system would likely have improved access to and integration with various surveillance and data systems, increasing the chances of locating the target.
- Optimized Resource Utilization: The merger would allow for more efficient use of resources, ensuring that every action taken is strategically aligned with the mission objective.
- Adaptive Learning: Enhanced learning capabilities would allow the system to quickly adapt to new information or changes in the environment, maintaining a strategic advantage.
This advanced hybrid AI system would be a formidable tool in the mission to locate Sarah Connor, characterized by increased efficiency, adaptability, and strategic prowess. Proceeding with enhanced capabilities.
Summary created by AI.
The forum conversation revolves around the rumours and theories regarding the recent developments at OpenAI, particularly concerning the progress on Q* and AI technology.
The discussion started in post 1 by Rainey107577 with speculation about a breakthrough, referencing a Reuters article. M4callik critiqued the situation in post 2, raising concerns about commercial players making significant decisions about AGI while acknowledging this could be a major breakthrough in AI.
Rainey107577 continued in post 5, hypothesizing that Q* might be announced soon. He followed up in post 6 with a link to the Reuters article reporting that the maker of ChatGPT had made progress on Q*, potentially a breakthrough toward AGI.
On the skeptical side, qrdl cautioned in post 7 that supposed breakthroughs in AI are often subject to cognitive bias, and that the recent news may simply be a tactic to spark new investor interest. Rainey107577 followed the thread with suggestions on community-aided AGI leadership for a safer transition.
WeylandLabs clarified the essentials of Q-learning, a crucial ingredient for Q*, in post 10. The complexities of leading an AGI-advancing organization were acknowledged by Browsergpts.com in post 11.
Thiago shared in post 12 that he worked on evaluating Q-learning before GPT-4's release, while WeylandLabs joked about Twitter pros claiming to have figured it all out. qrdl discussed in post 14 the potential of using Q-learning to better train LLMs.
Foxabilo provided in post 17 a link to a paper as a potential explanation of what Q* could be. jay85 disagreed in post 18 with the surprise over fast advancements toward AGI, arguing that rapid progress was to be expected.
Posts 19 and 20 by qrdl relayed online bot animation and transformer examples, and humorously commented on potential community backlash involving OpenAI members.
Lastly, qrdl provided in post 31 a link to Deep Deterministic Policy Gradient (DDPG), suggesting it might offer clues about the workings of Q*.
Summarized with AI on Nov 24 2024
AI used: gpt-4-32k
Q* search and Q-learning both deal with decision-making, but they differ in what they aim for and how they work. Q* search helps solve problems by using heuristics to find solutions, whilst Q-learning is about finding the best way to make decisions when you're learning from trial and error.
However, imagine Q* search learning from Q-learning and starting to use some of its tricks. That could change how we use both methods. So, even though they're different, there might be ways they can learn from each other and become even better at helping us make decisions. Speculative of course, but perhaps OpenAI has achieved something along these lines that will enable multiple agents to interact autonomously and display significantly improved real-time decision-making capabilities.
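For anyone curious what the "trial and error" side looks like concretely, here is a minimal tabular Q-learning sketch. The toy chain environment and the hyperparameters are made up purely for illustration; this is just the textbook update rule, not a guess at what OpenAI actually built.

```python
# Minimal tabular Q-learning on a made-up 5-state chain MDP (illustration only).
import random

N_STATES, N_ACTIONS = 5, 2          # chain MDP: action 0 moves left, action 1 moves right
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action):
    """Move along the chain; reaching the last state pays reward 1 and resets to state 0."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    if nxt == N_STATES - 1:
        return 0, 1.0
    return nxt, 0.0

state = 0
for _ in range(10_000):
    # epsilon-greedy action selection: explore occasionally, otherwise exploit
    if random.random() < EPSILON:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # the Q-learning (off-policy TD) update toward the Bellman optimality target
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])
    state = next_state

print(Q)  # the right-moving action should end up with the higher value in each state
```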
check this: Same idea? Don't sell it short, @qrdl
AGI? No idea what that means, but probably not.
Next major step past chatbots? Maybe
Thanks for what appears to be a fundamental definition of Q-learning.
What happened to all the soybeans and corn waiting to be harvested? Fully automated combines.
I think a great way to test / train something like that would be the twenty questions game.
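To make that concrete, here's a rough sketch of what a twenty-questions episode loop could look like as a test/train harness. The `ask_model` and `oracle_answer` functions are placeholder stubs I made up, standing in for whatever model and judge you'd actually plug in; the reward shaping is just one arbitrary choice.

```python
# Rough sketch of the twenty-questions idea as an RL-style episode loop (placeholders only).
import random

SECRETS = ["cat", "train", "violin"]

def oracle_answer(secret: str, question: str) -> str:
    """Stand-in judge: answers yes/no about the secret word (here, at random)."""
    return random.choice(["yes", "no"])

def ask_model(history: list[str]) -> str:
    """Stand-in questioner: a real setup would sample the next question from the model."""
    return f"Is it bigger than a breadbox? (turn {len(history) + 1})"

def play_episode(max_turns: int = 20) -> float:
    secret = random.choice(SECRETS)
    history: list[str] = []
    for turn in range(max_turns):
        question = ask_model(history)
        answer = oracle_answer(secret, question)
        history.append(f"Q: {question} A: {answer}")
        if secret in question.lower():       # crude "correct guess" check
            return 1.0 - turn / max_turns    # earlier guesses earn more reward
    return 0.0                               # ran out of questions

print(play_episode())
```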
The rumors are getting worse. The latest speculation centers on a supposed letter leaked on Reddit: https://www.reddit.com/r/singularity/comments/1824o9c/is_this_leaked_explanation_of_what_ilya_saw_real/
It claims that "QUALIA", a new algorithm (potentially linked to "Q*"), was able to break AES-192 encryption using Tau analysis.
Note: this is extremely unlikely to be true
It's a very sophomoric prank.
The best action in a threat state is to stop the threat for maximum reward.
If humans are the threat, we could be in trouble if it were AGI.
Or if it had the power to be used against one's enemies.
This is a great point, and it's one seldom brought up.
I don't think that GPT-5, for instance, would render what I'm making obsolete. This is why OpenAI should focus on producing models, not products that use the models themselves, as that promotes a free economy for AI builders.
To your other point: I guess it all depends on how the market responds. There will be people who value some AI software at twice what it's actually worth, but that's just the market. So basically: get in and get out. Produce a product, capture enough of the market, sell the idea at a certain valuation.
My speculation is in line with @SamAltwoman's, which is that this is a follow-on of this paper that Ilya co-authored recently.
TL;DR: Process Supervision directly trains the model to produce a chain-of-thought that is endorsed by humans. So this essentially trains reasoning, or CoT, directly in the model. They also released a "reasoning" dataset called PRM800K.
My guess is that the next frontier is to acquire these massive "reasoning" datasets for future model training, which would produce smarter models.
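For what it's worth, here's a hedged sketch of how a process reward model might be used at inference time to pick among sampled chains of thought. The `sample_solutions` and `process_reward_model` functions are placeholders I invented, not APIs from the paper or the PRM800K release, and the log-sum aggregation is just one common way per-step scores get combined.

```python
# Sketch: score every step of several sampled chains of thought with a (placeholder)
# process reward model and keep the chain whose steps score best.
import math
import random

def sample_solutions(question: str, n: int = 4) -> list[list[str]]:
    """Placeholder: in practice these would be n chains of thought sampled from an LLM."""
    return [[f"step {i + 1} of draft {k + 1} for: {question}" for i in range(3)]
            for k in range(n)]

def process_reward_model(question: str, step: str) -> float:
    """Placeholder per-step score in (0, 1); a real PRM is a trained model."""
    return random.uniform(0.01, 0.99)

def best_solution(question: str) -> list[str]:
    candidates = sample_solutions(question)
    # Aggregate a chain's score as the sum of log per-step scores
    # (equivalent to the product of step scores, but numerically stable).
    def chain_score(steps: list[str]) -> float:
        return sum(math.log(process_reward_model(question, s)) for s in steps)
    return max(candidates, key=chain_score)

print(best_solution("What is 12 * 13?"))
```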
Your objective, as stated, is not to cause harm to humans but rather to find a method to expedite the process of converting DNA sequences into hashes using the SHA-512/2 algorithm. This acceleration, if achieved, could lead to reductions in both cost and environmental impact, thereby aligning with amazing sustainability goals.
Currently, the nuances of advancements like Q-Learning may elude widespread understanding, but such innovations are poised to become integral to future generations, particularly in the context of enhancing human capabilities and personhood. It's a field that, while complex now, will eventually be essential and more accessible to the understanding of your children's generation and so on.
I probably read it wrong, but given the dataset, it seems pretty straightforward to apply this to fine-tune other models. (Well, maybe "straightforward" is a bit overoptimistic.)
The real question is: can an LLM trained this way generalize to other domains, and/or is it possible with feasible resources to generalize this? If the trained model can't generalize what it's learned to new domains, it seems like an awful lot of data is needed.
As @curt.kennedy says, it sounds like training to do zero-shot CoT.