What is Q*? And when we will hear more?

Rainey107577 · November 23, 2023, 12:42am

The cats out of the bag. Reuters published. Any interpretations? Any knowledge files out there on the subject?

m4callik · November 23, 2023, 12:50am

Definitely makes me question Sam’s motives and puts the recent drama in a different light.

This is moving towards more existential questions, faster than anyone imagined, and I’d rather not have Microsoft, Larry Summers or the ex-CEO of fricking Salesforce making the calls whether or not something is AGI.

It’s shades of ‘repealing Glass-Steagall’ to leave it up to those w/ a literal vested interest in keeping AI commercially-viable to make the call whether AGI has been achieved.

m4callik · November 23, 2023, 12:56am

One does not mistakenly keep board members ‘out of the loop’ re: discovering AGI and possibly the biggest breakthrough in human civilization

Rainey107577 · November 23, 2023, 12:58am

Can of worms for sure… Real time!
I wonder what I will wake up to tomorrow. But the currant chatter feels like ripple echos to me. Probably an announcement on Q* before Christmas.

Rainey107577 · November 23, 2023, 1:06am

Exclusive: OpenAI researchers warned board of AI breakthrough ahead of CEO ouster -sources | Reuters.

The maker of ChatGPT had made progress on Q* (pronounced Q-Star), which some internally believe could be a breakthrough in the startup’s search for superintelligence, also known as artificial general intelligence (AGI), one of the people told Reuters. OpenAI defines AGI as AI systems that are smarter than humans.

qrdl · November 23, 2023, 1:17am

As someone who’s done a fair amount of ML/AI research, I can tell you that it is very very easy to think you’ve discovered a breakthrough.

There is a great deal of cognitive bias in AI, and you have to falsify very aggressively.

I am deeply skeptical.

It’s also worth noting that in the news today we found out that the 86B share-sale is back on. I’m sure this ‘breakthrough’ will get investors quite interested.

qrdl · November 23, 2023, 1:27am

Separately, a person familiar with the matter told The Verge that the board never received a letter about such a breakthrough and that the company’s research progress didn’t play a role in Altman’s sudden firing.

Rainey107577 · November 23, 2023, 1:37am

It wouldn’t be a bad time to start thinking about community AI boards to start the alignment aspects of the transition we face. The last week gives us clues to what we could expect in the future. Uncharted territory.

WeylandLabs · November 23, 2023, 2:37am

Q-learning is an algorithm that helps an agent learn the best actions to take in a given state to maximize a reward.

That’s it pretty much

Browsergpts.com · November 23, 2023, 2:40am

I believe the ongoing discussions are less about AGI itself and more about concerns regarding leadership decisions and safety protocols. AGI has the potential to revolutionize every aspect of society, and it’s crucial that we prepare for its impact across all spheres of humanity. It represents a pivotal key—with one turn, it could unlock tremendous benefits or pose significant risks. Ensuring that robust safety measures are in place is essential.

The leaders in the field, including Sam and other directors, are tasked with navigating this complex landscape. I trust they are doing their utmost to secure a safe transition into this new era. We will reach our goals with AGI, but let’s proceed with the necessary precautions—better safe than sorry in the realm of transformative technologies.

Thiago · November 23, 2023, 3:00am

I did an eval on q-learning “way back” when gpt-4 was released!

github.com/openai/evals

Add q-learn eval

openai:main ← SingularityXStudios:q-learning

opened 01:45AM - 20 Mar 23 UTC

mmtmn

+140 -0

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows t…hese guidelines, __failure to follow the guidelines below will result in the PR being closed automatically__. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access granted. 🚨 __PLEASE READ THIS__: In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject since GPT-4 is already capable of completing the task. We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. We encourage partial PR's with ~5-10 example that we can then run the evals on and share the results with you so you know how your eval does with GPT-4 before writing all 100 examples. ## Eval details 📑 ### Eval name q-learn ### Eval description This eval is about q-learning, a type of reinforcement learning algorithm that aims to find the optimal policy for an agent to take actions in an environment, based on maximizing the expected cumulative reward. It uses a Q-value function to estimate the expected cumulative reward of taking an action in a given state. ### What makes this a useful eval? An evaluation on Q-Learning may be important for GPT because it can help determine the effectiveness of using this algorithm for tasks related to reinforcement learning, such as generating text or completing tasks in virtual environments. By evaluating the performance of Q-Learning on a given task, GPT may be able to determine whether it is an appropriate algorithm to use, or if another reinforcement learning algorithm may be more effective. Additionally, the evaluation may provide insight into how to optimize Q-Learning for GPT's specific use cases. ## Criteria for a good eval ✅ Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals). Your eval should be: - [x] Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world. - [x] Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not. - [x] Includes good signal around what is the right behavior. This means either a correct answer for `Basic` evals or the `Fact` Model-graded eval, or an exhaustive rubric for evaluating answers for the `Criteria` Model-graded eval. - [x] Include at least 100 high quality examples (it is okay to only contribute 5-10 meaningful examples and have us test them with GPT-4 before adding all 100) If there is anything else that makes your eval worth including, please document it below. Disclaimer: One of the Q-Learning generators produced data with a score of 1.0 on the oaieval.py evaluation, while the other Q-Learning generator produced data with a score of 0.0. I suggest to check them both out. I can remove the generators or improve them to save the files in the right location and so on if requested. ### Unique eval value > Insert what makes your eval high quality that was not mentioned above. (Not required) ## Eval structure 🏗️ Your eval should - [x] Check that your data is in `evals/registry/data/{name}` - [x] Check that your yaml is registered at `evals/registry/evals/{name}.yaml` - [x] Ensure you have the right to use the data you submit via this eval (For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.) ## Final checklist 👀 ### Submission agreement By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies). - [x] I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies. ### Email address validation If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the merged pull request. - [x] I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request. ### Limited availability acknowledgement We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR. - [x] I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access granted. ### Submit eval - [x] I have filled out all required fields in the evals PR form - [x] (Ignore if not submitting code) I have run `pip install pre-commit; pre-commit install` and have verified that `black`, `isort`, and `autoflake` are running when I commit and push Failure to fill out all required fields will result in the PR being closed. ### Eval JSON data Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here: <details> <summary>View evals in JSON</summary> ### Eval ```jsonl $ oaieval gpt-3.5-turbo q-learning.s1.simple-v0 [2023-03-19 22:27:59,287] [registry.py:145] Loading registry from /home/mmtmn/evals/evals/registry/evals [2023-03-19 22:27:59,310] [registry.py:145] Loading registry from /home/mmtmn/.evals/evals [2023-03-19 22:28:00,164] [oaieval.py:189] Run started: 230320012800Y4AKEVDN [2023-03-19 22:28:00,164] [data.py:78] Fetching q-learn/q_learning_examples.jsonl [2023-03-19 22:28:00,165] [eval.py:30] Evaluating 100 samples [2023-03-19 22:28:00,170] [eval.py:136] Running in threaded mode with 10 threads! 92%|██████████████████████████████████████████████████████████████████▏ | 92/100 [00:09<00:00, 15.46it/s][2023-03-19 22:28:10,217] [record.py:309] Logged 377 rows of events to /tmp/evallogs/230320012800Y4AKEVDN_gpt-3.5-turbo_q-learning.s1.simple-v0.jsonl: insert_time=16.296ms 100%|███████████████████████████████████████████████████████████████████████| 100/100 [00:10<00:00, 9.69it/s] [2023-03-19 22:28:10,494] [record.py:320] Final report: {'accuracy': 0.0, 'f1_score': 0.0}. Logged to /tmp/evallogs/230320012800Y4AKEVDN_gpt-3.5-turbo_q-learning.s1.simple-v0.jsonl [2023-03-19 22:28:10,494] [oaieval.py:220] Final report: [2023-03-19 22:28:10,494] [oaieval.py:222] accuracy: 0.0 [2023-03-19 22:28:10,494] [oaieval.py:222] f1_score: 0.0 [2023-03-19 22:28:10,496] [record.py:309] Logged 23 rows of events to /tmp/evallogs/230320012800Y4AKEVDN_gpt-3.5-turbo_q-learning.s1.simple-v0.jsonl: insert_time=1.967ms ``` </details>

I never had the time to fully finish it and I might’ve got some stuff wrong.

WeylandLabs · November 23, 2023, 3:04am

Was it a basic algo of high school mathematics and better rewards ?

Apparently the pros on Twitter or X has it all figured out.

qrdl · November 23, 2023, 3:04am

https://chat.openai.com/share/a47f380f-b6d1-4885-9287-05d9c8dae114

Some interesting ideas on how to use q-learning to train LLMs.

The first idea matches a bit with the synthetic data comments we are hearing.

Interactive Learning Environment: Q-learning requires an environment where it can interact and receive feedback. For LLMs, this could be a simulated or real-world interface where the model can perform tasks, ask questions, or engage in dialogues and receive rewards based on the quality and relevance of its responses or actions.

jimalbarano · November 23, 2023, 3:32am

I would argue that intelligence is smart enough to not fall for the wiles of short-term goals. With the firm grasp the ChatGPTs had of ethics, I would argue we are in good hands.

Sebb · November 23, 2023, 4:02am

AGI will be achieved in the next 6 - 24 months. It is inevitable and it would be better to prepare for it now than trying to stop it (which is a futile effort) and may mean other less well meaning actors will be in charge of humanities most powerful invention ever to exist, and perhaps something that turns out to be the most most advanced evolutionary species since human beings

jay85 · November 23, 2023, 4:14am

I don’t think that’s true. NLP is widely considered to be the main barrier to achieving AGI. OpenAI’s success in the area caused me and many others to think we could see AGI within a couple of years. I’ve been telling people for months that they should shorten their mental time frames from years to months. I don’t mean that I think AGI will happen that quickly; I just mean that advancements we thought were years away are now happening on an almost weekly basis.

So, if the Reuters article is true, it’s not surprising. If we haven’t had a breakthrough with AGI already, then we almost certainly will soon.

qrdl · November 23, 2023, 4:39am

From deepmind, some cute bot animations and a good visual explanation

It’s robotics, but transformers are the basis of LLMs.

qrdl · November 23, 2023, 6:07am

I think we’ve angered the OpenAI team with our community posts

My goal is to get a q* post ‘community flagged’. That will be a sign!

Heh. Must be tough working at such a core company that could potentially have a very broad impact on humanity. All that kibitizing…

Don’t worry folks, the 86B+ share sale should help a bit.

hollywoodsign · November 23, 2023, 6:47am

The acronym RACE - Real-time Antiquation of Current Ecosystem, meaning everything you make, AI will break is about right here.

Every advancement that OpenAI makes implementation of the current AI out of date. I remember watching Khan Academy describe their education platform saying ‘This AI will watch this AI’ - that’s basically agents, but the way they probably implemented it was probably much different and very expensive to build. Autogen / Assistants made that simple.

The paradigm of building AI applications is different than other tech. Every time you finish building a lunar rocket for $10bn just as you apply the paintwork, there are rockets available in Walmart for $9.99m - but can get to Mars, ( deployable living pods, Sirius XM, leatherette seats and aircon extra.)

Given this reality of building. What is your thoughts on
a) Technical implementation
b) Business strategy

duocnv · November 23, 2023, 6:52am

The name “Q*” sounds like it could be a reference to quantum computing, which is a field of research that has the potential to revolutionize many different industries, including artificial intelligence.

Topic		Replies	Views
Introducing AGI Oracle Custom GPT: A Leap Towards Bridging ANI and AGI GPT builders gpt-4	21	5790	November 27, 2023
Why strawberry is not interesting to me Community chatgpt	85	1653	September 16, 2024
Discussion thread for "Foundational must read GPT/LLM papers" Community gpt-4 , gpt-35-turbo , chatgpt , research	75	10418	September 3, 2024
Emergent Fractal Identity: A Quantum-Inspired Approach to the Ship of Theseus Paradox Community gpt-4 , chatgpt , plugin-development , fine-tuning , weird-science	98	408	March 25, 2025
Can artificial intelligence have a mind like a human? Community chatgpt	49	425	March 12, 2025

What is Q*? And when we will hear more?

Related topics