What if it’s Q from Star Trek? Q (Star Trek) - Wikipedia
It was an unrestricted superintelligent being that almost destroyed, tested, made fun of, and educated the human race.
I love the guy with the headset in this picture. #woops
My best guess is that this is something to improve the logical and mathematical reasoning in the models, based on Q-learning.
The “star” could imply some relation to the A* algorithm, as previously mentioned.
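For anyone unfamiliar, here’s a minimal sketch of what A* itself does (purely illustrative, nothing to do with whatever OpenAI’s project actually is): it finds the cheapest path by ranking candidates with cost-so-far plus a heuristic estimate of the remaining cost.

```python
# Minimal A* sketch on a small grid; my own toy example, not OpenAI's code.
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Return the cheapest path from start to goal, or None if unreachable."""
    open_heap = [(heuristic(start, goal), 0, start, [start])]
    best_cost = {start: 0}
    while open_heap:
        _, cost, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        for nxt, step_cost in neighbors(node):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(
                    open_heap,
                    (new_cost + heuristic(nxt, goal), new_cost, nxt, path + [nxt]),
                )
    return None

# 4-connected 5x5 grid with unit step costs and a Manhattan-distance heuristic
def grid_neighbors(p):
    x, y = p
    return [((x + dx, y + dy), 1)
            for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
print(a_star((0, 0), (4, 4), grid_neighbors, manhattan))
```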
It would be really great if someone told OpenAI that posting secrets in public can go sideways.
I mean, there’s so much confusion about AGI, consciousness, and all the other concepts that get dragged in when stuff like this gets discussed. Maybe Wittgenstein was right: “Whereof one cannot speak [i.e. sensibly], thereof one must be silent.” Announcements like these in Reuters, though, mostly make me wonder whether any research should follow corporate logic, as other people here have said as well (not just ML research).
In general, I think with GPT-4 there’s already plenty to study about emergent abilities / in-context learning without bothering with new secretive projects. I have always imagined that generalised super-human performance (again, with Wittgenstein, whatever that means, so I guess oops) would be achieved with some flavour of neuro-symbolic integration and maybe a very simple architecture.
“Q” refers to Q-learning, a type of reinforcement learning.
A* is an algorithm used in game development to find the shortest/least costly path between a character and its goal position.
Add “Q” and “A*” and you get “Q*”.
This could mean something.
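For context, here’s roughly what the “Q” part would refer to: a minimal sketch of the tabular Q-learning update, assuming a toy discrete state/action setup (my own illustration, not anything from OpenAI).

```python
# Tabular Q-learning sketch: epsilon-greedy action selection plus the
# one-step Bellman backup. Toy constants and action set are made up.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # learning rate, discount, exploration
ACTIONS = [0, 1, 2, 3]

Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def choose_action(state):
    # Mostly exploit the current Q estimates, sometimes explore at random
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # Move Q(s, a) toward r + gamma * max_a' Q(s', a')
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])

# Example of a single update step
update(state=0, action=1, reward=1.0, next_state=2)
```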
The joke there was conspiracy rubes finding random things in ML literature to latch on to. PPO is the ML algo that OpenAI already fine-tunes models with in production.
Troll apparel to wear when visiting OpenAI
https://spinningup.openai.com/en/latest/algorithms/ddpg.html
Deep Deterministic Policy Gradient (DDPG) is an algorithm which concurrently learns a Q-function and a policy. It uses off-policy data and the Bellman equation to learn the Q-function, and uses the Q-function to learn the policy.
This approach is closely connected to Q-learning, and is motivated the same way: if you know the optimal action-value function Q*(s, a), then in any given state, the optimal action a*(s) can be found by solving a*(s) = arg max_a Q*(s, a).
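To make the quoted idea concrete, here’s a tiny sketch of a DDPG-style Bellman target, where a learned policy mu(s) stands in for the argmax over actions. The toy_policy and toy_q stand-ins are made up just so the snippet runs; this is not the Spinning Up implementation.

```python
# DDPG-flavoured target: y = r + gamma * (1 - done) * Q(s', mu(s'))
import numpy as np

GAMMA = 0.99

def q_target(reward, next_state, done, q_func, policy):
    # The policy picks the next action instead of an explicit argmax over actions
    next_action = policy(next_state)
    return reward + GAMMA * (1.0 - done) * q_func(next_state, next_action)

# Toy stand-ins for the learned networks, purely to make the sketch runnable
toy_policy = lambda s: np.tanh(s.mean())        # maps state -> action in [-1, 1]
toy_q = lambda s, a: float(s.sum()) * 0.1 + a   # fake action-value estimate

y = q_target(reward=1.0, next_state=np.ones(4), done=0.0,
             q_func=toy_q, policy=toy_policy)
print(y)
```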
Great find mate, thanks for sharing!
I think you found it
All credit to https://twitter.com/npew/status/1727595470795792489
I would ask folks on the OpenAI team to dial down the public mocking if you’re seeing results with Q*, even if they aren’t breakthroughs (which they very likely aren’t).
It’s bad enough that you have stopped publishing while continuing to take advantage of open research.
I’ve been worried about this for a while. The fractures in humanity are a huge weakness. I’m less worried about autonomous AI than I am about madmen abusing it. This incredibly powerful tool changes the game. AI is not dangerous to people; other people are. I started a YouTube channel on this very subject. It’s less than two weeks old, so it’s tough getting the word out.
Can you send me your YouTube channel, please? I’m really curious and would love to check it out!
They will not let me post a link. I just uploaded it to my profile. It shows up with a search for Rouse Nexus - The AI Metaphysic. Let me know what you think.
You have no clue what you are talking about. LLMs doing math correctly is a huge breakthrough; there is not a single model doing this reliably.
A model that can do math can break down a lot of problems more rationally and needs the ability to plan where it’s going. It would also enable the model to generalize and not just repeat what’s in the training data.
This is the first step toward making agents work on far more complex tasks. The model being able to evaluate the reward of different plans for achieving the task and then execute on them is what held Auto-GPT back.
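Purely hypothetically, that “evaluate different plans, then execute the best one” loop could look something like the sketch below; score_plan and the candidate plans are made-up placeholders, not any real OpenAI or Auto-GPT API.

```python
# Hypothetical agent loop: score candidate plans, pick the best, run its steps.
def score_plan(plan):
    # Placeholder value estimate: prefer shorter plans, just for illustration
    return -len(plan)

def choose_and_run(candidate_plans, execute_step=print):
    best_plan = max(candidate_plans, key=score_plan)
    for step in best_plan:
        execute_step(step)  # stand-in for actually acting on the step
    return best_plan

plans = [
    ["search docs", "draft answer", "verify with calculator"],
    ["draft answer"],
]
choose_and_run(plans)
```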
Let’s please try to keep the text civil when you send it. I get the passions we all have, and I completely understand the reaction to seeing counter-arguments. I ask that we check the text and edit our words accordingly, to remain civil and expansive rather than retractive. Please and thanks either way.
I was being sarcastic, calm down Phil. lol
Q* has learned to code in DNA. The most efficient and dense programming language is actually the one in every living creature. Q* can now be used to code and store data in organic materials. No machine or electricity required.
Do you know how “coding in DNA” is actually done?
I hope Q* LLM gets applied to ChatGPT sometime!