LLMs as a basis for general problem-solving

This is a very very interesting paper:

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

… we return to the origins of artificial intelligence (and cognitive science), drawing inspiration from the planning processes explored by Newell, Shaw, and Simon … characterized problem solving as search through a combinatorial problem space, represented as a tree. We thus propose the Tree of Thoughts (ToT) framework for general problem solving with language models.

It seems that the real fun is just starting!
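
To make the tree-search framing concrete, here is a minimal, hypothetical sketch in Python: a breadth-first search over partial "thoughts" with a pluggable proposer and evaluator. The function names are stand-ins for language-model calls; this is not the authors' implementation.

```python
# Minimal sketch of the Tree of Thoughts framing: breadth-first search over
# partial "thoughts" with a pluggable proposer and evaluator. Hypothetical,
# not the paper's code; propose_thoughts and score_thought stand in for
# language-model calls.
from typing import Callable, List, Tuple

def tree_of_thoughts_bfs(
    problem: str,
    propose_thoughts: Callable[[str, str], List[str]],  # (problem, partial) -> candidate next steps
    score_thought: Callable[[str, str], float],          # (problem, partial) -> value estimate
    beam_width: int = 3,
    max_depth: int = 4,
) -> str:
    """Return the highest-scoring partial solution found within max_depth steps."""
    frontier: List[Tuple[float, str]] = [(0.0, "")]  # (score, partial solution text)
    for _ in range(max_depth):
        candidates: List[Tuple[float, str]] = []
        for _, partial in frontier:
            for step in propose_thoughts(problem, partial):
                extended = partial + step
                candidates.append((score_thought(problem, extended), extended))
        if not candidates:
            break
        # Keep only the most promising branches; this pruning is the "deliberate" part.
        candidates.sort(key=lambda c: c[0], reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0][1] if frontier else ""
```

All the interesting work, of course, hides inside whatever propose_thoughts and score_thought actually do.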

1 Like

It was actually posted on the Foundational must read GPT/LLM papers - #15 by qrdl thread, but I agree it’s a very interesting paper!

I found Karpathy’s shade-throwing around this sort of thing very unfortunate.

I mean I could just say:

Overheard: “People who know nothing about pure mathematics are now paradoxically advantaged in machine learning because they don’t immediately reach for overly sophisticated math and spend a lot more time hacking ML algs” When hacking ML algs feels below your dignity but it works :’|

The OAI folks post a lot of … stuff … on twitter. Maybe they could dial it back a bit. We have enough noise going around already.

2 Likes

Did this based on the paper

Thank goodness we live in a world where there are those who feel free to ‘try’, ignoring the expert’s “you can’t do that” filter.

Aside - the more I think about the ‘Tree of Thoughts’ paper, the less I think of its contribution to advancing our ability to use the tool (the LLM).

The hard problems of search are alternative generation and evaluation, and the paper doesn’t provide any general methods for either. The key in alternative generation is coming up with an appropriate step size, and LLMs don’t seem (imho) very good at that, or, in general, at reasoning about their own capabilities or limitations.
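
To make the step-size point concrete, here is one hypothetical shape a proposal prompt could take in a ToT-style setup; the wording and the k parameter are illustrative, not from the paper. The step size is whatever the prompt asks for, and nothing guarantees the model honors it consistently:

```python
# Hypothetical proposal prompt for a ToT-style search. The "step size" is
# whatever the prompt says it is (here, "exactly one intermediate step"),
# and the model is free to ignore that instruction.
def build_proposal_prompt(problem: str, partial_solution: str, k: int = 3) -> str:
    return (
        f"Problem:\n{problem}\n\n"
        f"Partial solution so far:\n{partial_solution or '(nothing yet)'}\n\n"
        f"Propose {k} alternative next steps. Each proposal must advance the "
        f"solution by exactly one intermediate step; do not jump to a final answer."
    )
```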

1 Like

LLMs don’t reason at all in the sense of trying a thing, working it out, and then seeing if it works and back-tracking if it doesn’t.
LLMs pattern match.
They pattern match against a huge corpus of things humans have put into words in the past, which is quite likely to contain some patterns applicable to what you want to do, but they don’t ever try-and-repeat. They don’t have the internal representation to run a “thought experiment” on their own inference output, although you could perhaps feed the result back and infer something else about it …
(Also, “inference” here is used in the Markov-like model-generation sense, not the logical/mathematical sense.)

LLMs reason, they just have limited capability to do so. A light switch can reason, but only in a very limited (Boolean) way.

Do LLMs reason in the same way people do? Probably not, but really only humans can reason like humans by definition.

1 Like

I respectfully disagree.
I would propose that perhaps a more interesting question to ask than ‘what is it doing’ might be ‘how can we humans model and think about what it is doing in ways that enable us to manipulate it usefully?’

To paraphrase Sutskever:
‘People think predicting the next token can’t possibly be enough. But to predict the next token *well enough* you have to build an internal model’ (my emphasis, not his).

Another example: what is ‘reason step by step’ style prompting? Even if you disagree with my paraphrase above, it can at least be usefully thought of as surfacing subsymbolic reasoning, even if, from a mechanistic view, that might make no sense.
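
To be concrete about what I mean, ‘reason step by step’ prompting here is just wrapping the task in an instruction along these lines; the exact wording is illustrative, not canonical:

```python
# Illustrative chain-of-thought style prompt wrapper; one common phrasing
# among many, not tied to any particular API.
def step_by_step_prompt(task: str) -> str:
    return (
        f"{task}\n\n"
        "Let's reason step by step. Write out each intermediate step "
        "before stating the final answer."
    )
```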

One last thought: if it is Markov-model-like, it is quite a complex multi-level, probably non-hierarchical, Markov model, so I’d suggest there are multiple abstraction scales at which one can describe the ‘model’ being captured. ‘Just’ token sequences? Perhaps.

1 Like

One way to look at this is to imagine an alien came to Earth in a cool spaceship, an alien whose physiological makeup was entirely different from a human’s.

Would you say they can reason because they had a cool spaceship?

What if they did so in a manner totally unlike what humans do?

That all said, I would argue a person of average IQ could outthink GPT-4, if the stochastic parrot part of GPT-4 weren’t an advantage.

But go below average IQ and it becomes iffy. And on topics that GPT-4 knows well, it has a huge advantage.

@jwatte, just saw something I didn’t notice before in your post:
but they don’t ever try-and-repeat

Well, yes. As you point out, you’d have to put some wrapper code around them to do that. But then you say:

They don’t have the internal representation to run a “thought experiment”

That is intriguing. How would I test this hypothesis? Is it fair to try to write a prompt that suggests it do so before providing an answer? If so, would failure of attempts to write such prompts be proof of lack of capability? Would ‘success’, such as text output indicating it had done so, be proof of ‘thought experiment’ capability, or just more pattern matching?
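
One possible probe, with purely hypothetical wording; whether the output would count as a real thought experiment or just more pattern matching is exactly the open question:

```python
# Hypothetical single-turn probe: ask the model to run and report a "thought
# experiment" before answering. Whatever comes back, the hard part is deciding
# whether it demonstrates the capability or merely imitates the form of it.
def thought_experiment_prompt(question: str) -> str:
    return (
        f"Question: {question}\n\n"
        "Before answering, run a thought experiment: state a candidate answer, "
        "work out its consequences, check whether they contradict the question's "
        "constraints, and revise if needed. Show the experiment, then the final answer."
    )
```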

Popper’s theory of scientific hypotheses as necessarily falsifiable has been discredited, I’m told, but it still seems useful in practice.

Regardless of all the above, I’m genuinely intrigued by the idea of running thought experiments in a single turn. I’ll try to find some time to play with it. Thanks!

For what it’s worth, and to ground the ideas proposed in the paper and discussed here:

One area of application for Tree of Thoughts is exploring different scenarios for business growth in the enterprise domain. This is an ill-structured problem with many, many moving pieces, but it is hugely valuable for a large company. Most of what companies do in this area is based on guesswork. Some can afford to hire McKinsey to do the guesswork for them!

A key problem here is to understand the appropriate level of human-machine interaction, yes?
Seems like there are four basic actions in search (ignoring, for the moment, the all-crucial problem formulation step):

move set generation
move selection
move application
state evaluation

Seems like a talented business analyst could implement ToT manually, doing the move selection and evaluation either manually or with AI support. The key processes of move-idea generation and move application could then be interactive, with AI leading move-set generation and hopefully doing most of the heavy lifting in move application.
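
Here is a rough sketch of what that interactive loop could look like, with the four actions as explicit hooks; the function names and the division of labor are my assumptions, not anything from the paper:

```python
# Hypothetical human-in-the-loop ToT-style loop over the four actions above.
# The AI-facing hooks (generate_moves, apply_move, evaluate_state) would be
# LLM-backed; move selection stays with the human analyst.
from typing import Callable, List

def interactive_search(
    initial_state: str,
    generate_moves: Callable[[str], List[str]],   # move-set generation (AI-led)
    apply_move: Callable[[str, str], str],        # move application (AI heavy lifting)
    evaluate_state: Callable[[str], float],       # state evaluation (AI-assisted)
    max_steps: int = 10,
) -> str:
    state = initial_state
    for _ in range(max_steps):
        moves = generate_moves(state)
        if not moves:
            break
        # Move selection: show scored options and let the analyst choose.
        for i, move in enumerate(moves):
            value = evaluate_state(apply_move(state, move))
            print(f"[{i}] {move}  (est. value: {value:.2f})")
        choice = input("Pick a move number, or 'q' to stop: ")
        if choice.strip().lower() == "q":
            break
        state = apply_move(state, moves[int(choice)])
    return state
```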

Or were you thinking of a completely automated process?

Yes, the right level of human-machine interaction is key because the decisions are very expensive. Therefore, the enterprise people signing the checks need to be convinced of the solutions formulated with the help of an AI. They get convinced only by being in the loop (or having people they trust in the loop).

A completely automated process won’t sell even if it is technologically feasible.

There’s no testing needed, because this follows from first principles: the construction of the model simply does not allow it to back up, nor to predict further than one token ahead. Similarly, inference only runs forward through the layers; it doesn’t “run a loop.”

You could make arguments about the output tokens being part of the input, which creates a loop of some sort, but it’s still a forward-only loop, which is computationally less powerful than a full loop with conditionals. I would argue that this doesn’t rise to the bar of “reasoning,” but I’m not sure there exists a well-defined test for what “reasoning” really means, so if you were to attack that argument, doing it by definition would be possible.
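
To illustrate the distinction with a toy contrast (nothing model-specific): a fixed-depth, forward-only computation does a bounded amount of work no matter what, while a full loop with a conditional can keep iterating, data-dependently, until a test passes:

```python
# Toy contrast only; not a model of any particular architecture.

def forward_only(x: float, depth: int = 4) -> float:
    # Fixed depth, forward only: the amount of computation is set in advance
    # and no earlier step is ever revisited.
    for _ in range(depth):
        x = x * 0.5 + 1.0
    return x

def loop_with_conditional(x: float, target: float = 2.0, tol: float = 1e-6) -> float:
    # Full loop with a conditional: keep revising the estimate until a test
    # passes, however many iterations that takes.
    while abs(x - target) > tol:
        x = x * 0.5 + 1.0
    return x
```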

I’ve also heard an argument that each “layer” is its own step in a reasoning chain, but that’s still just one “thread” and the model keeps no other hypotheses; it just arrives at token probabilities and picks one, so I’m wholly unconvinced by that argument :slight_smile:

Not really true, because of chain-of-thought and self-refinement, but even if it were true I don’t think ‘backing up’ is part of the dictionary definition of reasoning.

We’re probably all saying the same thing, tbh, just getting caught up in semantics.

I think I’m going to start using the term ‘infers’ instead of ‘reasons’ so I can avoid these discussions. :slight_smile:

1 Like

First principles? You mean I could give your argument to a theorem-prover and it would confirm it? Again, I respectfully disagree. Your argument is much too abstract for simple prima-facie confirmation.

Inference is strictly bottom-up, left-to-right, across every unit in every layer? Are you sure? No right-to-left connections between any units in any layer anywhere? I’ve never seen a multi-layer model built that way. Even a simple convolutional layer in static image recognition involves bi-directional aggregation over the layer below it, right?

A classic convnet is strictly forward-inferred. The pooling layers are still a forward operation.

There is an argument that these models can speculate, at a depth no deeper than the number of layers, and use the outcome of some limited number of speculation branches to select one of a limited number of separate inferences, and I’m somewhat receptive to this argument. I think the stronger part of my argument is that the models cannot “back up” – they still speculate one token at a time.

When it comes to feedback, there’s clearly read-only feedback from what it inferred, as any previous output is available as input. (This is one of the things that differ between transformers and other recurrent models compared to forward convnets.) As far as I understand it, this form of read-only feedback is equivalent to some amount of unrolling into a “fixed-function” forward-only model, which ends up equivalent to the multiple-branches-selection multi-layer implementation I suggested above. Maybe the Google 540-billion-parameter model has enough of this to functionally qualify? I don’t have a good intuition for whether the depth needed is additive or exponential…

That being said, the actual exhibited behavior of the current crop of models is not that of “reasoning,” as far as I can tell; e.g., inference workloads that require multi-step hypothesis testing consistently fail.

OK, I’m starting to understand your argument. But still, why do you think multi-step hypothesis testing fails? If we make the hypothesis and the test result explicit, they will recycle into the input, no? Is the context window too small? Yes, they ‘speculate’ one token at a time, but that ‘speculation’ is based on the entire context window, which, as I understand it, is a sliding window over the initial input and all previously generated tokens. What am I missing?

There wasn’t anything new in the Karpathy talk that I saw, but some of the things he decided to focus on were interesting.

I don’t think we can extrapolate too much from it, but it makes me wonder if this is what OpenAI is thinking about in terms of intelligence expansion. One advantage they have is access to internals like activations and probabilities. We need more research around that with respect to CoT.

1 Like

The models don’t go back to “recycle” anything. They move forward and output a token.
I’ve never seen the model output “and then we do X, and then we do Y … no wait that doesn’t work, let’s do W instead of X, and then we do Z …”
And a reasoning model would do that before it even outputs X (like a typical reasoning agent composing output instructions would).

Respectfully disagree. My understanding is that this is exactly how LLMs work. Yes, they generate one token at a time, but each token is generated from a context that includes the original input plus all previously generated tokens.
Assuming the context window is large enough, of course.
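
In sketch form, the loop we’re both describing; schematic pseudocode of autoregressive decoding, not any particular implementation:

```python
# Schematic autoregressive decoding loop: every new token is sampled from a
# distribution conditioned on the prompt plus everything generated so far,
# truncated to the context window. model and sample are stand-ins.
from typing import Callable, List

def generate(
    prompt_tokens: List[int],
    model: Callable[[List[int]], List[float]],  # context -> next-token probabilities
    sample: Callable[[List[float]], int],       # probabilities -> chosen token id
    context_window: int = 4096,
    max_new_tokens: int = 256,
) -> List[int]:
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        context = tokens[-context_window:]   # sliding window over input + prior output
        probs = model(context)
        tokens.append(sample(probs))         # forward-only: earlier tokens are never revised
    return tokens
```

Each generated token does condition on everything already in the window; what it never does is remove or rewrite a token it has already emitted.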