And welcome to the community forum! I’m always happy to see people interested in science.
Large language models, like GPT-3, are primarily designed to process and generate natural language text, such as articles, essays, and stories. They are not specifically trained or optimized to solve mathematical or physical problems.
Large language models are probabilistic in nature: they generate likely outputs based on patterns observed in their training data. Mathematical and physical problems often have only one correct answer, and the probability of generating exactly that answer may be very low. As a result, large language models can produce incorrect or nonsensical results when attempting to solve complex problems.
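To make the probabilistic point concrete, here is a toy sketch in Python. It assumes, for simplicity, that the model scores whole candidate answers (real LLMs sample one token at a time), and the answers and scores are made up. Even when the correct answer has the highest score, it is sampled less than half the time.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate answers and scores; "42" plays the correct answer.
answers = ["42", "41", "24", "4.2"]
logits = [2.0, 1.5, 1.5, 1.0]

probs = softmax(logits)
rng = random.Random(0)
samples = rng.choices(answers, weights=probs, k=1000)
wrong_fraction = sum(s != "42" for s in samples) / len(samples)
print(f"P(correct) = {probs[0]:.3f}, wrong in {wrong_fraction:.0%} of samples")
```

Lowering the temperature concentrates probability on the top-scoring answer, but that only helps when the top-scoring answer is actually the right one.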
Large language models excel at natural language, not at math and physics.
If you want to use GPT or other LLMs for complex math and physics, you will have to help the model by telling it the correct answer.
Very true, but the internet is not necessarily wrong. There’s a phenomenon called the “wisdom of the crowd”: if you average the guesses of many people, the average tends to get closer to the actual result as the number of guessers increases.
In this case we also told GPT that “everything flies forward”, which is generally true; the balloon only goes the other way because it’s filled with helium, which is lighter than the surrounding air.
I’ve done a similar experiment with a helium balloon on a train and can confirm that it moves in the opposite direction to everything else.
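The balloon behaviour follows from a simple force balance in the vehicle’s frame. This sketch assumes a uniform horizontal acceleration, ignores the string and drag, and uses hypothetical masses and volumes. Every object feels the pseudo-force on its own mass, while the air develops higher pressure toward the front and pushes back on the object with a buoyant force equal to the displaced air’s mass times the same acceleration. An object less dense than air therefore feels a net push the “wrong” way.

```python
RHO_AIR = 1.2  # kg/m^3, approximate density of air at room temperature

def net_forward_force(mass_kg, volume_m3, pseudo_accel, rho_air=RHO_AIR):
    """Net horizontal force (N) in the vehicle frame; positive means forward.

    pseudo_accel > 0 when the vehicle brakes (loose objects are pushed forward).
    Pseudo-force on the object:        +mass_kg * pseudo_accel
    Buoyancy from the air's gradient:  -rho_air * volume_m3 * pseudo_accel
    """
    return (mass_kg - rho_air * volume_m3) * pseudo_accel

BRAKING = 3.0  # m/s^2, hypothetical deceleration

# Hypothetical objects: a 1 kg book and a small helium balloon (~5 g total, 15 L).
book = net_forward_force(mass_kg=1.0, volume_m3=0.001, pseudo_accel=BRAKING)
balloon = net_forward_force(mass_kg=0.005, volume_m3=0.015, pseudo_accel=BRAKING)
print(f"book: {book:+.3f} N, helium balloon: {balloon:+.3f} N")
```

The sign of `mass_kg - rho_air * volume_m3` is all that matters: positive for anything denser than air, negative for the helium balloon.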
I have seen a few examples of how to prompt GPT with complex math word problems in the OpenAI Cookbook, a GitHub repository of useful tips and tricks for working with GPT models. You can find it here: openai/openai-cookbook on GitHub (“Examples and guides for using the OpenAI API”). One thing I learned from it is that you need to be very specific and clear when asking GPT to solve math problems, and to use proper symbols and units. For example, if you want to calculate the force of gravity between two objects, you could prompt GPT like this:
Calculate the force of gravity between two objects with masses m1 = 10 kg and m2 = 5 kg, separated by a distance of r = 2 m. Use the universal gravitational constant G = 6.67 x 10^-11 N m^2 / kg^2.
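Prompts like this are easy to check because the expected answer follows directly from Newton’s law of universal gravitation, F = G m1 m2 / r^2. A few lines of Python give a reference value to compare against GPT’s reply:

```python
G = 6.67e-11  # N m^2 / kg^2, the value given in the prompt

def gravitational_force(m1_kg, m2_kg, r_m, g=G):
    """Newton's law of universal gravitation: F = G * m1 * m2 / r^2."""
    return g * m1_kg * m2_kg / r_m**2

force = gravitational_force(10.0, 5.0, 2.0)
print(f"F = {force:.4e} N")  # F = 8.3375e-10 N
```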
One interesting approach I started playing with: instead of asking these complex questions directly, you can get GPT to create simulations for your data sets and then use the resulting model to generate the answers. For example, if you have a data set of measurements of the position and velocity of a projectile over time, you can get GPT to fit a quadratic function to the data and then use that function to predict the maximum height, range, and time of flight of the projectile. I have been using the GPT-3 model for this purpose, but you could also try other models like GPT-J or GPT-Neo. This way, you can avoid the unit-conversion errors and loss of decimal places that sometimes occur in physics-based mathematical problems.
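Here is a minimal sketch of the kind of code this approach relies on: a least-squares quadratic fit of height versus time, followed by the derived quantities. It is written in plain Python to avoid dependencies (numpy.polyfit would do the fit in one line). The launch speed, g, sample times, and horizontal speed are all hypothetical, and the data are noise-free so the fit can be checked against the textbook formulas.

```python
import math

# Hypothetical, noise-free sample data: vertical launch speed 20 m/s, g = 9.81 m/s^2.
V0, G_ACC = 20.0, 9.81
times = [0.25 * i for i in range(17)]  # 0.0 .. 4.0 s
heights = [V0 * t - 0.5 * G_ACC * t**2 for t in times]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(ts, ys):
    """Least-squares fit y = a t^2 + b t + c via the normal equations (Cramer's rule)."""
    s = [sum(t**k for t in ts) for k in range(5)]                 # s[k] = sum of t^k
    r = [sum(y * t**k for t, y in zip(ts, ys)) for k in range(3)]  # r[k] = sum of y t^k
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [r[2], r[1], r[0]]
    d = det3(m)
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]  # replace one column with the right-hand side
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)  # (a, b, c)

a, b, c = fit_quadratic(times, heights)
t_apex = -b / (2 * a)                            # vertex of the parabola
max_height = a * t_apex**2 + b * t_apex + c
disc = math.sqrt(b * b - 4 * a * c)
time_of_flight = max((-b + disc) / (2 * a), (-b - disc) / (2 * a))  # later root
VX = 5.0  # hypothetical constant horizontal speed, m/s
horizontal_range = VX * time_of_flight

print(f"max height {max_height:.2f} m, flight time {time_of_flight:.2f} s, "
      f"range {horizontal_range:.2f} m")
```

With real, noisy measurements the fitted coefficients will only approximate the physics, so it is worth comparing `a` against -g/2 as a sanity check.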
It’s definitely true that you can prompt GPT differently to produce more mathematical-looking results, but they’re still generated because they are the most probable response, not through physical or mathematical reasoning. In your example you mentioned that:
When you have GPT create simulations for you, it’s not doing the actual math: it’s either guessing or writing code that you can run to calculate the answer.
The simulations GPT writes can be hit-and-miss. I’ve had it create 10 simulations of the collapse of a quantum wave function, and I included the exact equations in TeX format.
Only one of these simulations actually gave the correct results; all the others just looked correct at a glance.
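One way to tell a correct simulation from one that only looks correct is to check its statistics against the Born rule. The sketch below is a minimal model of my own choosing (a single projective measurement in a fixed basis, no time evolution), not the simulation from the post, and the state is hypothetical. Over many shots, the observed outcome frequencies should match |c_i|^2 of the normalized amplitudes.

```python
import random
from collections import Counter

def born_probabilities(amplitudes):
    """Normalize complex amplitudes and return |c_i|^2 for each basis state."""
    weights = [abs(c) ** 2 for c in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]

def measure(amplitudes, rng):
    """Projective measurement: sample an outcome with the Born rule,
    then collapse the state onto that basis vector."""
    probs = born_probabilities(amplitudes)
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    collapsed = [0j] * len(amplitudes)
    collapsed[k] = 1 + 0j
    return k, collapsed

state = [1, 1j, 1 + 1j]  # hypothetical unnormalized three-level state
rng = random.Random(42)
n_shots = 100_000
counts = Counter(measure(state, rng)[0] for _ in range(n_shots))

expected = born_probabilities(state)  # approximately [0.25, 0.25, 0.5]
observed = [counts[i] / n_shots for i in range(len(state))]
print("expected:", [round(p, 3) for p in expected])
print("observed:", [round(p, 3) for p in observed])
```

A simulation whose observed frequencies drift away from the expected ones (or whose post-measurement state is not normalized) is one of the "looks right at a glance" cases.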
Sorta like MMO Pong. But what if the real pong ball were infrared and you needed special goggles to see it? That’s what sometimes happens: the crowd doesn’t know, especially on anything technical or science-related. That’s why GPT sucks at math and physics, and needs a legit plugin to be even reasonable in these areas!
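Both sides of the crowd argument can be seen in a quick simulation. This sketch assumes each guess is the true value plus independent Gaussian noise and an optional bias shared by everyone; all numbers are hypothetical. Averaging cancels the independent noise, but a shared bias survives no matter how large the crowd gets, which is the infrared-pong-ball situation.

```python
import random
import statistics

def crowd_estimate(true_value, n_guessers, noise_sd, bias=0.0, seed=0):
    """Average of n_guessers guesses, each true_value + bias + Gaussian noise."""
    rng = random.Random(seed)
    guesses = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n_guessers)]
    return statistics.fmean(guesses)

TRUE_VALUE = 100.0  # hypothetical quantity being guessed
for n in (1, 10, 1_000, 100_000):
    unbiased = crowd_estimate(TRUE_VALUE, n, noise_sd=20.0)
    biased = crowd_estimate(TRUE_VALUE, n, noise_sd=20.0, bias=15.0)
    print(f"n={n:>7}: unbiased crowd {unbiased:7.2f}, biased crowd {biased:7.2f}")
```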
I think so, if the neural network structure were improved. Right now it is based on the probabilities of words following previous words, so it’s not that smart. But what if the information were a set of facts, represented as a graph whose edges connect facts to one another in some logical manner? That would be an example of “reasoning”, maybe?
This is the idea behind graph-based machine learning. There are no doubt other powerful approaches, but the words cannot simply be probabilities, as they are in most of these big LLMs; there needs to be some logic.
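The fact-graph idea can be sketched in a few lines. Note that this is plain rule-based traversal over hand-written facts, not graph-based machine learning (nothing is learned), and it assumes the relation is transitive. The facts themselves are hypothetical examples: a new fact is "derived" when there is a chain of edges connecting the two terms.

```python
from collections import deque

# Hypothetical facts as (subject, relation, object) triples.
FACTS = [
    ("helium", "less_dense_than", "air"),
    ("air", "less_dense_than", "water"),
    ("water", "less_dense_than", "iron"),
]

def build_graph(facts, relation):
    """Adjacency list containing only the edges for one relation."""
    graph = {}
    for subj, rel, obj in facts:
        if rel == relation:
            graph.setdefault(subj, []).append(obj)
    return graph

def entails(graph, start, goal):
    """Breadth-first search: True if goal is reachable from start, i.e. the
    fact follows by chaining a transitive relation."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

g = build_graph(FACTS, "less_dense_than")
print(entails(g, "helium", "iron"))  # True: chained through air and water
print(entails(g, "iron", "helium"))  # False: no path in that direction
```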
I did some further digging into the Wolfram plugin (its /.well-known directory is publicly available).
So while I’m not able to test the Wolfram plugin, I can still see roughly how it works.
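For readers unfamiliar with that directory: as I understand the ChatGPT plugin format, each plugin publishes a manifest at /.well-known/ai-plugin.json that includes a description_for_model text and points to an OpenAPI description of the API. The sketch below parses a made-up manifest in that shape (it is not the Wolfram plugin’s actual content, and the field list reflects my understanding of the v1 format) to pull out the pieces GPT reads when deciding how to call the API.

```python
import json

# Made-up manifest in the general shape of a ChatGPT plugin's
# /.well-known/ai-plugin.json; these values are NOT the Wolfram plugin's.
SAMPLE_MANIFEST = """
{
  "schema_version": "v1",
  "name_for_human": "Example Math",
  "name_for_model": "example_math",
  "description_for_human": "Solve math and physics problems.",
  "description_for_model": "Send queries as precise math expressions with units.",
  "auth": {"type": "none"},
  "api": {"type": "openapi", "url": "https://example.com/.well-known/openapi.yaml"},
  "logo_url": "https://example.com/logo.png",
  "contact_email": "dev@example.com",
  "legal_info_url": "https://example.com/legal"
}
"""

def summarize_manifest(text):
    """Extract the fields that tell the model what the plugin does and how to call it."""
    manifest = json.loads(text)
    return {
        "model_name": manifest["name_for_model"],
        "instructions_for_model": manifest["description_for_model"],
        "api_type": manifest["api"]["type"],
        "openapi_spec_url": manifest["api"]["url"],
    }

for key, value in summarize_manifest(SAMPLE_MANIFEST).items():
    print(f"{key}: {value}")
```

The model only gets these instructions and the API spec; it still has to write the actual query itself, which is where the formulation problems show up.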
It’s really interesting, because GPT is actually in charge of formulating the problem in a way that the Wolfram API can understand. So I decided to test GPT’s ability to do this with the following prompt:
What is the optimal power of a tumble dryer? Let’s think step by step.
In this example GPT correctly recognizes that the first step is to identify the physical quantities relevant to the problem, but it fails to do this properly: it treats the volume of the drum and the drying time as the relevant quantities and formulates its equation based on those.
Wolfram won’t be able to solve this one correctly since GPT (both 3.5 and 4) formulates the problem using the wrong input parameters.
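The thread doesn’t settle which quantities are the right ones, but as a hedged illustration of what a physically grounded formulation could start from, here is an energy-balance sketch: the heater has to supply at least the latent heat needed to evaporate the water in the load. The water mass, drying time, and efficiency are hypothetical, heating of the clothes, drum, and air is ignored, and an “optimal” power would need an extra cost trade-off that this does not model.

```python
LATENT_HEAT_WATER = 2.26e6  # J/kg, latent heat of vaporization of water at 100 °C

def required_power(water_mass_kg, drying_time_s, efficiency,
                   latent_heat=LATENT_HEAT_WATER):
    """Average heater power (W) to evaporate water_mass_kg in drying_time_s.

    Only the latent heat is counted; warming the clothes, drum and air is ignored.
    efficiency is the hypothetical fraction of heater energy used for evaporation.
    """
    energy_j = water_mass_kg * latent_heat
    return energy_j / (efficiency * drying_time_s)

# Hypothetical load: 2 kg of water, one hour, 50 % efficiency.
power_w = required_power(water_mass_kg=2.0, drying_time_s=3600.0, efficiency=0.5)
print(f"required power: {power_w:.0f} W")  # required power: 2511 W
```

Note that the inputs here are the amount of water and the time available, not the drum volume, which is the kind of difference in problem formulation the post is pointing at.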