Is this too much information for GPT-4?

I made some tweaks to my graph-of-thoughts project and it now improves the r2_score much faster. The issue I'm running into is that GPT-4 seems to misread the insights it generates, which I include in the prompt so it can track the chain of thought:

Insight: Replacing LinearRegression with RandomForestRegressor in base.py causes the r2_score to go from 0.575 to 0.804 in base_n0.py
Insight: Replacing LinearRegression with GradientBoostingRegressor in base.py causes the r2_score to go from 0.575 to 0.776 in base_n1.py
Insight: Replacing LinearRegression with AdaBoostRegressor using DecisionTreeRegressor as a base estimator in base.py causes the r2_score to go from 0.575 to 0.386 in base_n2.py
Insight: Increasing the number of estimators to 200 and setting max_depth to 15 in RandomForestRegressor in base_n0.py causes the r2_score to go from 0.804 to 0.802 in base_n0_n0.py
Insight: Changing the number of estimators to 150 and max_depth to 10 in RandomForestRegressor in base_n0.py causes the r2_score to go from 0.804 to 0.775 in base_n0_n1.py
Insight: Changing min_samples_split to 5 and min_samples_leaf to 2 in RandomForestRegressor in base_n0.py causes the r2_score to go from 0.804 to 0.806 in base_n0_n2.py
Insight: Replacing RandomForestRegressor with GradientBoostingRegressor and adjusting hyperparameters (n_estimators=200, min_samples_split=5, min_samples_leaf=2, max_depth=5) in base_n0_n2.py causes the r2_score to go from 0.806 to 0.83 in base_n0_n2_n0.py
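For context, the best-scoring variant described by the last insight can be sketched roughly like this. This is a minimal reconstruction on synthetic data, not the actual base_n0_n2_n0.py (that file and its dataset aren't shown here, so the data and surrounding code are assumptions):

```python
# Hypothetical sketch of the change in the last insight: a
# GradientBoostingRegressor with n_estimators=200, min_samples_split=5,
# min_samples_leaf=2, max_depth=5. Synthetic data stands in for the
# project's real dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=200,
    min_samples_split=5,
    min_samples_leaf=2,
    max_depth=5,
    random_state=0,
)
model.fit(X_train, y_train)
score = r2_score(y_test, model.predict(X_test))
print(f"r2_score: {score:.3f}")
```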

E.g., when asked to reason about the insights above, it said:

Reasoning:

  • Replacing LinearRegression with RandomForestRegressor significantly improved the r2_score.
  • GradientBoostingRegressor also improved the r2_score compared to LinearRegression but not as much as RandomForestRegressor.
  • AdaBoostRegressor with DecisionTreeRegressor as a base estimator performed worse than LinearRegression.
  • Increasing the number of estimators and max_depth in RandomForestRegressor didn’t significantly improve the r2_score.
  • Changing min_samples_split and min_samples_leaf in RandomForestRegressor slightly improved the r2_score.
  • Replacing RandomForestRegressor with GradientBoostingRegressor and adjusting hyperparameters improved the r2_score further.

Based on these insights, we can try using the GradientBoostingRegressor with the same hyperparameters as RandomForestRegressor, which gave the best r2_score so far (0.806).

That last datum is wrong: the best r2_score so far is 0.83, not 0.806.

Also, the statement “GradientBoostingRegressor also improved the r2_score compared to LinearRegression but not as much as RandomForestRegressor” is wrong given the results above, though I sort of see how it went wrong there.

It’s important that it gets data like that right; otherwise its reasoning can easily go off kilter. I suppose my expectations are too high, but I just wanted to make sure this is what other folks are seeing.
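One way to guard against this kind of slippage (just a sketch, not part of the original project) is to extract the numeric scores from the insight lines programmatically and hand the model a pre-computed "best so far", rather than trusting it to pick out the maximum itself:

```python
import re

# Hypothetical helper: pull (resulting_file, resulting_score) pairs out of
# insight lines of the form "... to go from X to Y in FILE", then report
# the file with the highest score.
INSIGHT_RE = re.compile(r"go from ([\d.]+) to ([\d.]+) in (\S+)")

def best_score(insights):
    scores = {}
    for line in insights:
        m = INSIGHT_RE.search(line)
        if m:
            _, after, fname = m.groups()
            scores[fname] = float(after)
    best_file = max(scores, key=scores.get)
    return best_file, scores[best_file]

insights = [
    "Insight: Replacing LinearRegression with RandomForestRegressor in base.py causes the r2_score to go from 0.575 to 0.804 in base_n0.py",
    "Insight: Changing min_samples_split to 5 and min_samples_leaf to 2 in RandomForestRegressor in base_n0.py causes the r2_score to go from 0.804 to 0.806 in base_n0_n2.py",
    "Insight: Replacing RandomForestRegressor with GradientBoostingRegressor in base_n0_n2.py causes the r2_score to go from 0.806 to 0.83 in base_n0_n2_n0.py",
]

print(best_score(insights))  # → ('base_n0_n2_n0.py', 0.83)
```

The pre-computed result could then be appended to the prompt as a hard fact, so the model never has to re-derive it.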


Sometimes I want to get inside the flow of thought within a single turn and say "wait a minute" — maybe using streaming? But I don't know of a way to insert more tokens into the context within a single generation.
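On the streaming idea: the Chat Completions API does let you watch tokens arrive with stream=True, and you can stop consuming (or cancel the request) when you spot a bad claim, but as far as I know there's no way to inject tokens into a generation mid-flight; the closest workaround is to abort and start a fresh request with a correction appended. Here's a toy sketch of the monitoring half only, with a fake generator standing in for the live stream:

```python
# Toy sketch: watch a token stream and bail out when a known-wrong claim
# appears. fake_stream() stands in for an actual streaming API response;
# with the real API you would iterate over the streamed chunks the same
# way, then issue a follow-up request containing the correction.
def fake_stream():
    for token in ["The ", "best ", "r2_score ", "so ", "far ", "is ", "0.806", "."]:
        yield token

def monitor(stream, bad_claims):
    seen = ""
    for token in stream:
        seen += token
        for claim in bad_claims:
            if claim in seen:
                # Stop consuming here; a real implementation would cancel
                # the request and re-prompt with the correction.
                return seen, f"Stopped: spotted '{claim}'."
    return seen, None

text, correction = monitor(fake_stream(), bad_claims=["best r2_score so far is 0.806"])
print(correction)
```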