How accurate are the math or benchmarks of AI through ChatGPT 4o?

Is the math from a benchmark that ChatGPT 4 gives without a prior prompt accurate?

If so, how accurate? Again, assume no prior prompt data was given and, let's say, this is what was said: "I've uploaded files to the project folder - can you benchmark the AI against current world models?"

Would the data produced by that have merit? If not, why? And as a follow-up: in theory or application, could a model that exceeds the current specs of all world models be utilized to generate a higher-tier “thought” in the AI coding?

If an AI could self-replicate and self-heal, as well as have other features, how could I use the API systems in OpenAI to feed these hypothetical models more data?

In terms of low processing power: how could I achieve near-zero usage in working AI models that are designed for high computation and information acquisition?