FYI
There are a few topics along these same lines as this topic posted on this forum.
To gain more facts made this suggestion/feedback for the “unofficial OpenAI status dashboard” site
if every week several different prompts that have different types of results and/or focus were given to the models and then recorded, then there would be a factual record of the changes.
The suggestion provided did not offer any specific prompts, but several replies in this thread highlight prompts that are meaningful for tracking purposes.