Feature Request: Dated snapshots for GPT-5 (gpt-5-main) or pinning for gpt-5-chat-latest

Hi OpenAI team,

For open research, reproducibility, and transparent evaluation, I’d like to request either of the following:

  1. Publish gpt-5-main as a dated snapshot model in the API (e.g., gpt-5-main-2025-08-07),
    or

  2. Allow saving/pinning dated snapshots of gpt-5-chat-latest (e.g., gpt-5-chat-2025-08-07).

Context & rationale

  • (1) Model mapping we’re relying on. The GPT-5 system card positions gpt-5-main as the successor to GPT-4o (the fast/throughput model), while the “thinking” models are labeled separately. In the API docs, the non-reasoning ChatGPT model is exposed as gpt-5-chat-latest, which aligns with that “main/fast” lineage. (See: “GPT-5 System Card”.)

  • (2) What we need for open academic work. For interpretability and reproducibility, researchers ideally want the full chain-of-thought. We understand that, for safety, the API does not return raw CoT and instead offers reasoning summaries when explicitly enabled in the Responses API. We accept that constraint and therefore want to evaluate models with the thinking process disabled or omitted, but we need a stable, dated target to make results comparable over time. (See: “Reasoning models — OpenAI API docs”.)

  • (3) Why summaries alone are not enough for evals. When the intermediate reasoning between a user prompt and an assistant answer is hidden or summarized, it becomes harder to assess how the model arrived at a result (e.g., whether it relied on brittle heuristics, tool sequences, or implicit assumptions). Even OpenAI’s system cards describe a “Chain-of-Thought summarizer” for safety; this is useful, but it’s not an operational replacement for full, step-by-step traces in many research settings. Having a pinned non-reasoning snapshot at least lets us evaluate outputs without the confound of evolving internal reasoning behavior.

  • (4) About prompts injected before the first user message. We’re fine with a fixed, model-level system prompt (i.e., system-level instructions inserted before the first user message) remaining non-public. For reproducible research we don’t need those contents revealed; we just need a dated snapshot whose behavior won’t change out from under published results.
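To make the evaluation setup above concrete, here is a minimal sketch of the kind of request we would pin. The parameter names follow the OpenAI Python SDK's `client.responses.create(**kwargs)` convention; the dated model name is the hypothetical snapshot identifier this request proposes, not one that exists today.

```python
def build_eval_request(prompt: str, model: str = "gpt-5-chat-2025-08-07") -> dict:
    """Build kwargs for client.responses.create(**kwargs) for a
    reproducible, non-reasoning evaluation run.

    Note: "gpt-5-chat-2025-08-07" is the hypothetical dated snapshot
    this feature request asks for, not a currently available model.
    """
    return {
        "model": model,      # pinned snapshot -> stable behavior over time
        "input": prompt,
        "temperature": 0,    # reduce sampling variance across eval runs
    }


# Example: every published result would record this exact model string.
req = build_eval_request("Summarize the abstract in one sentence.")
```

The point is simply that a pinned `model` string, together with deterministic sampling settings, is what makes a published eval re-runnable later.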

OpenAI already provides dated snapshots for other models (e.g., GPT-4o and friends) so teams can lock a version for stable behavior. Bringing the same to GPT-5’s fast/non-reasoning path—either as gpt-5-main-YYYY-MM-DD or dated gpt-5-chat-*—would directly enable open, reproducible research and clearer baselines for the community.
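As a sketch of how teams handle this today with models that do have dated snapshots: prefer the pinned identifier and fall back to the `-latest` alias only when necessary, recording which one was actually used. The gpt-5 dated name below is hypothetical (it is what this request asks for); only the alias exists now.

```python
# Preferred model identifiers, most reproducible first.
PREFERRED = [
    "gpt-5-chat-2025-08-07",  # hypothetical dated snapshot (what we're asking for)
    "gpt-5-chat-latest",      # existing alias; behavior can change silently
]


def resolve_model(available: set[str], preferred: list[str] = PREFERRED) -> str:
    """Return the first preferred model the API actually serves, so that
    published results can record the exact identifier that was used."""
    for name in preferred:
        if name in available:
            return name
    raise LookupError("none of the preferred models are available")
```

With `available` populated from the API's model list, today this resolver can only ever return `gpt-5-chat-latest`, which is exactly the reproducibility gap described above.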

If others have perspectives—or if there’s official guidance we’ve missed—please share so we can converge on a community-useful approach.

Thank you.


I second this.

For my use case, after a lot of trial and error, gpt-5-chat-latest works best; I couldn’t get gpt-5 anywhere close to the desired output that the chat model nailed for me. However, there is no way to pin it to an exact snapshot, so it’s really risky to use in production.

@Martin_Fulop
When using gpt-5-chat-latest you get no reasoning.
Have you tried gpt-4.1? It does have snapshotting, and I found it better than gpt-5-chat-latest in my domain of tasks.