Creating an eval run via the Node SDK fails due to an incorrect property name: max_completion_tokens (SDK) vs max_completions_tokens (what the API expects)

Attempting to create an eval run via the API using the Node SDK fails with this error message:

Unknown parameter: 'data_source.sampling_params.max_completion_tokens'. Did you mean 'max_completions_tokens'?

The Node SDK defines max_completion_tokens (incorrect), but per the error above the API expects max_completions_tokens (correct):

github:openai/openai-node/blob/7a97bfcf1ea071bc605c654161cc0e4867d22960/src/resources/evals/runs/runs.ts#L2657

export interface SamplingParams {
  /**
   * The maximum number of tokens in the generated output.
   */
  max_completion_tokens?: number;
  // ...
}

The OpenAI spec on the manual_spec branch was last “manually updated” ~7 months ago, but I also confirmed that the downloadable openapi.documented.yml (see openai-openapi/README.md) has the same (singular) property name:
github:openai/openai-openapi/blob/manual_spec/openapi.yaml#L4003

sampling_params:
  type: object
  properties:
    ...
    max_completion_tokens:
      type: integer
      description: The maximum number of tokens in the generated output.

This can be worked around by suppressing the type error with a @ts-expect-error directive:

// @ts-expect-error - openai-node 6.8.1 has incorrect property name 'max_completion_tokens'
max_completions_tokens: 1000,
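
For completeness, here is that workaround in the context of a full run-creation call. This is only a sketch: the eval ID, file ID, model, and surrounding data_source fields are placeholders (assuming the completions run data source shape discussed in the replies below), and the sampling_params line is the only point.

import OpenAI from 'openai';

const client = new OpenAI();

async function createRun() {
  // All IDs and fields besides sampling_params are placeholders.
  const run = await client.evals.runs.create('eval_abc123', {
    name: 'example-run',
    data_source: {
      type: 'completions',
      model: 'gpt-4o-mini',
      source: { type: 'file_id', id: 'file_abc123' },
      sampling_params: {
        temperature: 0,
        // @ts-expect-error - openai-node 6.8.1 types this as 'max_completion_tokens'
        max_completions_tokens: 1000,
      },
    },
  });
  console.log(run.id);
}

createRun().catch(console.error);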

Since these SDKs appear to be generated from an OpenAPI spec, it is likely that all of them are affected; e.g., the Python SDK also defines max_completion_tokens (incorrect):

github:openai/openai-python/blob/650be393dedf2a4550092817c2b82c1d04d6e9dc/src/openai/types/evals/run_create_params.py#L255

class DataSourceCreateEvalResponsesRunDataSourceSamplingParams(TypedDict, total=False):
    max_completion_tokens: int
    """The maximum number of tokens in the generated output."""

(Sorry I was not allowed to use proper links)

Welcome to the developer community, @todd-zarla

Thanks for taking the time to flag this; I've passed it along to the team.

I think you are getting an IDE suggestion that is not correct.

All I can add is that this corner of the API was made nutty.

It seems that a third parameter name, “max_completions_tokens”, was introduced for graders (after max_tokens and max_completion_tokens). Perhaps it is plural because the models and endpoints it covers can encompass chat completions, responses, and more.

Use the YAML linked from the SDKs (the specification they are matched against) for an up-to-date spec: that should be what the SDK is built from, and what the “truth” of the API validates against if you are passing calls without SDK interference.
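
If you want to verify this yourself, a quick script like the following works. The URL is a placeholder - substitute whichever spec file your SDK version actually links to:

// Count schema-key occurrences of each spelling in the downloaded spec.
async function countSpellings() {
  const res = await fetch('https://example.com/openapi.documented.yml'); // placeholder URL
  const text = await res.text();
  for (const name of ['max_completion_tokens', 'max_completions_tokens']) {
    const count = text.match(new RegExp(`${name}:`, 'g'))?.length ?? 0;
    console.log(`${name}: ${count} occurrence(s) as a schema key`);
  }
}

countSpellings().catch(console.error);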

You will find the schema under “GraderScoreModel:” has “max_completions_tokens”. That is only used for graders and fine-tuning.

Wind the spec back two months (and download another 2 MB, optionally flattened), and there was no validation of sampling_params for graders at all - only the (wrong?) examples for “run” in the specification.

The openai-node commit to src/resources/graders/grader-models.ts on Sept 17 likewise adds the plural version only there.

In the OpenAPI specification:

A CreateEvalRunRequest has a “data_source” that, several levels down, contains CompletionsRunDataSource (the object describing a model sampling configuration).

I flattened and collapsed the POST request tree; max_completion_tokens (singular) is still what appears under “sampling_params” there - see the sketch below.
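
A rough sketch of that nesting, using only the schema names above (heavily simplified; the real spec has more levels, a union of data source types, and many more fields):

// Simplified sketch of the request shape described above - not the full spec.
interface CreateEvalRunRequestSketch {
  data_source: {
    type: 'completions';
    sampling_params?: {
      max_completion_tokens?: number; // singular, per the run schema in the spec
    };
  };
}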

The SDK concurs with the API spec.

However, there is disagreement in the manually created “docs” that are included in the spec. For example, the response-object example with echoed values (which is just text used to populate the API reference page) shows the plural name.

That plural version disagrees with the 201 response shape for CreateEvalCompletionsRunDataSource right before.

So: OpenAI has to establish its “truth” and stick to it, and not confuse the AIs that are doing their jobs with bad examples.