Error Encountered When Using max_tokens Parameter with GPT-4 API

Welcome to the OpenAI community @defendershow

The reason you’re seeing `length` as the `finish_reason` is that your input is large enough to consume most of the model’s context length, so the generated response gets truncated.

The `max_tokens` value is checked before sampling, and in your case:
input tokens + max_tokens > context length
Hence the request fails with a 400 error.
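As a rough illustration, you can estimate your prompt’s token count and cap `max_tokens` so the sum stays inside the context window (8,192 tokens for the base gpt-4 model). The 4-characters-per-token ratio below is only a crude heuristic, and `safe_max_tokens` is a hypothetical helper; for exact counts, use the `tiktoken` library (`tiktoken.encoding_for_model("gpt-4")`).

```python
# Sketch: keep (input tokens + max_tokens) within the context window.
# The 4-chars-per-token ratio is a rough estimate, not an exact count;
# use tiktoken for real token counts before relying on this.

CONTEXT_WINDOW = 8192  # base gpt-4 context length

def safe_max_tokens(prompt: str, reserve: int = 50) -> int:
    """Largest max_tokens that should avoid the 400 error,
    with a small reserve as a safety margin."""
    est_input_tokens = len(prompt) // 4 + 1  # crude estimate
    remaining = CONTEXT_WINDOW - est_input_tokens - reserve
    return max(remaining, 0)  # 0 means the prompt itself is too large

prompt = "Summarize the following report: ..." * 200
print(safe_max_tokens(prompt))
```

If this returns 0, the prompt alone already fills the context window and must be shortened before any completion can fit.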

Also, in the case of chat completions, if you omit `max_tokens`, it automatically defaults to whatever context length remains after your input.
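In practice, that means you can simply leave `max_tokens` out of the request and let the API use the remaining context. A minimal sketch of the two request payloads (the messages content here is just a placeholder):

```python
# Two chat-completion payloads: one with an explicit max_tokens,
# one omitting it so the API defaults to the remaining context length.

messages = [{"role": "user", "content": "Summarize this document ..."}]

explicit = {
    "model": "gpt-4",
    "messages": messages,
    "max_tokens": 500,  # may trigger a 400 if input + 500 > context length
}

defaulted = {
    "model": "gpt-4",
    "messages": messages,
    # no max_tokens key: the model can use all remaining context
}

print("max_tokens" in explicit, "max_tokens" in defaulted)
```

Omitting the key avoids the 400 error entirely, at the cost of giving up an explicit cap on the response length.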