GPT-4 API isn't producing completions as good as 3.5 16k

I’ve written an app in Python which reads an input file, follows a user-entered prompt from a UI, and then produces an output based on the completion. The input file is a programme, and the prompt asks the AI to explain the code functionally and then reproduce it in C#. When using GPT-4 Turbo, the output is usually truncated by the 4,096-token completion limit; when using GPT-4, which allows double the completion length, the code conversion is lacklustre and incomplete; and when I use gpt-3.5-turbo-16k the response is better, but still not as good as what Cody returns in VS Code using GPT-4 Turbo for the same prompt.

Is the 16k completion limit the reason I’m getting the best response from that model? And why does Cody in VS Code, using GPT-4 Turbo, do better than my own calls to the API?

I’m going nuts. I have a demo tomorrow, and I’m so annoyed that I’m using a 3.5 model and not a 4 model.

Your advice is appreciated.

gpt-4-turbo has a 128k context window.
Have you tried the max_completion_tokens parameter? (Both chat completions and the Assistants API create-run endpoint have it.) The actual generation you receive is limited by this parameter, and since it’s optional, you may need to check what it is currently set to.
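For illustration, here is a minimal sketch with the official openai Python SDK; it uses the classic max_tokens parameter (newer SDK versions also accept max_completion_tokens), and the file name and prompt wording are just placeholders for what your app actually sends:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "input_program.py" is an illustrative path for the file your app reads.
source_code = open("input_program.py").read()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    max_tokens=4096,  # explicitly request the full completion budget
    messages=[
        {"role": "system",
         "content": "Explain the code functionally, then reproduce it in C#."},
        {"role": "user", "content": source_code},
    ],
)

print(response.choices[0].message.content)
# finish_reason == "length" means the reply hit the token cap and was truncated.
print(response.choices[0].finish_reason)
```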

I tested the gpt-4-vision-preview model several months ago and found the output truncated to 16 tokens, which is very short; setting a higher max_tokens let me get the generation beyond the first 16 tokens. A 16-token default is not a reasonable value (I think it should have been changed by now), but you sometimes need to set the max completion tokens parameter yourself if the default OpenAI sets for you doesn’t meet your needs.

The quality, and how much instruction is needed (including telling the AI to demonstrate chain-of-thought reasoning), differ between models. A lot of prompting goes into building a system prompt under which the current AI can follow a goal and produce the metacode and reasoning output to complete the task.

You should not be targeting anything that needs even 2k tokens of output. Response quality and instruction-following suffer the longer the generation becomes; the AI models are trained by example against producing long outputs.
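One way to stay under that is to convert the programme piece by piece so each response stays small. A naive sketch of that idea (the blank-line splitting and file name are illustrative assumptions, not a real parser):

```python
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are a code translator. First explain what the given code does, "
    "step by step, then reproduce it in idiomatic C#."
)

def convert_chunk(chunk: str) -> str:
    """Convert one small piece of the programme per request."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        max_tokens=2000,  # each piece should need far less than this
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": chunk},
        ],
    )
    return response.choices[0].message.content

# Naive split on blank lines between top-level blocks -- purely illustrative;
# a real splitter would parse the source into functions or classes.
chunks = open("input_program.py").read().split("\n\n\n")
converted = [convert_chunk(c) for c in chunks]
print("\n\n".join(converted))
```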

The task you have the AI perform should be robust enough to tolerate changes in AI quality, not running at the edge of the model’s abilities.

The highest quality will be gpt-4-0314, or even gpt-4-32k-0314, the initial release versions (not available to organizations that haven’t used them before); after that, gpt-4-0613 and gpt-4-turbo are a toss-up depending on the task. The output of gpt-4 with its 8k window will be limited by the 8k total context and by how much of that is left for a response after sending your input.
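To see how much room is left for a response, you can count your input tokens with the tiktoken library. A rough sketch (the per-message chat formatting overhead is only an approximation, and the file name is illustrative):

```python
import tiktoken

CONTEXT_WINDOW = 8192  # gpt-4 (8k): prompt and completion share this budget

enc = tiktoken.encoding_for_model("gpt-4")
prompt = open("input_program.py").read()

prompt_tokens = len(enc.encode(prompt))
# A few extra tokens per message go to chat formatting; 20 is a rough allowance.
available_for_completion = CONTEXT_WINDOW - prompt_tokens - 20

print(f"~{prompt_tokens} prompt tokens, ~{available_for_completion} left for the reply")
```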

I’m stuck with 3.5 16k for now, as the demo is today. But thank you both for the steer.