About chat-completion vs. completions speed + production token burning + ChatGPT on Davinci 002?

Hi everyone, new guy here. Just started playing with the API today.
I’m just trying different environments: Python, pure cURL, PHP (through cURL)…
I’ve made a few tests with Completions, Chat Completions and streams. Started with Completions and all good. Nice timings, nice behaviour, but “legacy”.
And that’s something that worries me: does “legacy” mean deprecated in the short term? Rewriting/refactoring all of today’s code?
Because if the answer is YES, and you then recommend going with chat completions, we have a REAL ISSUE here: this last one takes A LIFETIME to respond.
I’ve seen that everyone is complaining here, so I know I’m not alone. But first try today, first thing I’d abandon. It’s literally unusable. Period.

So, can we stay on pure completions? I just need ChatGPT-like behaviour for “please, give me that in this format” and that’s it. So 3.5-turbo-INSTRUCT is the best of its kind. Is it OK to go with?

Then I tried capturing ChatGPT’s traffic through the Chrome developer console. And to my surprise, the payloads show that the engine IS NOT GPT-3.5, even though that’s the one selected above, but Davinci-002 (text-davinci-002-render-sha)! If that’s really it, then it works like a charm, and I think it’s good to go with through the API, as it’s fast, reliable and cheaper (or at least roughly the same price as 3.5-turbo). Is that OK?

Finally, about burning tokens during development. I understand that a “free endpoint” for testing doesn’t make sense at all (by definition). But it’s a pity to have to buy credits to develop all the logic and waste tokens until everything runs smoothly and as expected.
I know someone will say “try first on ChatGPT’s website”, but I’m starting to think it’s not exactly the same behaviour, so like it or not, you have to try and develop it against the real API playground, so in the end, token burning.

So, what about those three points?

Thanks everyone. Glad to be here. Nice times ahead :slight_smile:

Hi and welcome to the Developer Forum!

Could you please give details of your environment:

  • what is your server OS?
  • how much RAM does it have?
  • how many CPUs does it have?
  • how much drive space is there?
  • what is the speed of the network connection?
  • is the machine a VM or standalone?
  • what version of Python is installed?
  • what version of Node.js?
  • what version of PHP?
  • what version of the OpenAI library is installed?
  • code snippets of API calls
  • code snippets of prompt setup and API setup
  • what are your monthly spend limits?
  • what are your rate limits?
  • which models are you calling?
  • is your account pre-pay or pay-as-you-go?
  • what firewall system is enabled, if any?

You mention that you used cURL for testing. What did those cURL calls look like, and what is your cURL version?

There is no “free endpoint”. You might have free API trial credit on a newly-created account though.

Ignore the model shown in ChatGPT’s network traffic. ChatGPT doesn’t connect directly to an API model; it has its own server backend that receives user input messages, processes them internally, and sends them on to an AI model, gpt-3.5-turbo or a very close likeness.

If you are making a chatbot like ChatGPT, you will want to use the “chat completions” endpoint, gpt-3.5-turbo, and the messages format this requires. You will also have to manage a conversation history.

If you are just interested in 0-shot answering of questions or AI text processing tasks, you can use gpt-3.5-turbo-instruct through the completion endpoint.
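In practice, the conversation history mentioned above is just an array you keep appending to and re-send in full on every request. A minimal sketch, assuming gpt-3.5-turbo message roles; the helper function names here are illustrative, not from any SDK:

```php
<?php
// Running history: the system prompt stays first, turns are appended after it.
$messages = array(
    array("role" => "system", "content" => "You are a helpful assistant."),
);

// Add one user turn to the history.
function add_user_turn(array $messages, string $text): array {
    $messages[] = array("role" => "user", "content" => $text);
    return $messages;
}

// After each API call, the assistant's reply must also be appended,
// or the model will not "remember" its own previous answers.
function add_assistant_turn(array $messages, string $text): array {
    $messages[] = array("role" => "assistant", "content" => $text);
    return $messages;
}

$messages = add_user_turn($messages, "Who won the World Cup in 2018?");
// ... call the chat completions endpoint here, read choices[0].message.content ...
$messages = add_assistant_turn($messages, "France won the 2018 World Cup.");
$messages = add_user_turn($messages, "And who was the top scorer?");

// The whole $messages array is re-sent on every request.
echo count($messages); // 4: system + user + assistant + user
?>
```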


Other models are “advanced”, and you will need familiarity with their particular way of operating. davinci-002, for example, is a base completion engine that does not have training on either chat or following instructions, it just generates completion text.

Good morning, thanks for your fast replies!
So, let me answer some of the questions you asked:

  • Windows Server 2019 Standard
  • Intel Xeon Gold 6230 @ 2.1 GHz (2 processors) – virtual server on a VPC provider.
  • 32 GB RAM
  • x64 OS
  • It’s a development machine, so right now I only have approximately 20 GB of free space. More than enough for the last two years.
  • Between machines we move over a 10k LAN, and every single server has 500 Mbps of internet I/O.
  • As said, a VM.
  • Python 3.9.5. No Node, by the way. PHP 8.0.12, with the OpenAI library freshly downloaded yesterday through pip (I guess it’s 0.28.1?). But I’ll stick with PHP through cURL, so no Python needed right now.
  • cURL 7.76.1, with OpenSSL/1.1.1l and libssh2/1.9.0. Plus zlib 1.2.11.

Two snippets:

Chat-completions:

<?php
// OpenAI chat completions API endpoint
$endpoint = "https://api.openai.com/v1/chat/completions";

// Your OpenAI API key
$api_key = "YOUR_OPENAI_API_KEY";

// Request payload
$data = array(
    "model" => "gpt-3.5-turbo",
    "messages" => array(
        array("role" => "system", "content" => "You are a helpful assistant."),
        array("role" => "user", "content" => "Who won the World Cup in 2018?"),
    )
);

// Initialize cURL
$ch = curl_init($endpoint);

// Set cURL options
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Content-Type: application/json",
    "Authorization: Bearer " . $api_key
));

// Perform the request
$response = curl_exec($ch);

// Surface transport-level errors instead of failing silently
if ($response === false) {
    die("cURL error: " . curl_error($ch));
}

// Close cURL
curl_close($ch);

// Decode the response
$response_data = json_decode($response, true);

echo '<pre>'.print_r($response_data, true).'</pre>';

// Print the assistant's reply
echo $response_data['choices'][0]['message']['content'];

?>

Completions:

<?php
// OpenAI completions API endpoint
$endpoint = "https://api.openai.com/v1/completions";

// Your OpenAI API key
$api_key = "YOUR_OPENAI_API_KEY";

// Request payload
$data = array(
    "model" => "gpt-3.5-turbo-instruct",
    "prompt" => "Translate the following text into Spanish: 'Hello, how are you?'",
    "max_tokens" => 150
);

// Initialize cURL
$ch = curl_init($endpoint);

// Set cURL options
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Content-Type: application/json",
    "Authorization: Bearer " . $api_key
));

// Perform the request
$response = curl_exec($ch);

// Surface transport-level errors instead of failing silently
if ($response === false) {
    die("cURL error: " . curl_error($ch));
}

// Close cURL
curl_close($ch);

// Decode the response
$response_data = json_decode($response, true);

echo '<pre>'.print_r($response_data, true).'</pre>';

// Print the completion
echo $response_data['choices'][0]['text'];
?>
  • No API spend limit. My account is older and I’ve used my phone number across three accounts (work + personal), so no more free credits. I just paid $5 to test yesterday. No limits, no rate limits, nothing; just a “hello world” to see what happens (I’m on Plus in ChatGPT on two of those same mobile accounts). I just wanted to see how tokens burn while creating perfect prompts for my integrations. I guess it’s a “pre-pay” account? I did not activate “auto-recharge”.

  • Servers run SentinelOne, and on the VPC there’s a stack of firewalls and WAFs, but that’s not the problem for sure. Everything else runs smooth and nicely.

Hope this helps!

Hi! Thanks for your fast reply too.
I know there’s no “free endpoint”; that’s why I said that, in reality, it makes no sense at all… but there should be a real way to test before burning tokens. Maybe something like “dev tokens” with some limitations: free dev tokens that could be used only on a development endpoint, or whatever. Something OpenAI has to come up with.

OK on the Chrome thing, but it made me curious… :slight_smile:

I’m not building a “chatbot”. I’m more on “create this”, “write that”, etc. I understand context is always important, so normally, across different “completions” API interactions, I’d have to append some previous generations to contextualize further responses, for sure. But it’s not properly a back-and-forth chat session. It’s more like “generate this”. Now, “with this, take this part and generate that”. Repeat on other parts. Now “analyze this”. Use it to redo that… That’s why, when I discovered you’ve created an INSTRUCT model, I saw the light! Straight to the point, fewer tokens burnt.

After your response I’ve been thinking: with the chat API, you always have to append the previous response to the “messages” parameter, don’t you? So context grows with every interaction, and token burning grows right along with it. No? So it’s a bad business for us developers/creators, but a beautiful one for OpenAI, as far as I can see…
I only see virtues in the legacy “completions” endpoint, above all the others, as it even lets you cut tokens through parametrization too! Doesn’t that make more sense? And the INSTRUCT model isn’t even available on the chat endpoint…
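To put a number on that growth: if every turn adds roughly k tokens and the whole history is re-sent on each call, then turn n costs about k·n input tokens, so n turns cost about k·n(n+1)/2 in total. That is quadratic rather than exponential, but it does add up fast. A tiny sketch of that arithmetic, assuming a flat 50 tokens per turn purely for illustration:

```php
<?php
// Rough illustration: input tokens billed when the full history is
// re-sent on every chat-completions call. Assumes each turn adds a
// flat $tokens_per_turn tokens; real counts vary per message.
function total_input_tokens(int $turns, int $tokens_per_turn): int {
    $total = 0;
    $history = 0;
    for ($i = 1; $i <= $turns; $i++) {
        $history += $tokens_per_turn;  // this turn's new messages
        $total   += $history;          // the whole history is sent again
    }
    return $total;
}

// 10 turns at ~50 tokens each: 50 * (1+2+...+10) = 50 * 55 = 2750 input
// tokens, versus 500 if each turn were sent alone, completions-style.
echo total_input_tokens(10, 50); // 2750
```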

The Davinci thing was just because I saw it in the Chrome inspector, as I said. Just trying to find a way to cut costs… :smiley:

“burning tokens”? Really?

You know how they say a picture is worth 1000 words?

Look at the picture of Lincoln on a US penny. That’s worth 1000 words, sent ten times to the AI.

Or a whole forum topic sent to an AI:

Reigtechs, a new developer to the OpenAI API, voices his concern over three main points: the legacy status of completions, sluggish response times with chat-completion, and the token burning process during development. He’s tested different environments, including Python, cURL, and PHP, and found completions to work well.

This user likes the functionality of the 3.5-turbo-INSTRUCT model, as it fits his needs, but is worried about the possibility of the completions API being deprecated soon due to being labeled as “legacy”. He also experienced long response times with the chat-completion API, to the point of deeming it unusable.

When testing the API, he notes that the ChatGPT engine is actually not GPT 3.5 but rather DaVinci-002, which surprises him given he had selected 3.5 and found DaVinci-002 to be faster, reliable, and potentially cheaper.

Finally, he suggests the introduction of a “free endpoint” or “dev tokens” for testing, so developers wouldn’t need to burn tokens until the integration runs smoothly.

Foxabilo asks for details about Reigtechs’ environment and the test simulation specifics.

_j clarifies that there isn’t a free endpoint and there could be free API trial credit on a new account. They further explain that DaVinci-002 is a base completion engine that generates text but doesn’t specifically function on chat or following instructions.

Reigtechs responds with the server details, shares code snippets of the API calls made, and reiterates the need for a solution to test the APIs before burning tokens. He also clarifies that his needs are more task-based rather than chat-based and finds the “completions” API to be more beneficial in saving tokens. He further comments on the increase in token usage with each interaction on the chat API, implying it might be costly for developers.

Summarized with AI on Oct 18

What do you mean by this? That this is cheap?
I’m not talking about prices or the amount burned. I’m talking about the very concept of using tokens to test and develop.
Major APIs out in the world have dev endpoints to test and try before rolling to production. That’s a fact. OpenAI doesn’t. That’s it. And that’s what I’m pointing at.

I paid $5 yesterday. Doing those silly tries and tests in PHP, for example (trying streams through cURL which were not printed on screen, etc.), I burnt $0.03. Not too much? Probably. But burnt anyway.
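For the record, the usual reason streamed chunks don’t print with PHP cURL is `CURLOPT_RETURNTRANSFER`, which buffers the whole body until the request finishes. A minimal sketch of reading the server-sent chunks as they arrive, via a `CURLOPT_WRITEFUNCTION` callback (placeholder API key; whether `flush()` alone is enough depends on your SAPI and output buffering):

```php
<?php
// Sketch: streaming a chat completion with "stream": true.
// Chunks arrive as server-sent events ("data: {...}\n\n"); the write
// callback fires per chunk, so echo + flush shows tokens as they arrive.
$endpoint = "https://api.openai.com/v1/chat/completions";
$api_key  = "YOUR_OPENAI_API_KEY";

$data = array(
    "model"    => "gpt-3.5-turbo",
    "stream"   => true,
    "messages" => array(
        array("role" => "user", "content" => "Count from 1 to 5."),
    ),
);

$ch = curl_init($endpoint);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Content-Type: application/json",
    "Authorization: Bearer " . $api_key,
));
// Do NOT set CURLOPT_RETURNTRANSFER here: it buffers the whole
// response and nothing prints until the stream has finished.
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) {
    // Each SSE line looks like: data: {"choices":[{"delta":{"content":"1"}}]}
    foreach (explode("\n", $chunk) as $line) {
        if (strpos($line, "data: ") !== 0 || trim($line) === "data: [DONE]") {
            continue;
        }
        $payload = json_decode(substr($line, 6), true);
        echo $payload["choices"][0]["delta"]["content"] ?? "";
        flush(); // push partial output out immediately
    }
    return strlen($chunk); // tell cURL the chunk was consumed
});

curl_exec($ch);
curl_close($ch);
?>
```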
