At the moment my tool sends an AJAX request to a PHP file, which forwards it to https://api.openai.com/v1/chat/completions and returns a JSON message that is rendered via JavaScript.
The problem is that the output is rendered all at once rather than token by token.
If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI Cookbook for example code.
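For reference, here is a minimal sketch of consuming that stream with the Python openai client (v1.x); the model name and prompt are placeholders, and the same idea applies server-side in PHP by setting "stream": true and flushing each SSE event through to the browser:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True makes the API return partial message deltas as
# server-sent events instead of one complete response
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model
    messages=[{"role": "user", "content": "Write one sentence about streams."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)  # render each token delta as it arrives
print()
```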
Did you paste in the entire response and then send it?
As a Discourse admin on another site: Akismet will flag posts by users with low Discourse trust levels that were not entered by hand. The reason for flagging such posts is that this is a key indicator of spam bots, as bots are much faster than humans at creating posts. If it has been more than 24 hours since posting, and if the admins and moderators here were doing what they need to, then your post should have been reviewed and a response sent back already.
I was once on another well-known site where all the admins and moderators had been inactive for months, so I turned my use of the site into a social experiment. I hope the same does not happen here.
If you want an estimate, you can count the number of SSE delta chunks you receive over the network. That only works reliably when the output is plain ASCII English; with Unicode or emoji, a single chunk may carry multiple tokens so that you receive a complete character, so chunk counting underestimates costs.
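A rough sketch of that estimate, using the same Python client as above (model and prompt are again placeholders):

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder
    messages=[{"role": "user", "content": "Say something."}],
    stream=True,
)

chunk_count = 0
collected = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        chunk_count += 1          # roughly one token per chunk for ASCII English
        collected.append(delta)   # keep the deltas for exact counting later

print(f"~{chunk_count} completion tokens (underestimate for non-ASCII output)")
```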
The correct way is to reassemble the streamed language into a single string and use the correct token encoder, tiktoken, to count the tokens in the response.
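For example, continuing the sketch above, where `collected` holds the streamed deltas:

```python
import tiktoken

full_text = "".join(collected)  # reassemble the streamed response

# encoding_for_model returns the encoder matching that model's tokenizer
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(len(enc.encode(full_text)), "tokens in the completion")
```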
The better way: if OpenAI can send you a finish-reason chunk, they could just as easily send you usage, but they don't.