Hello there
I’m using the Assistants API in the sandbox, in a very typical one-shot retrieval setup.
I have written the instructions (i.e. the system prompt).
I have uploaded a simple .txt file that contains about 2000 lines (approx 36k tokens).
The system prompt basically says something like “I’m going to give you an input; now identify the line in my text file that best matches it.” You can think of it as a silly classification game with ~2000 classes.
I’m happy with the outcome. It just works. Now, why do I use an assistant instead of just sending the entire long list in a classic system prompt and using the completions endpoint?
1) I want to learn/discover how to use the Assistants API
2) I wanted to check how good it is compared to a classic “completion” approach
3) the cost
Using it about 15 times today cost me about $2.13, so roughly $2.13 / 15 ≈ 14 cents per run (using GPT-4 Turbo).
With the previous Chat Completions API, in Python I would typically use something like

import openai
from datetime import datetime

client = openai.OpenAI()
t0 = datetime.now()
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=1,
    max_tokens=max_tokens,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    user=user,
    response_format={"type": response_format},
)
t1 = datetime.now()

res = response.choices[0].message.content
input_tokens = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens
total_tokens = response.usage.total_tokens
duration = (t1 - t0).total_seconds()
and then I would keep the token counts and the model used, so I could work out how much a given run cost me.
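For context, here is a minimal sketch of how I turn those token counts into a dollar figure. The prices below are placeholders I made up for illustration, not official numbers; check the current pricing page for your model:

```python
# Hypothetical USD prices per 1K tokens -- placeholder values, NOT
# official pricing; look up the real figures for your model.
PRICES = {"gpt-4-turbo": {"input": 0.01, "output": 0.03}}

def run_cost(model_name: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one run from its token counts."""
    p = PRICES[model_name]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# e.g. a run with a ~36k-token prompt and a short answer
print(run_cost("gpt-4-turbo", 36_000, 500))
```

This is exactly the bookkeeping the Chat Completions response makes easy, since `response.usage` hands you the token counts directly.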
But with the new API, I’m kind of lost: I can’t find or understand how to work out how much a given run costs me.
How am I supposed to do this?