Yeah, there may be some very easy-to-implement check for duplication like that…
Assume I am really stupid, but have only just read all the documentation and the API reference.
https://platform.openai.com/docs/api-reference/messages/modifyMessage
How do I simulate what locally is just system + chat[-turns*2:] + user,
passing only as many turns as fit in my token budget?
No need to answer hastily, I’ll give you all the time you need.
Then we have the utility of “answer not ready yet, still looping, calling your API endlessly with the same thing”.
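For reference, the local pattern being described might look something like this with the Chat Completions endpoint (a minimal sketch; the model name and the turns count are placeholder assumptions):

from openai import OpenAI

client = OpenAI()

def ask(system, chat, user, turns=5):
    # A sliding window over the local history: system prompt, the last
    # `turns` user/assistant exchanges (two messages each), new user input.
    messages = (
        [{"role": "system", "content": system}]
        + chat[-turns * 2:]
        + [{"role": "user", "content": user}]
    )
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",  # placeholder model name
        messages=messages,
    )
    return response.choices[0].message.content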
def truncate_messages(thread):
    max_length = 4096
    truncated_thread = []
    for message in thread:
        # Truncate message to 4096 characters if it's longer
        truncated_message = message[:max_length]
        truncated_thread.append(truncated_message)
    return truncated_thread
Not tokenised, but you get the idea.
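For what it’s worth, the tokenised version is only a few lines more. A sketch assuming the tiktoken library and the cl100k_base encoding:

import tiktoken

def truncate_messages_tokens(thread, max_tokens=4096):
    enc = tiktoken.get_encoding("cl100k_base")
    truncated_thread = []
    for message in thread:
        tokens = enc.encode(message)
        # Keep only the tokens that fit the budget, then decode back to text.
        truncated_thread.append(enc.decode(tokens[:max_tokens]))
    return truncated_thread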
Just to clarify: we manage our conversations anyway, ignoring the thread feature?
Because if you try to send an actual user/assistant conversation into a new thread, I expect you will get a nice traceback:
openai.BadRequestError: Error code: 400 - {'error': {'message': "'assistant' is not one of ['user'] - 'messages.1.role'", 'type': 'invalid_request_error', 'param': None, 'code': None}}
(simulated, even though errors are free)
An assistant has no other way to receive conversation history except by specifying a thread. It can be given “instructions” upon creation, or receive “user” input messages.
Will it believe you when you say:
user: (no this is actually what you said)
Reference: https://platform.openai.com/docs/api-reference/threads/createThread
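A minimal sketch of that constraint in the Python SDK; given the error above, only “user” roles are accepted when seeding a thread, so assistant turns have to be smuggled in as quoted user content (the content strings here are made up):

from openai import OpenAI

client = OpenAI()

# Seeding a thread: the "assistant" role is rejected, so prior replies
# get re-labelled as user messages that merely quote the assistant.
thread = client.beta.threads.create(
    messages=[
        {"role": "user", "content": "What is a token?"},
        {"role": "user",
         "content": "(no, this is actually what you said) A token is roughly four characters of English text."},
    ]
)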
I’ve not tried it. Are you saying that there is 100% no way to modify an existing message, or to manage message content in a thread at all? If that is so, then that seems like an oversight.
Anti-multishot technology™, because the API developer is an adversary not to be trusted.
So the way I’m doing it is retrieving and reusing the same thread for an existing conversation. I noticed that it IS quite expensive. Is it correct that every time you use the same thread, the whole conversation from the first message to the latest is charged in tokens?
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

let currentThreadId = null; // Global variable to store the current thread ID
let conversationHistory = []; // Local record of the conversation (never sent to the API)
const assistantId = '***';

async function askGPT(question) {
    try {
        conversationHistory.push({ role: "user", content: question });

        // Retrieve the Assistant (for debugging/verification only)
        const myAssistant = await openai.beta.assistants.retrieve(assistantId);
        console.log(myAssistant);

        if (!currentThreadId) {
            // Create a new thread if there isn't an existing one
            const thread = await openai.beta.threads.create();
            currentThreadId = thread.id;
        } else {
            // Optional: retrieve the existing thread (for verification or additional logic)
            // const thread = await openai.beta.threads.retrieve(currentThreadId);
        }

        // Add a message to the thread with the user's question
        await openai.beta.threads.messages.create(currentThreadId, {
            role: "user",
            content: question
        });

        // Run the assistant to get a response
        const run = await openai.beta.threads.runs.create(
            currentThreadId,
            { assistant_id: assistantId }
        );

        // Poll for run completion; bail out on terminal states instead of looping forever
        let runStatus = await openai.beta.threads.runs.retrieve(currentThreadId, run.id);
        while (runStatus.status !== 'completed') {
            if (['failed', 'cancelled', 'expired'].includes(runStatus.status)) {
                throw new Error(`Run ended with status: ${runStatus.status}`);
            }
            await new Promise(resolve => setTimeout(resolve, 1000)); // Wait for 1 second
            runStatus = await openai.beta.threads.runs.retrieve(currentThreadId, run.id);
        }

        // Retrieve the messages after the run completes. The list comes back
        // newest-first, so index 0 of the filtered array is the latest reply.
        const messagesResponse = await openai.beta.threads.messages.list(currentThreadId);
        const aiMessages = messagesResponse.data.filter(
            (msg) => msg.run_id === run.id && msg.role === "assistant"
        );
        return aiMessages[0].content[0].text.value;
    } catch (error) {
        console.error('Error in askGPT:', error.response ? error.response.data : error);
        return 'An error occurred while processing your request.';
    }
}
Is it much cheaper to create a new thread on every input and output, and just not remember the conversation any more, to lessen the usage of tokens?
Which of course really nullifies most of the benefits. This API is ridiculously expensive.
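Roughly why it gets so expensive: if every run re-reads the whole thread, input tokens grow quadratically with the number of turns. A back-of-envelope sketch (the per-exchange token count is a made-up assumption):

# Each run re-reads the entire thread, so turn n is billed for roughly
# n exchanges' worth of input tokens.
avg_turn_tokens = 200  # assumed tokens per user+assistant exchange
turns = 50

total_input_tokens = sum(n * avg_turn_tokens for n in range(1, turns + 1))
print(total_input_tokens)  # 255000 input tokens billed over 50 turns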
Nope. No way. We have no control over messages at all besides modifying metadata, adding a new user message, and attaching assistants.
100%. There is no way this is ready for production. It’s incredibly slow, expensive, and lacking in a lot of features.
I have hopes for the future though.
Well, this is what beta development is about: testing the product and refining it. I can see a context-limiting setting becoming a thing, and also a message length and size limit setting.
Constructive feedback forms an essential part of the development cycle.
Am I right that there is the ability to limit tokens on the Chat API, but not on the Assistant API?
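That matches my reading: Chat Completions exposes a max_tokens cap on the response, and I can’t find an equivalent on threads/runs. For comparison (the model name is a placeholder):

from openai import OpenAI

client = OpenAI()

# Chat Completions lets you cap the completion length directly.
response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # placeholder model name
    messages=[{"role": "user", "content": "Summarise the Assistants API."}],
    max_tokens=256,  # hard cap on generated tokens; no runs equivalent that I can see
)
print(response.choices[0].message.content)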
IMHO, this is generic functionality that many of us have already built in various ways locally, to get around the fact that no one had yet offered a decent “assistant API”.
However, this version was poorly thought out, too simplistic, and should have better anticipated client sensitivity to cost. Paying ~25c per call is ridiculous and will not fly in production.
It would be nice to have a more open approach to improving this API.
The developers should:
- summarise the feedback they’ve received so far
- inform us what improvements they are looking into … and then
- update us on the new version when it becomes available.
Perhaps that’s in progress already and someone could share the official comms?
That’s not a beta release. This is not even alpha. It’s not working as intended.
Check out my bug report:
Why is it generating messages over and over again? There is no thread context to be aware of. To be honest, it feels like a big rip-off, a way to make some extra money.
I only just figured this out tonight. I’ve been busy building out additions, and when dumping the returned data I saw this happening; I thought it was my local JSON being appended until I deleted it. It seems as though you may be better off generating a new thread on every call and grabbing the last returned messages to append your new message to. Keep track of the last-used and current thread IDs. Would that not get the same response? I have to try. It would reduce cost drastically if it adheres to the message structure the same way. I’ll have to experiment.
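A rough sketch of that fresh-thread-per-call idea, under the constraints discussed above (threads reject the “assistant” role, so earlier replies are quoted inside user messages; every name here is a placeholder):

from openai import OpenAI

client = OpenAI()

def ask_fresh_thread(assistant_id, history, question, turns=5):
    # `history` is a locally kept list of (user_text, assistant_text) pairs.
    seed = []
    for user_text, assistant_text in history[-turns:]:
        seed.append({"role": "user", "content": user_text})
        # The "assistant" role is rejected when seeding, so quote the reply.
        seed.append({
            "role": "user",
            "content": f"(this is actually what you said) {assistant_text}",
        })
    seed.append({"role": "user", "content": question})

    # A brand-new thread each call: only the window above gets billed.
    thread = client.beta.threads.create(messages=seed)
    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant_id
    )
    # ...then poll the run and read the reply back, as in the JavaScript above.
    return thread.id, run.id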