Hello, I’m developing a web app that uses Assistants API v2 streaming with the Node SDK, and I need to add a thread truncation strategy by setting max_prompt_tokens and max_completion_tokens so I can manage token costs in my threads efficiently.
I’ve written a cloud function that runs a thread I’ve already created, but I can’t figure out where to add the truncation settings in my code:
const run = openai.beta.threads.runs.createAndStream(threadId, {
  assistant_id: assistantId,
});
for await (const event of run) {
  console.log("Event received:", event.event);
  if (event.event === "thread.message.delta") {
    console.log("Event thread.message.delta received:", event.data);
    const chunk = event.data.delta.content?.[0];
    ...
Do I need to add:

max_prompt_tokens?: number | null;
max_completion_tokens?: number | null;

after “assistant_id: assistantId,”, or should I set them earlier, when I add my message to the thread that I create beforehand and pass as the “threadId” parameter to my cloud function?
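In other words, I’m wondering whether something like the following would work. This is just a sketch of the run payload I think I need — the numeric values are placeholders, and I’m not certain that createAndStream accepts these run-level options or that truncation_strategy should be shaped this way:

```javascript
// Sketch of the run options I believe belong on the run, not the thread.
// All values below are placeholders, not recommendations.
const runParams = {
  assistant_id: "asst_placeholder",
  max_prompt_tokens: 4000,      // cap on tokens fed into the model per run
  max_completion_tokens: 1000,  // cap on tokens the run may generate
  truncation_strategy: {
    type: "last_messages",      // keep only the most recent messages
    last_messages: 10,          // placeholder count
  },
};

// I would then pass this object as the second argument:
// const run = openai.beta.threads.runs.createAndStream(threadId, runParams);
console.log(Object.keys(runParams).join(","));
```

If these options really are run-level, I assume I would not need to change anything when importing my message into the thread beforehand.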
Thank you for your help