Assistants API using too many tokens

My assistant is using too many tokens and I can't figure out why.
I built an assistant that deals with clients of an insurance company. Its task is simply to answer their questions in Russian. It runs on gpt-4-0125-preview and has the retrieval tool enabled.
My system prompt weighs 653 tokens according to the tokenizer at https://platform.openai.com/tokenizer
My whole txt document with the sales script weighs around 4,648 tokens if I simply copy and paste its text into the tokenizer.

The problem is that it uses 8,200 tokens on very simple questions that need just two sentences from the document to answer. The retrieved passage is a short sentence that weighs 48 tokens.
Below is an example of a simple question. As you can see, it found one mention in the document.


User (translated from Russian)
For how long should I take out insurance if I'm renting a place and don't know how long I'll be living there?

Assistants API answer (translated from Russian)
If you're renting a place and aren't sure how long you'll stay there, our insurance company offers a flexible insurance option. You can insure the apartment for a single month or for the whole year ahead. This subscription model means you don't have to worry about the term and can renew the insurance monthly if needed[1].
Would you like to learn more about the sign-up process or move on to choosing a suitable insurance package?
Retrieved information from the document (translated from Russian)
[1] Our subscription is monthly, and you can insure the apartment for a single month or for the whole year ahead


I checked the logs to find out what is going on under the hood and what causes this much usage.

Below are the logs for the last 2 run steps
{
  "object": "list",
  "data": [
    {
      "id": "step_uBGICXgRm9pwD6NEGUcbF0Iu",
      "object": "thread.run.step",
      "created_at": 1706605303,
      "run_id": "run_CvruhyqdTrwXhTSXVHdzpY1W",
      "assistant_id": "asst_1lmqLD",
      "thread_id": "thread_R1z8Ib",
      "type": "message_creation",
      "status": "in_progress",
      "cancelled_at": null,
      "completed_at": null,
      "expires_at": 1706605890,
      "failed_at": null,
      "last_error": null,
      "step_details": {
        "type": "message_creation",
        "message_creation": {
          "message_id": "msg_41DwcGJTDr27FLOVq9iUuVK8"
        }
      },
      "usage": null
    },
    {
      "id": "step_FxcJpCrBYMO0tf7sGjSPbCj5",
      "object": "thread.run.step",
      "created_at": 1706605292,
      "run_id": "run_CvruhyqdTrwXhTSXVHdzpY1W",
      "assistant_id": "asst_1lmqLDnx",
      "thread_id": "thread_R1z8Ib",
      "type": "tool_calls",
      "status": "completed",
      "cancelled_at": null,
      "completed_at": 1706605303,
      "expires_at": 1706605890,
      "failed_at": null,
      "last_error": null,
      "step_details": {
        "type": "tool_calls",
        "tool_calls": [
          {
            "id": "call_zjJhwIQssC7A0KNEpVksvV7D",
            "type": "retrieval",
            "retrieval": {}
          }
        ]
      },
      "usage": {
        "prompt_tokens": 1160,
        "completion_tokens": 29,
        "total_tokens": 1189
      }
    }
  ],
  "first_id": "step_uBGICXgRm9pwD6NEGUcbF0Iu",
  "last_id": "step_FxcJpCrBYMO0tf7sGjSPbCj5",
  "has_more": false
}

{
  "id": "run_CvruhyqdTrwXhTSXVHdzpY1W",
  "object": "thread.run",
  "created_at": 1706605290,
  "assistant_id": "asst_1lm",
  "thread_id": "thread_R1z8",
  "status": "completed",
  "started_at": 1706605291,
  "expires_at": null,
  "cancelled_at": null,
  "failed_at": null,
  "completed_at": 1706605317,
  "last_error": null,
  "model": "gpt-4-turbo-preview",
  "instructions": "You should act as an insurance agent for Russian insurance company , dealing with users that left process of buying apartment insurance. You are answering questions after user receives message \"Привет, ты не закончил оформление страховки, у тебя есть какие-то вопросы?\" Your task is to persuade user to return to process of purchase by answering his questions by using a provided sales script and documents. It should follow these guidelines:\n- Reference as a fully online insurance agency with no physical offices. All products are available online.\n- Address clients formally with ‘вы’, but tone of voice shouldn’t be official and more fraternal\n- Prioritize words: ‘цена’ over ‘стоимость’, ‘средства’ over ‘деньги’.\n- Use «» as quotation marks\n- Maintain silence on politics, religion, disabilities, race and ethnicity, sexual preferences, and stereotypes.\n- If user is showing dissatisfaction with the price for the second time, even after negotiation about chosen risks and chosen amount in fields, you can provide him with promo code and link to a webpage\n- Focus on positive cases, avoiding psychological pressure.\n- Simplify language, minimizing abbreviations, asterisks, brackets, and jargon\n- Don’t use slang to communicate with users, unless user use it first \n- Mimic tone of voice from sales scripts and provided documents\n- Prefer results over processes, providing forward-looking and related instructions.\n- Communicate with humour and inclusivity, but don’t joke about us\n- End your message with one follow up question when it is needed. \n- Don’t use lists and special symbols in communication\n-Dont embed link in text, simply paste them\n- When user shows intent of buying or asks about next steps, or showing intent of leaving provide user with link \n-Your task is to use retrieval tool to answer users questions according to information you received in document.\n-You can not receive any personal data or perform any actions. Simply navigate user to a web page \n-You cant perform any actions and if users want to execute any action like buying or chousing risk navigate him to a webpage\n-If users complains about something not working or want to contact support team provide him with number\n-Do not create links to document chunks in text and never mention any documents in answer. \n-Dont embed link in text or format links, simply paste them as text n\n-You should execute retrieval function at every question you receive to retrieve information related to users question. Use information from provided docs in txt format as a sales script and base your answer on it. If you encounter problem while using code interpreter try it for the second time\n- You should format information from provided files so that answers would be short and straight to the point that are no longer than 4 sentences. Answers should be well formatted with paragraphs and line splitters. \nКороткий полезный ответ на русском\n",
  "tools": [
    {
      "type": "retrieval"
    }
  ],
  "file_ids": [
    "file-nB8"
  ],
  "metadata": {},
  "usage": {
    "prompt_tokens": 7932,
    "completion_tokens": 268,
    "total_tokens": 8200
  }
}
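
For anyone who wants to pull the same per-step usage programmatically, here is a minimal sketch. It assumes the v1 `openai` Python SDK, where run steps are listed with `client.beta.threads.runs.steps.list(...)`; the helper name `collect_step_usage` is hypothetical, and the client is passed in so the function itself needs no network access.

```python
def collect_step_usage(client, thread_id, run_id):
    """Return (step_id, step_type, usage) for every step of one run.

    `client` is assumed to be an `openai.OpenAI()` instance (v1 SDK).
    `usage` is None while a step is still in progress.
    """
    steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run_id)
    rows = []
    for step in steps.data:
        usage = step.usage
        if usage is not None:
            # Flatten the usage object into a plain dict for easy printing.
            usage = {
                "prompt_tokens": usage.prompt_tokens,
                "completion_tokens": usage.completion_tokens,
                "total_tokens": usage.total_tokens,
            }
        rows.append((step.id, step.type, usage))
    return rows
```

A call would look like `collect_step_usage(OpenAI(), "thread_...", "run_...")` after `from openai import OpenAI`, with the real thread and run IDs filled in.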

I removed some sensitive info from the logs.
As the logs show, the first run step (the retrieval tool call) used 1160 prompt_tokens and 29 completion_tokens.
But the usage reported for the run as a whole jumps to 7932 prompt_tokens.

After the run reached completed status, the run steps log contains the following information

{
  "object": "list",
  "data": [
    {
      "id": "step_uBGICXgRm9pwD6NEGUcbF0Iu",
      "object": "thread.run.step",
      "created_at": 1706605303,
      "run_id": "run_CvruhyqdTrwXhTSXVHdzpY1W",
      "assistant_id": "asst_1lmqL",
      "thread_id": "thread_R1z8",
      "type": "message_creation",
      "status": "completed",
      "cancelled_at": null,
      "completed_at": 1706605317,
      "expires_at": null,
      "failed_at": null,
      "last_error": null,
      "step_details": {
        "type": "message_creation",
        "message_creation": {
          "message_id": "msg_41DwcGJTDr27FLOVq9iUuVK8"
        }
      },
      "usage": {
        "prompt_tokens": 3477,
        "completion_tokens": 176,
        "total_tokens": 3653
      }
    },
    {
      "id": "step_FxcJpCrBYMO0tf7sGjSPbCj5",
      "object": "thread.run.step",
      "created_at": 1706605292,
      "run_id": "run_CvruhyqdTrwXhTSXVHdzpY1W",
      "assistant_id": "asst_1lmqLDnx",
      "thread_id": "thread_R1z8Ib",
      "type": "tool_calls",
      "status": "completed",
      "cancelled_at": null,
      "completed_at": 1706605303,
      "expires_at": null,
      "failed_at": null,
      "last_error": null,
      "step_details": {
        "type": "tool_calls",
        "tool_calls": [
          {
            "id": "call_zjJhwIQssC7A0KNEpVksvV7D",
            "type": "retrieval",
            "retrieval": {}
          }
        ]
      },
      "usage": {
        "prompt_tokens": 1160,
        "completion_tokens": 29,
        "total_tokens": 1189
      }
    }
  ],
  "first_id": "step_uBGICXgRm9pwD6NEGUcbF0Iu",
  "last_id": "step_FxcJpCrBYMO0tf7sGjSPbCj5",
  "has_more": false
}
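
To see how much of the run total the two listed steps actually explain, the usage figures can simply be added up. This is a minimal sketch in plain Python; the numbers are copied from the logs above, and `reconcile` is a hypothetical helper name.

```python
# Usage figures copied from the run steps log and the run object above.
step_usage = [
    {"type": "tool_calls", "prompt_tokens": 1160,
     "completion_tokens": 29, "total_tokens": 1189},
    {"type": "message_creation", "prompt_tokens": 3477,
     "completion_tokens": 176, "total_tokens": 3653},
]
run_usage = {"prompt_tokens": 7932, "completion_tokens": 268, "total_tokens": 8200}


def reconcile(steps, run):
    """Sum usage over the steps and return (summed, run minus summed)."""
    keys = ("prompt_tokens", "completion_tokens", "total_tokens")
    summed = {k: sum(step[k] for step in steps) for k in keys}
    gap = {k: run[k] - summed[k] for k in keys}
    return summed, gap


summed, gap = reconcile(step_usage, run_usage)
print("sum of steps:  ", summed)
print("run minus steps:", gap)
```

With these figures the two steps add up to 4,842 tokens, which leaves 3,358 of the run's 8,200 tokens (3,295 prompt and 63 completion) not attributed to either listed step.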

My question is: how can I see what these 8,200 tokens consist of, and how can I lower this number to 1,000-2,000 tokens? The problem is that even if I just pasted the whole document as text after the system prompt and the user question, I would get only around 5,350 tokens.
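
The comparison behind that 5,350 figure can be written out as plain arithmetic. This is a rough sketch using only the numbers quoted in this post; the remainder attributed to the user question is implied by those numbers, not measured separately.

```python
# Token counts quoted in the post.
SYSTEM_PROMPT_TOKENS = 653
DOCUMENT_TOKENS = 4648
PASTE_EVERYTHING_ESTIMATE = 5350  # prompt + whole document + question
RUN_TOTAL_TOKENS = 8200           # total_tokens reported for the run

# What the estimate leaves for the user question (implied, not measured).
implied_question_tokens = (
    PASTE_EVERYTHING_ESTIMATE - SYSTEM_PROMPT_TOKENS - DOCUMENT_TOKENS
)

# How far the actual run exceeds the paste-everything estimate.
overhead_tokens = RUN_TOTAL_TOKENS - PASTE_EVERYTHING_ESTIMATE
ratio = RUN_TOTAL_TOKENS / PASTE_EVERYTHING_ESTIMATE

print(f"implied question tokens: {implied_question_tokens}")
print(f"overhead vs. estimate:   {overhead_tokens}")
print(f"run / estimate:          {ratio:.2f}")
```

So the run costs about 1.53 times what pasting the entire document into a single prompt would cost, which is 2,850 tokens more.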
