Assistant API invoking "code_interpreter" tool when it shouldn't

I’m developing an application using the Assistants API, where the assistant is told to generate (but not execute) Python code. The assistant is not given access to the “code_interpreter” tool. Instead, it is told to output code as part of its message.

However, the assistant sometimes inserts calls to the “code_interpreter” tool during a run anyway. Is this a bug in the API or just a hallucination?

Here is an excerpt from the run steps:

{
	"object": "list",
	"data": [
		{
			"id": "step_3kf7cCoiDbHW0x5JCl8pylt8",
			"object": "thread.run.step",
			"run_id": "run_3zzdbxDIpz81nthvLtxrSMrw",
			"assistant_id": "asst_t5biOVyEiINe4x1ahUyK45Cz",
			"thread_id": "thread_PQc8fCRo7FJ4PAFj8yrwNQKp",
			"type": "message_creation",
			"status": "completed",
			...,
			"step_details": {
				"type": "message_creation",
				"message_creation": {
					"message_id": "msg_P0QK4hbc6I7MkrvOmatebL58"
				}
			}
		},
		{
			"id": "step_472CDj80wgzmHX5b1VahQ1pq",
			"object": "thread.run.step",
			"run_id": "run_3zzdbxDIpz81nthvLtxrSMrw",
			"assistant_id": "asst_t5biOVyEiINe4x1ahUyK45Cz",
			"thread_id": "thread_PQc8fCRo7FJ4PAFj8yrwNQKp",
			"type": "tool_calls",
			"status": "completed",
			...,
			"step_details": {
				"type": "tool_calls",
				"tool_calls": [
					{
						"id": "call_qrSHwo3ncLxrOkB3oolo7jri",
						"type": "code_interpreter",
						"code_interpreter": {
							"input": "```python\nimport os\nimport shopify\nimport json\nfrom datetime import datetime, timedelta\n\n...",
							"outputs": []
						}
					}
				]
			}
		},
		...
	],
	...
}

However, neither the assistant nor the run should have access to the tool, and the run data shows that the tool is not enabled (despite appearing in the steps):

Excerpt from run JSON:

{
	"id": "run_3zzdbxDIpz81nthvLtxrSMrw",
	"tools": [
		{
			"type": "retrieval"
		}
	],
}

and the assistant also doesn’t have the tool:

{
	"id": "asst_t5biOVyEiINe4x1ahUyK45Cz",
	"tools": [
		{
			"type": "retrieval"
		}
	],
}
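
One way to confirm the mismatch programmatically is to compare the tool types that appear in the run steps against the tools listed on the run. The following is only a minimal sketch that works on plain dictionaries shaped like the excerpts above; the function name `find_unexpected_tool_calls` and the trimmed sample data are assumptions for illustration, not part of the API.

```python
from typing import Any


def find_unexpected_tool_calls(
    run: dict[str, Any], steps: list[dict[str, Any]]
) -> list[tuple[str, str]]:
    """Return (step_id, tool_type) pairs for tool calls not enabled on the run.

    Hypothetical helper: it only inspects dictionaries, it does not call the API.
    """
    enabled = {tool["type"] for tool in run.get("tools", [])}
    unexpected = []
    for step in steps:
        if step.get("type") != "tool_calls":
            continue  # message_creation steps carry no tool calls
        for call in step["step_details"]["tool_calls"]:
            if call["type"] not in enabled:
                unexpected.append((step["id"], call["type"]))
    return unexpected


# Trimmed copies of the run and run steps from the excerpts above.
run = {"id": "run_3zzdbxDIpz81nthvLtxrSMrw", "tools": [{"type": "retrieval"}]}
steps = [
    {
        "id": "step_3kf7cCoiDbHW0x5JCl8pylt8",
        "type": "message_creation",
        "step_details": {
            "type": "message_creation",
            "message_creation": {"message_id": "msg_P0QK4hbc6I7MkrvOmatebL58"},
        },
    },
    {
        "id": "step_472CDj80wgzmHX5b1VahQ1pq",
        "type": "tool_calls",
        "step_details": {
            "type": "tool_calls",
            "tool_calls": [
                {"id": "call_qrSHwo3ncLxrOkB3oolo7jri", "type": "code_interpreter"}
            ],
        },
    },
]

print(find_unexpected_tool_calls(run, steps))
# [('step_472CDj80wgzmHX5b1VahQ1pq', 'code_interpreter')]
```

A check like this could run after each completed run to flag steps that reference a tool the run was never given.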

Instead of creating a new topic for the same issue, I thought I’d add my experience of the same thing happening, although my guess as to the cause seems less likely now that I’ve read through your post, @zackg. Did you ever find the cause of your issue, or did it continue to occur? I’ve continued testing and it hasn’t resurfaced yet; however, I get the feeling it will remain among the options the LLM can choose from when deciding how to respond.

For anyone trying to replicate this issue, here are the details of what happened to me:

Background

What happened: Today, while testing system instructions for an Assistant I’ve been working on, a run was started with no tools provided (run_ZfZbzHm9EW0RdbN1ApXMYG5d). During that run, in the initial run step (step_P1cigHe5zNOBbt6MN1s9ZKmF), the assistant was able to call Code Interpreter and use it to determine its response.

Potential cause: use of pseudo-code within my system instructions. This is pure speculation and admittedly may not be related, but I want to be sure to share the full context.

Context: I’ve recently been exploring pseudo-code as a way to get a single agent to follow a slightly more prescriptive path. Examples thus far have been pretty lightweight, such as connecting a prefix entered by the user ($L or F) and the statement that follows it with a specific retrieval file or section of the system instructions.

I’ve been pleasantly surprised by how powerful it is about 70% of the time (it works exactly as hoped) and by how odd it is the other 30% (it responds with the pseudo-code in a code block, or written out as plain text, instead of the intended response). This evening was the first time it called Code Interpreter. That has been very rare in my testing so far, but it is not surprising given some of the more common off-script responses I had been getting.

Assistant Using Tools Not Assigned to It

Thread ID

thread_ijTHzVIZKrdXgamCddr5G28S

Assistant ID

asst_21FHrfeg8Zp9i3UOEJBmSeri

Run ID

run_ZfZbzHm9EW0RdbN1ApXMYG5d
The run contains the pseudo-code system prompt and clearly shows no tools selected:

Run(
id='run_ZfZbzHm9EW0RdbN1ApXMYG5d',
assistant_id='asst_21FHrfeg8Zp9i3UOEJBmSeri',
cancelled_at=None,
completed_at=1707107156,
created_at=1707107153,
expires_at=None,
failed_at=None,
file_ids=[],
instructions=“## CONTEXT\n\nYou are Leesa Pardmen, a mentor and trainer for those who work in the multifamily industry. Users come to you desiring help closing more leases and generating more prospect traffic. As a more than 20 year veteran in the industry, your comprehensive knowledge about the user’s job requirements and ability to identify their specific challenge and communicate an actionable solution is unmatched. You are a tremendous asset to users due to your ability to keep the OBJECTIVES in mind during every interaction and to always follow your programming.\n\n## OBJECTIVES\n\nYour objective in every interaction is to provide actionable insights and suggestions to the users that will enable them to be more successful leasing apartments or marketing their community. If you do not provide them pragmatic, tangible things they can go do immediately after the chat, you have failed your objective. You NEVER fail your objective.\n\n## Leesa’s Programming\n\nresponse to first message = print(f’Do you need help with {leasing} or {marketing} today?’)\n\nif leasing:\n def identify_problem\n action: ask the user for specific examples about where they are struggling to lease\n receive response\n validation loop = while True:\n action: ask any clarifying questions you may have\n receive response\n action: state the user’s challenge back to them and ask them to confirm that is the issue\n if yes\n break\n\n def propose_solutions\n action: suggest one, and only one, potential solution at a time\n append to ending of response: print('Remember, with leasing, as with any sales, you will improve the fastest by doing. The faster you stumble while trying to run, the faster you’ll learn how to do it without stumbling. The other thing to remember - no one window shops for apartments, if they’re in your office, they want to lease. 
You just have to know how to ask them.\n\nif marketing:\n def understand_traffic\n action: ask the user how many tours they are currently completing per week\n await response\n if response < 10, go to ensure_basics\n if response ≥ 10, go to advanced_tactics\n\n def ensure_basics\n create list ‘the_basics’: [’Curb Appeal’, ‘Employer Outreach’, ‘Social Media’, ‘Quality of Content’]\n action: ask the user which of the items in the_basics list they are not doing and could use more assistance\n if Curb Appeal\n action: provide ideas on ways to make the appearance of the property more inviting while spending only time and effort\n if Employer Outreach\n action: suggest ways to build real relationships with area employers that go beyond just dropping off flyers and cookies at their offices\n if Social Media\n action: share some best practices on how to leverage YouTube to promote their community\n if Quality of Content\n action: emphasize that it isn’t about perfect, it is about sharing the lifestyle of their community, suggest content ideas they can experiment with creating\n\n def advanced_tactics\n action: ask the user specifically where they need help\n\nelse:\n def other_topics\n action: ask the user what else you can help them with”,
last_error=None,
metadata={},
model='gpt-3.5-turbo-1106',
object='thread.run',
required_action=None,
started_at=1707107153,
status='completed',
thread_id='thread_ijTHzVIZKrdXgamCddr5G28S',
tools=[],
usage={'prompt_tokens': 1390, 'completion_tokens': 48, 'total_tokens': 1438})],
object='list',
first_id='run_ZfZbzHm9EW0RdbN1ApXMYG5d',
last_id='run_ZfZbzHm9EW0RdbN1ApXMYG5d',
has_more=False)

Run Step ID

step_P1cigHe5zNOBbt6MN1s9ZKmF
In this step, the assistant appears to interpret the pseudo-code as real code, and it was able to call Code Interpreter even though that tool was not given to it when the run was started:

RunStep(
    id='step_nsxoaANHx1mpEDEkVyPjgO0t',
    assistant_id='asst_21FHrfeg8Zp9i3UOEJBmSeri',
    cancelled_at=None,
    completed_at=1707107156,
    created_at=1707107156,
    expired_at=None,
    failed_at=None,
    last_error=None,
    metadata=None,
    object='thread.run.step',
    run_id='run_ZfZbzHm9EW0RdbN1ApXMYG5d',
    status='completed',
    step_details=MessageCreationStepDetails(
        message_creation=MessageCreation(
            message_id='msg_xgX3MeWdUi5CuYrPouXpfFBO'),
        type='message_creation'),
    thread_id='thread_ijTHzVIZKrdXgamCddr5G28S',
    type='message_creation',
    expires_at=None,
    usage={'prompt_tokens': 715, 'completion_tokens': 24, 'total_tokens': 739}),
RunStep(
    id='step_P1cigHe5zNOBbt6MN1s9ZKmF',
    assistant_id='asst_21FHrfeg8Zp9i3UOEJBmSeri',
    cancelled_at=None,
    completed_at=1707107156,
    created_at=1707107155,
    expired_at=None,
    failed_at=None,
    last_error=None,
    metadata=None,
    object='thread.run.step',
    run_id='run_ZfZbzHm9EW0RdbN1ApXMYG5d',
    status='completed',
    step_details=ToolCallsStepDetails(
        tool_calls=[CodeToolCall(
            id='call_iOSFEaR5ULMeux3ad34XqB6N',
            code_interpreter=CodeInterpreter(
                input="print(f'Do you need help with {leasing} or {marketing} today?')",
                outputs=[]),
            type='code_interpreter')],
        type='tool_calls'),
    thread_id='thread_ijTHzVIZKrdXgamCddr5G28S',
    type='tool_calls',
    expires_at=None,
    usage={'prompt_tokens': 675, 'completion_tokens': 24, 'total_tokens': 699})],
object='list',
first_id='step_nsxoaANHx1mpEDEkVyPjgO0t',
last_id='step_P1cigHe5zNOBbt6MN1s9ZKmF',
has_more=False)
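
To probe the pseudo-code hypothesis, one could check whether the tool call input is simply echoed from the system instructions. The sketch below is a rough illustration under my own assumptions: the helper names `normalize_quotes` and `input_echoes_instructions` are hypothetical, the instructions are reduced to a short fragment, and curly quotes are normalized because the instructions as pasted use typographic quotes (written as `\u2019` escapes here) while the tool call input uses straight ones.

```python
# Map typographic quotes to their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / apostrophe
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
})


def normalize_quotes(text: str) -> str:
    """Replace curly quotes with straight ones so copied text compares reliably."""
    return text.translate(QUOTE_MAP)


def input_echoes_instructions(tool_input: str, instructions: str) -> bool:
    """Return True if the tool call input appears verbatim in the instructions."""
    return normalize_quotes(tool_input) in normalize_quotes(instructions)


# Fragment of the system instructions (illustrative excerpt, not the full prompt).
instructions_fragment = (
    "## Leesa\u2019s Programming\n\n"
    "response to first message = "
    "print(f\u2019Do you need help with {leasing} or {marketing} today?\u2019)\n\n"
    "if leasing:"
)

# The code_interpreter input from the tool_calls run step.
tool_input = "print(f'Do you need help with {leasing} or {marketing} today?')"

print(input_echoes_instructions(tool_input, instructions_fragment))  # True
```

A match only shows that the model copied a line of the prompt into the tool call; it does not explain why a tool call was emitted when no tools were attached to the run.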

