How do I know Codex was programming?

I am using Codex 5.1 through the API. I am asking it to create some math questions based on topics, e.g. an algebra task that involves solving an equation.
I am asking it to use Python to generate and verify the sample solution it should also provide.

Now, it gives me back said math question. However, through the API, how can I see that it actually used code to generate it?
I need it to really use Python to generate and verify the question, because otherwise it makes mistakes.


Hello!

The Codex model does not inherently come with a Python code interpreter on the Responses API; it is a tool that you must specifically enable. You pay per setup of a code interpreter container with a short expiry, costing between $0.03 and $1.92 depending on the RAM specified. The documentation also covers container management.

To receive the response “output” items (or events) that show the code the AI ran on this endpoint, you add the "include" parameter: a list of strings that requests extra data items in the “output” list. Include these strings for more data:

  • "code_interpreter_call.outputs": Includes the outputs of python code execution in code interpreter tool call items.
  • "computer_call_output.output.image_url": Include image urls from the computer call output. (computer-use model?)

The dots and underscores in the strings are just to confuse you, I suppose.
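A minimal sketch with the Python SDK, assuming a model that actually supports the hosted tool (the model name and prompt here are placeholders):

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.2",  # placeholder: any model that supports code_interpreter
    tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
    include=["code_interpreter_call.outputs"],
    input="Create one algebra question; generate and verify the answer with Python.",
)

# Any code the model ran arrives as its own item in the "output" list
for item in response.output:
    if item.type == "code_interpreter_call":
        print(item.code)     # the Python the model executed
        print(item.outputs)  # execution results, returned because of "include"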


thanks but for codex I am getting this error:
{'error': {'message': "Tool 'code_interpreter' is not supported with gpt-5.1-codex-max….

I was not aware that one could not use the hosted code interpreter tool with the API codex models. You would not be able to discover that on the model features page, as codex models are not listed there like they should be.

The patch tool and shell tool might be your choice, along with “custom functions” for unstructured calling. For example, with the patch tool:

You: list workspace contents for me. Then make a patch to the hello world script so that it instead produces a random choice from three different welcome messages.
[usage] in/cached: 834/0 | out/reasoning: 177/64
[patch] update_file: hello_world.py
[usage] in/cached: 1142/0 | out/reasoning: 676/640

Workspace files:
- hello_world.py
- responses-minimal.py

Applied patch to hello_world.py to print a random choice from three welcome messages.

You: 

Code without extra fees, which is much nicer. Just program with a high level of constraints on what is programmatically allowed via your functions and shell, inside a virtualized environment the AI cannot escape, or you’re asking for trouble.
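A minimal sketch of that function-tool pattern with the Python SDK (the run_python tool name and the prompt are made up for illustration; the subprocess call stands in for whatever sandbox you actually run):

import json
import subprocess

from openai import OpenAI

client = OpenAI()

# Hypothetical "run_python" function tool the model can call
tools = [{
    "type": "function",
    "name": "run_python",
    "description": "Execute Python code and return its stdout.",
    "parameters": {
        "type": "object",
        "properties": {"code": {"type": "string"}},
        "required": ["code"],
    },
}]

response = client.responses.create(
    model="gpt-5.1-codex-max",
    tools=tools,
    input="Create one algebra question and verify its answer by calling run_python.",
)

for item in response.output:
    if item.type == "function_call" and item.name == "run_python":
        code = json.loads(item.arguments)["code"]
        # WARNING: only execute inside your locked-down container, never raw
        result = subprocess.run(["python", "-c", code], capture_output=True, text=True)
        print(code, result.stdout, sep="\n")

You would then send a "function_call_output" item back in a follow-up request so the model can see the result and finish its answer.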


Thanks. This seems to be more of a local coding agent use case?
In my case, I don’t need it to have access to anything locally. I just want GPT to send me back a math question generated or “proofed” with Python.
Seems like that’s not really possible then?
I recall Gemini can do just that, so I’ll probably have to switch.

Just switch models to normal gpt-5.2. It will do fine. Codex is tuned for patching and code-writing, not for giving you chatty answers (backed by algorithmic checks).

Hm, but I do not want chatty answers, only the math question and the Python code, so I can see that it actually created the questions with Python.

I tried that now with gpt-5.2. I asked it to create 6 questions, and it did. But looking at the code part, I can only see a one-liner, so it did not really use Python for all questions; it seems like only for one question, and not even the whole of it.
Any idea why? I thought it would use more Python, not just this:
code="import sympy as sp\nx=sp.Symbol('x')\nsp.expand(4*(x+2)*(x-1))\n",
Whole output:
[ResponseCodeInterpreterToolCall(id='….', code="import sympy as sp\nx=sp.Symbol('x')\nsp.expand(4*(x+2)*(x-1))\n", container_id='….', outputs=None, status='completed', type='code_interpreter_call'), ResponseOutputMessage(id='….', content=[ResponseOutputText(annotations=[], text='{\n "questions": [\n {\n "question_type": "MULTIPLE_CHOICE",\n "title": "Find equivalent expressions for the quadratic function below. Select ALL that apply.",\n "content": "Let f(x) model the height of a toy rocket (in meters) after x seconds: f(x) = 4x² + 4x - 8. (Here, c = 2.)",\n "points": 100,\n "options": [\n "4(x² + x - 2)",\n "4(x + 2)(x - 1)",\n "2(2x² + 2x - 4)",\n "4(x - 2)(x + 1)",\n "4x(x + 1) - 8"\n ],\n "correct_indices": [\n 0,\n 1,\n 2,\n 4\n ]\n },\n {\n "question_type": "TABLE_YES_NO",\n "title": "Is the given expression equal to 4x² + 20x - 24?",\n "content": "A company’s profit (in thousands of dollars) is modeled by P(x) = 4x² + 20x - 24, where x is the number of ads purchased. Select Yes or No for each expression.",\n "points": 100,\n "rows": [\n {\n "statement": "4(x² + 5x - 6)",\n "is_true": true\n },\n {\n "statement": "4(x + 6)(x - 1)",\n "is_true": true\n },\n {\n "statement": "4(x + 3)(x + 2)",\n "is_true": false\n },\n {\n "statement": "2(2x² + 10x - 12)",\n "is_true": true\n },\n {\n "statement": "4x(x + 5) - 24",\n "is_true": true\n }\n ]\n },\n {\n "question_type": "MULTIPLE_CHOICE",\n "title": "Find equivalent expressions for the quadratic function below. Select ALL that apply.",\n "content": "A rectangular garden has area modeled by A(t) = 3t² - 21t - 24 (in square meters), where t is the number of meters added to one side. (Here, c = 8.)",\n "points": 100,\n "options": [\n "3(t² - 7t - 8)",\n "3(t - 8)(t + 1)",\n "3(t - 1)(t + 8)",\n "3(t - 4)(t - 2)",\n "t(3t - 21) - 24"\n ],\n "correct_indices": [\n 0,\n 1\n ]\n },\n {\n "question_type": "TABLE_YES_NO",\n "title": "Is the given expression equal to n² - n - 20?",\n "content": "A school club tracks its balance (in dollars) with B(n) = n² - n - 20, where n is the number of fundraiser items sold. Select Yes or No for each expression.",\n "points": 100,\n "rows": [\n {\n "statement": "(n - 5)(n + 4)",\n "is_true": true\n },\n {\n "statement": "(n + 4)(n - 5)",\n "is_true": true\n },\n {\n "statement": "(n - 4)(n + 5)",\n "is_true": false\n },\n {\n "statement": "(n - 20)(n + 1)",\n "is_true": false\n }\n ]\n },\n {\n "question_type": "MULTIPLE_CHOICE",\n "title": "Find the expressions that are NOT equivalent to the quadratic expression below for all values of x. Select ALL that apply.",\n "content": "A ball’s vertical position is modeled by h(x) = 6x² - 30x + 96 (in centimeters), where x is time in seconds. (Here, c = 16.)",\n "points": 100,\n "options": [\n "6(x² - 5x + 16)",\n "6(x - 8)(x - 2)",\n "3(2x² - 10x + 32)",\n "6(x - 4)(x - 4)",\n "6x(x - 5) + 96"\n ],\n "correct_indices": [\n 1,\n 3\n ]\n },\n {\n "question_type": "MULTIPLE_CHOICE",\n "title": "Which ordered pairs (a, b) make the following equation true for all values of x. Select ALL that apply.",\n "content": "A model for total cost is written as (x + a)(x + b) = x² + 10x + 16. (Here, c = 8.)",\n "points": 100,\n "options": [\n "a = 1, b = 16",\n "a = 2, b = 8",\n "a = 4, b = 4",\n "a = 8, b = 2",\n "a = 16, b = 1"\n ],\n "correct_indices": [\n 1,\n 3\n ]\n }\n ]\n}', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')]
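A minimal way to pull out just the code-interpreter calls from a response like this, assuming the Python SDK, so what was actually executed can be audited:

# "response" is the Responses API result whose .output is shown above
code_calls = [item for item in response.output
              if item.type == "code_interpreter_call"]

for call in code_calls:
    print(call.code)     # the executed Python
    print(call.outputs)  # None unless "code_interpreter_call.outputs" was included

print(f"{len(code_calls)} code call(s) in this response")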


“Chat” - as in you talk to the AI, and it is trained to converse with you when it produces text, acting as an entity that exists.

And “chat”: the AI is conversationally steerable, also by an initial “developer” message that sets behavior, such that you can say, “First verify solutions for every mathematical question/answer pair you’d provide by using your internal Python tool for computation.”
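A minimal sketch of wiring that developer instruction into a request, assuming the Python SDK (the instructions parameter sets the developer message; model and prompt are placeholders):

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.2",
    tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
    include=["code_interpreter_call.outputs"],
    instructions=(
        "First verify solutions for every mathematical question/answer pair "
        "you'd provide by using your internal Python tool for computation."
    ),
    input="Create 6 algebra questions with verified sample solutions.",
)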

It might have run this merely to verify assumptions already produced internally as unseen reasoning:

>>> import sympy as sp
>>> x=sp.Symbol('x')
>>> sp.expand(4*(x+2)*(x-1))
4*x**2 + 4*x - 8

The case for code interpreter is when you need mathematical answers. The AI itself can do symbolic math quite well. Here’s when it infers the need to use Python internally:

[screenshot omitted]

What happened behind the scenes: [screenshot omitted]

When you contemplate whether you should not trust the AI if it doesn’t use Python… [screenshot omitted]