Fun with GPT-5 code interpreter - and why it likely fails to deliver files in many instructed cases

It might be the way you talk to GPT-5 that sends it down the wrong path and keeps you from getting files as deliverables. That, along with OpenAI producing yet another “application” that is for you to “consume”, not for you to “develop on”…

What does the AI need to do for me upon elevated request? Let’s see its reasoning…

Explaining tool differences

I need to clarify the difference between the internal “python” tool and the user-visible “python_user_visible” tool. The “python” tool is for private reasoning and cannot show outputs to users; it’s used only in the analysis channel. In contrast, “python_user_visible” executes code that generates outputs shown to users in the commentary channel, including files and plots. There are constraints like being in a stateful environment without internet access. For file creation, I must provide a download link. I’ll also explain their usage and rules around privacy.

Clarifying tool usage

I should explain that both the “python” and “python_user_visible” tools execute Python code and are stateful. The “python” tool is for internal calculations and won’t show outputs to the user, while “python_user_visible” is meant for code that users can see, like file generation and plots. I’ll remind the user: for internal calculations without revealing code, use “python.” For visible outputs, use “python_user_visible.” Additionally, when files are created with “python_user_visible,” I need to provide a download link.

Change the gpt-5 model size; ask independently:

Is the AI’s assumption about the implementation wrong?

Well, I can certainly see what the AI is trying to hold back.

The AI can’t “think in code” if you don’t enable code interpreter, though.

I’d have to write code to see if this dual functionality is the source of many problems in getting file deliverables out of the locked-down container…which would only solve the problems of others billed per container.

Summary

Key differences and rules

  • Visibility
    • python: outputs are hidden from the user (private/internal).
    • python_user_visible: outputs are shown to the user in the conversation.
  • Channel requirement
    • python must be called from the assistant’s analysis channel (used for internal reasoning).
    • python_user_visible must be called from the commentary channel (so you can see results).
  • When to use each
    • Use python when the assistant needs to perform private calculations or intermediate steps that you don’t need to see.
    • Use python_user_visible when you want to see code, tables, plots, or downloadable files produced by the run.
  • Files and downloads
    • python_user_visible runs can create files. If a file is created for you, the assistant should provide a download link to /mnt/data.
  • Environment constraints
    • Both run in a stateful Jupyter-like environment with no internet access.
    • Use Python only; execution time is limited.
  • Privacy/behavior
    • Do not use python for revealing the assistant’s chain-of-thought; it’s only for private computation. python_user_visible will show whatever code/output is executed.

Example uses

  • python (thinking): silently compute large intermediate numeric results, run internal checks, or test ideas that you do not need to see.
  • python_user_visible: generate a plot, produce a CSV for you to download, or show a small script and its output.
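To make the file-delivery rule concrete, here is a sketch of the kind of code a python_user_visible run might execute: write a small CSV into the sandbox’s /mnt/data directory so the assistant can surface it as a download link. (The fallback to a temp directory is my addition so the snippet runs outside the actual sandbox; the real environment mounts /mnt/data.)

```python
import csv
import os
import tempfile

# In the code-interpreter sandbox, files placed under /mnt/data are the ones
# the assistant can link as downloads. Fall back to a temp dir elsewhere.
OUT_DIR = "/mnt/data" if os.path.isdir("/mnt/data") else tempfile.gettempdir()
path = os.path.join(OUT_DIR, "example.csv")

with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["n", "square"])
    for n in range(1, 6):
        writer.writerow([n, n * n])

print(path)  # the assistant would turn this path into a download link
```

If the file ends up anywhere other than /mnt/data, the model has nothing it can legally link, which may be one mechanical reason deliverables silently fail to appear.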

So asking for a true random number super intrigued me, as I’m familiar with how RNG works.

This is a session I had with 5. Had this been 4o (and we tested this extensively back in March, across the world, using an internet video chatroom), this sort of session would have completely skewed the reasoning behind a request for something like a true random number.

(I strongly suggest doing a page word search for ‘random numb’, as the word ‘random’ alone will pull up almost 30 hits…) It was a pretty interesting look at turbulence, laminar flow, and such that led to me remembering RNG…

5 seems way more stable to me than 4o did, despite all the initial complaints about not being able to date their AI girlfriend anymore.

  • “True random” by code, instead of AI token prediction where the top logprob is always the same.

Three years of ChatGPT chats being run back through RLHF - and AI gets quite certain, while gpt-4-0314 at sampling defaults is 37-happy but still random-looking (33, 37, 38).

Best crypto-strength Python RNG to deliver an int in a range?

Short answer: still use the OS CSPRNG via secrets or random.SystemRandom. They do not require a CPU hardware RNG. The kernel’s CSPRNG collects entropy from many sources (interrupt timing, devices, etc.) and remains the best practice.

Standard library (recommended):
    secrets: n = secrets.randbelow(50) + 1
    SystemRandom: n = random.SystemRandom().randint(1, 50)
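The two standard-library calls above can be exercised as a quick sanity check; the `roll` helper is just an illustrative name I’ve added, not anything GPT-5 produced:

```python
import random
import secrets

def roll(lo: int = 1, hi: int = 50) -> int:
    """Crypto-strength uniform integer in [lo, hi], drawn from the OS CSPRNG."""
    # secrets.randbelow(k) returns a uniform int in [0, k); shift into range.
    return lo + secrets.randbelow(hi - lo + 1)

n1 = roll()                                # secrets-based draw in 1..50
n2 = random.SystemRandom().randint(1, 50)  # same entropy source, stdlib random API
```

Both paths read from the kernel’s CSPRNG (getrandom()/urandom on Linux), so neither depends on a CPU hardware RNG instruction, matching the “short answer” above.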

(In the first post, gpt-5 used what it separately recommends…)

the whole concept is both mind-numbingly fascinating to me and intriguing as hell…

beautiful really

if you’re a weird techno-spiritualist who asks too many questions,

lolollol