Add gitpython and downloadable models to Code Interpreter?

Hi OpenAI:

Can you add gitpython to the list of installed packages in the Code Interpreter sandbox? I want to be able to upload a repo and have the AI read through the diffs, etc.

Also, it tried to use the spaCy and NLTK libraries, which are installed but don’t actually work: with no internet access, it can’t download their models. Could you pre-download those models and install them in the virtual environment, too?

Code Interpreter (CI) does not (as of yet) have access to the live internet. So, even if you had access to the packages you want, the model would not be able to make use of them.

Currently, the only way to give data/files to CI is to use the upload button. You can upload an archive file (zip, rar, 7z, tar, etc) and CI will unpack it in your sandbox, so if you get all of the files of interest from the repository you want to analyze, that might be a possibility.
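As a sketch of that workflow (the paths and file names here are made up for illustration), what CI does when it unpacks your upload is essentially what the standard library’s `shutil.unpack_archive` does. This self-contained snippet builds a tiny demo archive standing in for an uploaded repo, then unpacks and lists it:

```python
import shutil
import tempfile
from pathlib import Path

# Build a tiny demo archive to stand in for a repo you would upload
work = Path(tempfile.mkdtemp())
src = work / "repo"
src.mkdir()
(src / "README.md").write_text("# demo\n")

# make_archive returns the path of the created .zip
archive = shutil.make_archive(str(work / "upload"), "zip", str(src))

# unpack_archive autodetects zip/tar formats from the file extension
dest = work / "unpacked"
shutil.unpack_archive(archive, dest)

# List the unpacked files so you can see what CI would have to work with
unpacked = sorted(p.name for p in dest.rglob("*"))
print(unpacked)  # ['README.md']
```

Note that `.git` directories are ordinary files as far as a zip is concerned, so the repository history does survive the round trip; the question is only whether anything in the sandbox can read it.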

You can get the diff for each individual commit by appending .diff to the commit URL.
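Constructing that diff URL is just string concatenation; the commit SHA below is a made-up placeholder, not a real commit:

```python
def diff_url(commit_url: str) -> str:
    """Append .diff to a GitHub commit URL to get the raw unified diff."""
    return commit_url.rstrip("/") + ".diff"

# Illustrative URL with a placeholder SHA
url = "https://github.com/openai/evals/commit/abc1234"
print(diff_url(url))
# https://github.com/openai/evals/commit/abc1234.diff
```

You could then download that `.diff` file in your browser and upload it to CI alongside the rest of your files.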

Using the latest commit (at the time of this writing) for OpenAI evals as an example,


“I want to run a language model within my python within my language model”

The AI language libraries are probably there for their utility functions.

Imagine if they added tiktoken to it, without it needing to hit the internet for its encoding files. “How many tokens of conversation history passed this turn? Then, to fool the history mangler, summarize everything we’ve talked about.”

Maybe someone from OpenAI will stumble across this. It seems they are focused more on the next tweaks for ChatGPT than on any API development.


Yes, so I’m asking for the packages to be pre-downloaded into the environment, so it can actually use those libraries.

The libraries are already installed in the environment, but can’t actually work without internet access.

Yes, I’ve done that, uploading the en_core_web_sm files inside a .zip, but it wasn’t able to use them. Maybe it could have if it tried harder, I don’t know.

I’m talking about a local git repo. Yes, I could manually generate diff files for each commit, but it would be simpler if ChatGPT could read the diffs from the repo itself.
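In the meantime, if the `git` binary itself is on the sandbox’s PATH (I haven’t verified that it is), the model could read per-commit diffs from an uploaded repo without gitpython by shelling out to the CLI. A minimal sketch, with the repo path being hypothetical:

```python
import subprocess

def commit_diffs(repo_path: str, max_count: int = 5) -> str:
    """Return the last few commits with their patches, using the git CLI."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", f"--max-count={max_count}"],
        capture_output=True,
        text=True,
        check=True,  # raise if git is missing or the path isn't a repo
    )
    return result.stdout

# Hypothetical path where CI would have unpacked your uploaded zip:
# print(commit_diffs("/mnt/data/my_repo"))
```

The `-p` flag makes `git log` emit the full unified diff for each commit, which is the same information gitpython’s diff API would give you.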

I guess, but not large language models.

en_core_web_sm is 13 MB, for instance.

Yes, that’s what I’m asking for. If it’s helping me write code for accessing the OpenAI API, it should be able to run tiktoken locally to try out that code without needing internet access.

Apologies, from your first post in this topic I mistakenly thought you wanted to be able to simply point CI at an online repo (e.g. github) and have it pull and interpret the diffs.

I believe I now understand what it is you’re hoping to achieve.

With respect to the spaCy and NLTK libraries, I would guess (at least) one reason the pre-trained models are not pre-downloaded and installed is the size of keeping them in everyone’s virtual environments, but maybe that’s not as large a concern as I am imagining. I’ve not used either (I only “dabble” in python) so I could be entirely off-base there.

I’m not entirely sure why you think this would be simpler. I’m sure it’s easier for you, but is it perhaps asking more of the model than you should?

I always find I get better results when I go out of my way to meet the model where it is. In this case I would expect that to be generating the diff files myself in a way which is easy for the model to extract the information it needs from the diffs to answer any questions I would have.

Regardless, with respect to your original post, this isn’t the proper venue in which to reach anyone at OpenAI with this type of decision making authority. Your best bets would be to provide in-chat feedback and submit a suggestion via

Lastly, I’m changing the category of this topic to ChatGPT, since it relates to using ChatGPT rather than developing a plugin for it.


Well that would be nice, too. :smiley: But no I just meant uploading a zip of a git repo.

I would imagine that such files are actually only stored once and reflinked/hardlinked across all the virtual machines? I don’t know much about the implementation of such things, though.

ChatGPT initially tried to do it of its own accord, but then realized gitpython wasn’t installed, so I don’t think it’s difficult for it to use.