Which latest pandas version does OpenAI use to train LLMs?

Hi there,

Could you please tell me the latest pandas version that OpenAI LLMs are trained on?

Thanks in advance!

Welcome to the community!

Have you tried to ask the model what the last version is that it’s aware of?

While some models say their training cutoff was April 2023, their more general knowledge is often limited to somewhere around mid-2022.

@Diet, thanks for the quick response! I wanted to generate a simple code snippet on pandas 2.0 with ChatGPT. While chatting with ChatGPT 3.5 it told me “As of my last update in January 2022, Pandas 2.0 hadn’t been released yet”. I was wondering if more recent versions of ChatGPT support pandas 2.*?

Unlikely. That said, you can still try to give the model the information it needs to accomplish the task. Maybe give it the list of enhancements so it knows what to do?
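A rough sketch of that idea, in case it helps: put the installed library version and a few release notes into the prompt before asking for code. The helper names (`installed_version`, `build_prompt`) and the prompt wording are my own assumptions, not anything OpenAI provides, and the release notes are whatever you paste in from the changelog yourself.

```python
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'unknown' if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "unknown"


def build_prompt(task: str, package: str, version: str, release_notes: list[str]) -> str:
    """Assemble a prompt that tells the model which version to target (hypothetical format)."""
    lines = [
        f"Write Python code for {package} {version}.",
        "Relevant changes in this release that you may not know about:",
    ]
    # One bullet per release note supplied by the user.
    lines += [f"- {note}" for note in release_notes]
    lines.append(f"Task: {task}")
    return "\n".join(lines)


if __name__ == "__main__":
    notes = ["DataFrame.append was removed in pandas 2.0; use pandas.concat instead"]
    print(build_prompt("Stack two DataFrames row-wise", "pandas", installed_version("pandas"), notes))
```

The resulting text can be pasted into ChatGPT or sent through the API as the user message. It does not change what the model was trained on; it only gives the model the context it is missing.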

My intention is to have the model generate code for the latest pandas, since libraries (pandas in this case) introduce a lot of optimizations in recent releases to improve performance. So I would like to get code for the latest pandas from the model, if that is possible. When do you think OpenAI LLMs will be trained on the latest pandas versions?

ChatGPT 4 has a more recent knowledge cut-off. It’s April 2023.

When using code interpreter it’s also possible to upload a Python wheel and install packages. But this process has never been straightforward and may not be successful.

Given that, is there a chance to know if ChatGPT 4 was trained with pandas 2.0? pandas 2.0 was released in April 2023.

The latest version of the Pandas library I’m aware of is 1.5.2. However, please note that newer versions could have been released after my last update. You can check the latest version by visiting the official Pandas website or by checking the PyPI (Python Package Index).

Got this reply in 2 out of 2 tries.

The AI is trained on a corpus of knowledge and code, not directly on “here’s the pandas documentation, now you know it”.

So the actual training and ability to write code comes from the sequences of tokens in books, Stack Exchange, GitHub, etc., and isn’t really divided into versions, except for context sequences that lead up to and into correct usage (the same way that if you code in Python it usually doesn’t switch to FORTRAN or Farsi).

Oh, I see, thanks!

Does this mean that if a model generates code that is backward- and forward-compatible with pandas 1.* and 2.*, it is possible to use pandas 2.* to execute the code?

Yes. If the model can code for Python 3.10 and there are no breaking changes in version 3.11, then it can work.

My experience is that if there is a change in between versions you can spot and correct it. But since coding with ChatGPT often resembles a ‘coding at slow typing speed’ process this is quite cumbersome.
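One way to know when to look for such changes is to compare the version the model says it knows with the one you actually run. This is only a sketch: the function names are made up for illustration, and it assumes plain `major.minor.patch` version strings where a major-version bump signals possible breaking changes.

```python
def major(version: str) -> int:
    """Extract the major component from a 'major.minor.patch' version string."""
    return int(version.split(".")[0])


def crosses_major_release(model_version: str, runtime_version: str) -> bool:
    """True if the runtime is at least one major release ahead of what the model knows."""
    return major(runtime_version) > major(model_version)


if __name__ == "__main__":
    # 1.5.2 is the version the model reported above; 2.0.0 stands in for the installed one.
    if crosses_major_release("1.5.2", "2.0.0"):
        print("Review the generated code against the release notes before running it.")
```

If the check is true, it is worth reading the changelog for removed or renamed functions before trusting the generated snippet.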

If possible, sign up for a single month and see if the model can do what you need.

Thank you for answering my questions! That makes sense to me. There is probably one last question. What is the cadence of training models?