Solving Data Science Exercises With Codex

I finally got time this week to take Codex out for a spin. What better test than having Codex do some of the exercises I give my data science students while they're learning the subject? That way I can start to answer the question: is AI better at data science than data scientists?

Anyway, I got the unsurprising result that the exercises I teach are all over the Internet, and Codex has already memorized them :-). I've included an unedited version of the video below if anyone wants to check out the results: Is Openai Codex Smarter Than A Data Scientist?!? - YouTube

I'll be trying to find harder and more obscure exercises for future videos. But this is a fairly fun problem for teachers of the future: if AI can solve simple math and data science problems, grading homework starts to seem a little less relevant.

P.S. I currently have the video unlisted, and I contacted OpenAI support to ask whether this type of video is acceptable. To my knowledge it is, but do let me know if y'all would prefer this type of video not be released!


Thanks! This is a very interesting video. I assume you've had many students over the years who posted their solutions on GitHub. Probably none of them posted in this exact output format (where the instructions are written as comments), but otherwise the task seems well memorized. I'm very curious what your investigations turn up as you try it on unseen problems. If you’d like to get a quick response on this video, you can contact, but to me this content seems fine to be released. Also, if you're interested in using GPT-3 / Codex in your classroom more broadly (and would like to give students access), DM me your email and I can put you in touch with the people who could support that.
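
For anyone unfamiliar with that format, here is a minimal sketch of what "instructions written as comments" looks like. The exercise text and the function name `mean_and_std` are made up for illustration and are not taken from the course or the video. The comment plays the role of the prompt, and the function body is the kind of completion a model would be expected to produce.

```python
import math

# Exercise (hypothetical): write a function that returns the mean and the
# population standard deviation of a list of numbers.
def mean_and_std(values):
    # Mean: sum of the values divided by how many there are.
    mean = sum(values) / len(values)
    # Population variance: average squared distance from the mean.
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


print(mean_and_std([2, 4, 4, 4, 5, 5, 7, 9]))
```

A solution posted to GitHub would usually contain only the function, not the exercise text as a leading comment, which is the difference pointed out above.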


I have tried the same thing from the other side (I am a student and tried using it for a data science class, not this one :-)). Even with more advanced exercises that are not present verbatim on GitHub, it's one of the best use cases for the model I have found so far, in particular for the menial aspects like visualizing results and preprocessing the data.

I think it would be a brilliant tool for teaching. Instead of spending 90% of the time fiddling with Python libraries and reading NumPy documentation, it allows for extremely fast experimentation and for focusing on the actual underlying concepts. That has always been my frustration when taking this type of class: getting the libraries to work for you takes so much effort that there is little time left for actually learning the material.

Visualizing and prototyping with Codex is so fast that I also always have it open on the side when studying the theoretical parts of the class. It lets me quickly try things and test my understanding in a way that would never have been possible (timewise) if I had to write everything by hand.