Who is still using Codex?

Hi all!

I was wondering who is still using Codex, and why?
I feel unmotivated to continue experimenting with it, as OpenAI insists on keeping radio silence about Codex. To me, it's an abandoned product.
It seems the new chat models are nowadays quite good at code, probably much better than Codex.

So, is there a reason to continue using this?
help me understand :slight_smile:
thanks

Yes, I am still using Codex.
Because it's free and its max token limit is 8,001.
Also, the responses generated by Codex are good enough for my scenario.

As I recall, the GitHub Copilot VS Code extension uses Codex, so that's a huge base of Codex model users (millions of them).

:slight_smile:

See also:

How many users does [Copilot] even have though? The Visual Studio Code extension has been installed nearly 3.7 million times, while the Visual Studio tool has been installed nearly 154,000 times. According to GitHub, more than 1.2 million developers used Copilot’s technical preview in the past 12 months as of September, 2022. I wonder what the numbers are at now?

I mostly use Codex. It’s the least “creative” of the models, which is actually very handy for when you just want it to output something specific.

GPT-4 might be better, but it's also infinitely more expensive (Codex is free). GPT-3.5 has a higher hallucination rate. The others are not as good at this specifically.

thanks for your replies, good to see codex still has fans :slight_smile:

I really want to keep using it and build things with it, but I have little confidence in doing so due to the lack of communication from OpenAI. For all we know, they could pull the plug on Codex tomorrow and tell you to use ChatGPT because it understands your instructions better…

No.

There is close to zero chance OpenAI will pull the plug on its very successful Codex model, which drives one of its most successful products: GitHub Copilot, owned by Microsoft, with at least 2M users a month and growing :rocket:

:slight_smile:

“Pull the plug” can take other forms… like making it a paid product at GPT-4 prices… (we don’t expect it to be free forever, right?)

I’m not really much of a “speculator”.

But think about this.

Microsoft owns GitHub Copilot, which has around 2M monthly users and growing. Microsoft is also the key investor in OpenAI, and OpenAI runs on Azure, another Microsoft product.

GitHub Copilot costs $10 USD a month for personal use and $19 a month for business. If around half of those 2M monthly users are on a paid plan, that is at least $10,000,000 USD a month in revenue from Copilot, and growing.

So Codex is already not free: it is a key cost baked into the GitHub Copilot subscription price.

Honestly, @nunodonato, I don’t really understand your motivation to post FUD about Codex. There is nothing in the marketplace to support your fears, and the 2M+ GitHub Copilot Codex users are not going away.

HTH

:slight_smile:

Sorry, my intention was not to post FUD or anything like that. I think I was clear in my first post about what my intentions were. I also explained why I feel little motivation to keep exploring Codex, given the lack of communication from OpenAI about the Codex models.

That is your choice, @nunodonato

I am simply posting some facts about how vibrant the Codex user base is, since you asked the question:

I guess when you first posted you were not aware of the huge GitHub Copilot community using codex and how successful codex is at code completions for VSC users. After all, code completions are what codex was trained for.

Also, given the way GitHub Copilot works, it scans code in your GitHub repos and uses that code for completions; so even though the LLM itself isn’t updated, it does have some hooks to work with the code in a user’s repo.

HTH

:slight_smile:

I am aware of that, but Copilot is pretty much an “internal” product. For all we know, it could even be using a Codex v4 model or a variant of GPT-4… we have no way to know which one it is.

But fair enough, I’m happy to know people are still using it confidently. I shall go back to it again too :slight_smile:

No, it’s not a “variant of GPT-4”; here is the reference quote:

GitHub Copilot is powered by the OpenAI Codex,[10] which is a modified, production version of the Generative Pre-trained Transformer 3 (GPT-3), a language model using deep-learning to produce human-like text.[11] The Codex model is additionally trained on gigabytes of source code in a dozen programming languages.

In addition,

Copilot’s OpenAI Codex is trained on a selection of the English language, public GitHub repositories, and other publicly available source code.[2] This includes a filtered dataset of 159 gigabytes of Python code sourced from 54 million public GitHub repositories.[12]

Open AI’s GPT-3 is licensed exclusively to Microsoft, GitHub’s parent company.[13]

HTH

:slight_smile:

We’re still using it because it’s free, though some of our recent testing has shown that using the ChatGPT API with some careful prompting can lead to results that are as accurate or better than Codex for text-to-SQL translation.

Thanks for sharing this, quite insightful. Yes, I would guess ChatGPT would work best, as it is based on the instruct series and text-to-SQL is kind of like an instruction. IIRC, the “text to code” sample in the playground also uses text-davinci-003 instead of Codex, for that reason.

But I’m curious if we could get better results by changing the prompt a bit. Can you share the original full prompt?

Sure! Here’s the prompt we’ve used the most with Codex:

-- language PostgreSQL
-- schema:
{schema}
-- be sure to properly format and quote identifiers.
-- A postgreSQL query to SELECT 1 and 
-- a syntactically-correct PostgreSQL query to {user_prompt}
SELECT 1;

and with ChatGPT

You are a SQL code translator. Your role is to translate
natural language to PostgreSQL. Your only output should be SQL code.
Do not include any other text. Only SQL code.

Use the following PostgreSQL database schema:

{schema}

Convert the following to syntactically-correct PostgreSQL query: {user_prompt}.

Where, in both cases, {user_prompt} is filled in with the user’s question or instruction for the database, and {schema} is filled in with some high-level details about the database (schema names, table names, column names, column types).
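For concreteness, here's a minimal sketch of how those two templates might be assembled in Python. The function names are mine, not from the original code; the resulting strings/messages would be sent to the completions endpoint (for Codex) or the chat endpoint (for ChatGPT):

```python
def build_codex_prompt(schema: str, user_prompt: str) -> str:
    """Fill in the comment-style Codex prompt template."""
    return (
        "-- language PostgreSQL\n"
        "-- schema:\n"
        f"{schema}\n"
        "-- be sure to properly format and quote identifiers.\n"
        "-- A postgreSQL query to SELECT 1 and \n"
        f"-- a syntactically-correct PostgreSQL query to {user_prompt}\n"
        "SELECT 1;"
    )


def build_chat_messages(schema: str, user_prompt: str) -> list[dict]:
    """Fill in the system/user messages for the ChatGPT API."""
    system = (
        "You are a SQL code translator. Your role is to translate\n"
        "natural language to PostgreSQL. Your only output should be SQL code.\n"
        "Do not include any other text. Only SQL code.\n\n"
        "Use the following PostgreSQL database schema:\n\n"
        f"{schema}"
    )
    user = (
        "Convert the following to syntactically-correct "
        f"PostgreSQL query: {user_prompt}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

(Just a sketch of the template substitution itself, so it's easy to swap either prompt into a test harness.)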

Would you be able to give me the full, filled-out prompt for the case where Codex failed? I’d like to have a go at getting it to work :slight_smile: (private msg if necessary)

Oh, understood!

-- Language PostgreSQL
-- schema: 
-- Table = "actor", columns = [actor_id integer, first_name character varying, last_name character varying, last_update timestamp without time zone]
-- Table = "address", columns = [address_id integer, address character varying, address2 character varying, district character varying, city_id smallint, postal_code character varying, phone character varying, last_update timestamp without time zone]
-- Table = "category", columns = [category_id integer, name character varying, last_update timestamp without time zone]
-- Table = "city", columns = [city_id integer, city character varying, country_id smallint, last_update timestamp without time zone]
-- Table = "country", columns = [country_id integer, country character varying, last_update timestamp without time zone]
-- Table = "customer", columns = [customer_id integer, store_id smallint, first_name character varying, last_name character varying, email character varying, address_id smallint, activebool boolean, create_date date, last_update timestamp without time zone, active integer]
-- Table = "film", columns = [film_id integer, title character varying, description text, release_year integer, language_id smallint, rental_duration smallint, rental_rate numeric, length smallint, replacement_cost numeric, rating USER-DEFINED, last_update timestamp without time zone, special_features ARRAY, fulltext tsvector]
-- Table = "film_actor", columns = [actor_id smallint, film_id smallint, last_update timestamp without time zone]
-- Table = "film_category", columns = [film_id smallint, category_id smallint, last_update timestamp without time zone]
-- Table = "inventory", columns = [inventory_id integer, film_id smallint, store_id smallint, last_update timestamp without time zone]
-- Table = "language", columns = [language_id integer, name character, last_update timestamp without time zone]
-- Table = "payment", columns = [payment_id integer, customer_id smallint, staff_id smallint, rental_id integer, amount numeric, payment_date timestamp without time zone]
-- Table = "rental", columns = [rental_id integer, rental_date timestamp without time zone, inventory_id integer, customer_id smallint, return_date timestamp without time zone, staff_id smallint, last_update timestamp without time zone]
-- Table = "staff", columns = [staff_id integer, first_name character varying, last_name character varying, address_id smallint, email character varying, store_id smallint, active boolean, username character varying, password character varying, last_update timestamp without time zone, picture bytea]
-- Table = "store", columns = [store_id integer, manager_staff_id smallint, address_id smallint, last_update timestamp without time zone]
-- be sure to properly format and quote identifiers.
-- A postgreSQL query to SELECT 1 and 
-- a syntactically-correct PostgreSQL query to What are the names of all the action films?
SELECT 1;

This template worked for 9/10 prompts in our (very small) test suite but failed for this one, where it returned

SELECT title FROM film WHERE category_id = 1

The codex model in particular is quite sensitive to changes in the prompt. It’s fairly easy to evoke the “right answer” for this prompt, but doing so can mess up some of the others (e.g. removing the admonition to correctly quote identifiers results in two or three of the tests failing).

Well, I could get it to work by using a one-shot example in the prompt and making it break the task down into two steps :slight_smile:

I’ve gotten it to work with a few variations, but a lot of those variations break other tests in the test suite. My experience overall has been that it’s a little less forgiving/a little more variable than the ChatGPT API.

Can you share the prompt that you ultimately got to work? I’d be interested to see if it works well with some of the other test cases. Thanks!

...
-- Table = "store", columns = [store_id integer, manager_staff_id smallint, address_id smallint, last_update timestamp without time zone]
-- be sure to properly format and quote identifiers.
###
-- Instruction: list all spanish cities
-- Query: select all cities that are from the country spain
SELECT city FROM city WHERE country_id = (SELECT country_id FROM country WHERE country = 'Spain');
###
-- Instruction: What are the names of all the action films
-- Query: select all films that are in the action category
SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_category WHERE category_id = (SELECT category_id FROM category WHERE name = 'Action'));

My prompt ended after the second “-- Query:”, so I let it work out a better English description before producing the actual query.
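In code, that one-shot setup might look something like this. This is only a sketch: the helper name is mine, and the commented-out call assumes the legacy `openai` v0.x Python SDK with `code-davinci-002`, which is how Codex was typically called at the time:

```python
# One-shot example block, ending with "###" so the pattern is clear.
FEW_SHOT_SUFFIX = (
    "###\n"
    "-- Instruction: list all spanish cities\n"
    "-- Query: select all cities that are from the country spain\n"
    "SELECT city FROM city WHERE country_id = "
    "(SELECT country_id FROM country WHERE country = 'Spain');\n"
    "###\n"
)


def build_one_shot_prompt(schema_comments: str, instruction: str) -> str:
    # The prompt deliberately ends right after "-- Query:" so the model
    # first writes its own plain-English restatement, then the SQL.
    return (
        f"{schema_comments}\n"
        "-- be sure to properly format and quote identifiers.\n"
        f"{FEW_SHOT_SUFFIX}"
        f"-- Instruction: {instruction}\n"
        "-- Query:"
    )


# With the legacy v0.x SDK, one would then call something like:
# completion = openai.Completion.create(
#     model="code-davinci-002",
#     prompt=build_one_shot_prompt(schema, "What are the names of all the action films"),
#     max_tokens=256,
#     temperature=0,
#     stop=["###"],  # keep the model from inventing further examples
# )
```

Using `"###"` as a stop sequence mirrors the delimiter in the prompt, so generation stops after one instruction/query pair.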

let me know how it works out for the other tests :slight_smile:
