Multiple examples form best practices don't work

If I put this into playground (no matter if you choose Plain Text oder SQL as language):

SELECT DISTINCT department.name
FROM department
JOIN employee ON department.id = employee.department_id
JOIN salary_payments ON employee.id = salary_payments.employee_id
WHERE salary_payments.date BETWEEN '2020-06-01' AND '2020-06-30'
GROUP BY department.name
HAVING COUNT(employee.id) > 10;
-- Explanation of the above query in human readable format
--

code-davinci-002 completes this way:

-- SELECT DISTINCT department.name
-- FROM department
-- JOIN employee ON department.id = employee.department_id
-- JOIN salary_payments ON employee.id = salary_payments.employee_id
-- WHERE salary_payments.date BETWEEN '2020-06-01' AND '2020-06-30'
-- GROUP BY department.name
-- HAVING COUNT(employee.id) > 10;

while text-davinci-002 gives the correct answer:

-- This query is selecting the names of each department that had
-- more than 10 employees receive salary payments in the month of June 2020.

Why is Codex worse that GPT-3 here?

Or am I using this wrong?