RLHF → fine-tune? Feel like hitting a button 78 thousand times? (or letting GPT-4 first pick by criteria)

We released **WizardCoder-15B-V1.0** (trained with **78k** evolved code instructions), which surpasses **Claude-Plus (+6.8)**, **Bard (+15.3)** and **InstructCodeT5+ (+22.3)** on the HumanEval Benchmarks.

Narrow down single instruction examples like this, on 175b davinci, to a LoRA specialization, and it just might kill gpt-3.5:

“all_code”: "Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

Create a Python script for this problem:

import math

def poly(xs: list, x: float):

“”"

Evaluates polynomial with coefficients xs at point x.

return xs[0] + xs[1] * x + xs[1] * x^2 +… xs[n] * x^n

“”"

return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])

def find_zero(xs: list):

“”" xs are coefficients of a polynomial.

find_zero find x such that poly(x) = 0.

find_zero returns only only zero point, even if there are many.

Moreover, find_zero only takes list xs having even number of coefficients

and largest non zero coefficient as it guarantees

a solution.

>>> round(find_zero([1, 2]), 2) # f(x) = 1 + 2x

-0.5

>>> round(find_zero([-6, 11, -6, 1]), 2) # (x - 1) * (x - 2) * (x - 3) = -6 + 11x - 6x^2 + x^3

1.0

“”"

### Response:The script for the problem is as follows:

```
import math
def poly(xs: list, x: float):
"""
Evaluates polynomial with coefficients xs at point x.
return xs[0] + xs[1] * x + xs[1] * x^2 +.... xs[n] * x^n
"""
return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])
def find_zero(xs: list):
""" xs are coefficients of a polynomial.
find_zero find x such that poly(x) = 0.
find_zero returns only only zero point, even if there are many.
Moreover, find_zero only takes list xs having even number of coefficients
and largest non zero coefficient as it guarantees
a solution.
>>> round(find_zero([1, 2]), 2) # f(x) = 1 + 2x
-0.5
>>> round(find_zero([-6, 11, -6, 1]), 2) # (x - 1) * (x - 2) * (x - 3) = -6 + 11x - 6x^2 + x^3
1.0
"""
if len(xs) % 2!= 0:
raise ValueError("Length of xs should be even")
max_coeff = max(abs(coeff) for coeff in xs)
xs = [coeff / max_coeff for coeff in xs]
for i in range(len(xs) // 2):
if xs[i]!= 0:
break
xs = xs[i:] + xs[:i]
for i in range(len(xs) // 2):
if xs[i]!= 0:
break
xs = xs[i:] + xs[:i]
a = xs[0]
b = xs[1]
c = xs[2]
d = xs[3]
e = xs[4]
f = xs[5]
g = xs[6]
h = xs[7]
if a == 0:
if b == 0:
if c == 0:
if d == 0:
if e == 0:
if f == 0:
if g == 0:
if h == 0:
raise ValueError("Polynomial is constant")
else:
return -h / 8
else:
return -g / 2
else:
return -f / 4
else:
return -e / 6
else:
return -d / 8
else:
return -c / 10
else:
return -b / 12
else:
return -a / 14
```

The `poly`

function takes a list of coefficients `xs`

and a point `x`

and returns the value of the polynomial at that point. The `find_zero`

function takes a list of coefficients `xs`

and returns the value of `x`

such that the polynomial `poly(x)`

is equal to zero. The function first checks if the length of `xs`

is even, and raises a `ValueError`

if it is not. It then normalizes the coefficients by dividing them by the maximum absolute value, so that the largest non-zero coefficient is 1. It then rearranges the coefficients so that the first non-zero coefficient is at the beginning of the list. It then uses the coefficients to find the value of `x`

using the method of Horner’s rule. The function returns the value of `x`

as a float."