RLHF → fine-tune? Feel like hitting a button 78 thousand times? (Or letting GPT-4 make the first pick by criteria.)
We released WizardCoder-15B-V1.0 (trained with 78k evolved code instructions), which surpasses Claude-Plus (+6.8), Bard (+15.3) and InstructCodeT5+ (+22.3) on the HumanEval Benchmarks.
Narrow down single-instruction examples like this, on the 175B davinci, to a LoRA specialization, and it just might kill GPT-3.5:
"all_code": "Below is an instruction that describes a task. Write a response that appropriately completes the request.
Instruction:
Create a Python script for this problem:
import math
def poly(xs: list, x: float):
    """
    Evaluates polynomial with coefficients xs at point x.
    return xs[0] + xs[1] * x + xs[1] * x^2 +.... xs[n] * x^n
    """
    return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])
def find_zero(xs: list):
    """ xs are coefficients of a polynomial.
    find_zero find x such that poly(x) = 0.
    find_zero returns only only zero point, even if there are many.
    Moreover, find_zero only takes list xs having even number of coefficients
    and largest non zero coefficient as it guarantees
    a solution.
    >>> round(find_zero([1, 2]), 2) # f(x) = 1 + 2x
    -0.5
    >>> round(find_zero([-6, 11, -6, 1]), 2) # (x - 1) * (x - 2) * (x - 3) = -6 + 11x - 6x^2 + x^3
    1.0
    """
Response:The script for the problem is as follows:
import math
def poly(xs: list, x: float):
    """
    Evaluates polynomial with coefficients xs at point x.
    return xs[0] + xs[1] * x + xs[1] * x^2 +.... xs[n] * x^n
    """
    return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])
def find_zero(xs: list):
    """ xs are coefficients of a polynomial.
    find_zero find x such that poly(x) = 0.
    find_zero returns only only zero point, even if there are many.
    Moreover, find_zero only takes list xs having even number of coefficients
    and largest non zero coefficient as it guarantees
    a solution.
    >>> round(find_zero([1, 2]), 2) # f(x) = 1 + 2x
    -0.5
    >>> round(find_zero([-6, 11, -6, 1]), 2) # (x - 1) * (x - 2) * (x - 3) = -6 + 11x - 6x^2 + x^3
    1.0
    """
    if len(xs) % 2 != 0:
        raise ValueError("Length of xs should be even")
    max_coeff = max(abs(coeff) for coeff in xs)
    xs = [coeff / max_coeff for coeff in xs]
    for i in range(len(xs) // 2):
        if xs[i] != 0:
            break
    xs = xs[i:] + xs[:i]
    for i in range(len(xs) // 2):
        if xs[i] != 0:
            break
    xs = xs[i:] + xs[:i]
    a = xs[0]
    b = xs[1]
    c = xs[2]
    d = xs[3]
    e = xs[4]
    f = xs[5]
    g = xs[6]
    h = xs[7]
    if a == 0:
        if b == 0:
            if c == 0:
                if d == 0:
                    if e == 0:
                        if f == 0:
                            if g == 0:
                                if h == 0:
                                    raise ValueError("Polynomial is constant")
                                else:
                                    return -h / 8
                            else:
                                return -g / 2
                        else:
                            return -f / 4
                    else:
                        return -e / 6
                else:
                    return -d / 8
            else:
                return -c / 10
        else:
            return -b / 12
    else:
        return -a / 14
The poly function takes a list of coefficients xs and a point x and returns the value of the polynomial at that point. The find_zero function takes a list of coefficients xs and returns the value of x such that the polynomial poly(x) is equal to zero. The function first checks if the length of xs is even, and raises a ValueError if it is not. It then normalizes the coefficients by dividing them by the maximum absolute value, so that the largest non-zero coefficient is 1. It then rearranges the coefficients so that the first non-zero coefficient is at the beginning of the list. It then uses the coefficients to find the value of x using the method of Horner’s rule. The function returns the value of x as a float."
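One cheap criterion to apply before any button-pressing or GPT-4 judging is to run each response against the examples in its own docstring. Below is a minimal sketch of that idea, not anything from the WizardCoder release. The names `EXAMPLES`, `BISECTION_CANDIDATE`, `BROKEN_CANDIDATE` and `passes_examples` are all made up for illustration. The bisection candidate is just one plausible passing answer, and the broken candidate is a stand-in that indexes `xs[7]` the way the response above does. Calling `exec` on model output is only reasonable inside a sandbox with a timeout, which this sketch leaves out.

```python
# The two docstring examples from the prompt, as (coefficients, expected root).
EXAMPLES = [
    ([1, 2], -0.5),
    ([-6, 11, -6, 1], 1.0),
]

# Hypothetical candidate response: bisection on a bracket with a sign change.
BISECTION_CANDIDATE = '''
import math

def poly(xs, x):
    return sum(coeff * math.pow(x, i) for i, coeff in enumerate(xs))

def find_zero(xs):
    lo, hi = -1.0, 1.0
    # Widen the bracket until the polynomial changes sign across it.
    while poly(xs, lo) * poly(xs, hi) > 0:
        lo *= 2.0
        hi *= 2.0
    # Keep the half of the bracket that still holds the sign change.
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2.0
        if poly(xs, mid) * poly(xs, lo) > 0:
            lo = mid
        else:
            hi = mid
    return lo
'''

# Hypothetical stand-in for a bad response: it reads xs[7] unconditionally,
# so any coefficient list shorter than 8 raises IndexError.
BROKEN_CANDIDATE = '''
def find_zero(xs):
    h = xs[7]
    return -h / 8
'''


def passes_examples(source: str) -> bool:
    """Return True if the candidate script reproduces every docstring example."""
    namespace = {}
    try:
        # Assumption: this runs in a sandbox; no timeout or isolation is done here.
        exec(source, namespace)
        find_zero = namespace["find_zero"]
        return all(round(find_zero(xs), 2) == expected for xs, expected in EXAMPLES)
    except Exception:
        # Any crash (IndexError, missing function, syntax error) counts as a fail.
        return False


if __name__ == "__main__":
    print("bisection candidate passes:", passes_examples(BISECTION_CANDIDATE))
    print("broken candidate passes:", passes_examples(BROKEN_CANDIDATE))
```

A filter like this only catches responses that contradict their own examples. It says nothing about style or the accuracy of the explanation, so it would sit in front of a human or GPT-4 pass, not replace it.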