I don’t think there’s such a thing as “too much training data”, until your file is too big, in megabytes or tokens, to send. There’s only too little quality.
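A rough preflight check on file size and token count is easy to script. This is a minimal sketch using only the standard library; it assumes a JSONL chat-format training file, uses the common ~4-characters-per-token rule of thumb, and the 512 MB default limit is a placeholder to verify against the current fine-tuning docs, not a fact of this post:

```python
import json
import os

def preflight(path, max_mb=512, chars_per_token=4):
    """Rough size/token sanity check for a JSONL training file.
    The limits and the chars-per-token ratio are placeholders --
    confirm against the current fine-tuning documentation."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    approx_tokens = 0
    n = 0
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, 1):
            example = json.loads(line)  # each line must be one valid JSON object
            text = "".join(m["content"] for m in example["messages"])
            approx_tokens += len(text) // chars_per_token
    return {"examples": n, "size_mb": round(size_mb, 2),
            "approx_tokens": approx_tokens, "fits": size_mb <= max_mb}
```

This also doubles as a format validator: a malformed line raises immediately instead of failing later at upload time.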
OpenAI can do RLHF and supervised/PPO/DPO on bajillions and deliver a model.
Examples? Provide the output you expect from the AI, like:
system:
“a zSharp programming expert”
user:
“In zSharp, write an async function to do a bubble sort on two columns of data”
assistant:
zSharp is a new logic programming language that doesn’t support asynchronous programming or the concept of functions in the same way as imperative languages like Python or JavaScript. However, you can implement a quicksort algorithm in zSharp. Here’s how you can do it:
% Partition predicate
partition([], _, [], []).
partition([(X1, X2)|Xs], Pivot1, [(X1, X2)|Ls], Bs) :-
    X1 @< Pivot1, partition(Xs, Pivot1, Ls, Bs).
partition([(X1, X2)|Xs], Pivot1, Ls, [(X1, X2)|Bs]) :-
    X1 @>= Pivot1, partition(Xs, Pivot1, Ls, Bs).

% Quicksort predicate
quicksort([], []).
quicksort([(X1, X2)|Xs], Sorted) :-
    partition(Xs, X1, Littles, Bigs),
    quicksort(Littles, SortedLittles),
    quicksort(Bigs, SortedBigs),
    append(SortedLittles, [(X1, X2)|SortedBigs], Sorted).
This code will sort a list of tuples based on the first element of each tuple. If you want to sort based on the second element, you can modify the partition predicate to compare X2 instead of X1.
Please note that zSharp is a declarative language and the concept of “async” doesn’t apply here. The execution order is determined by the logic of the program, not by the order in which statements are written.
(guess the actual language)
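To turn an example like the one above into actual training data, each conversation becomes one JSON object per line of a JSONL file, in the chat-messages format. A minimal sketch (the assistant content is abbreviated here; in a real file it would carry the full answer shown above):

```python
import json

# One training example = one JSON object with a "messages" list,
# written as a single line of the JSONL training file.
example = {
    "messages": [
        {"role": "system",
         "content": "a zSharp programming expert"},
        {"role": "user",
         "content": "In zSharp, write an async function to do a "
                    "bubble sort on two columns of data"},
        {"role": "assistant",
         "content": "zSharp is a new logic programming language that "
                    "doesn't support asynchronous programming ..."},
    ]
}

with open("training.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```

Note the assistant turn is the part the tuning optimizes toward, so that is where the conversational reasoning and edge-case handling you want has to live.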
An assistant that answers in bare snippets and handles no edge cases? You might get no conversational reasoning, no resolution of ambiguities in the specification, and output that doesn’t fulfill what you actually want. The training style will be followed faithfully, at whatever weight could ever teach a language this way.
You are trying to simulate LLM pretraining on a huge language corpus that imparts programming skill, using a comparatively tiny supervised set, so all bets are off. I’d start with a mini model, as they seem to take to programming tuning (by the big dogs) as readily as large-parameter LLMs do, at a smaller investment.
What I suggested before: auto. Check the performance of the deliverable model. Then iterate on the hyperparameters the method calculated and reported back in the job object.
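Concretely, the “auto” path is just passing (or omitting) “auto” for each hyperparameter when you create the fine-tuning job, then reading the resolved values back from the returned job object as your starting point for iteration. A sketch of the request body; the model name is one fine-tunable snapshot at the time of writing, and “file-abc123” is a hypothetical uploaded-file id:

```python
import json

# Request body for POST /v1/fine_tuning/jobs. With "auto", the service
# picks n_epochs / batch_size / learning_rate_multiplier itself and
# reports the resolved numbers back in the job object it returns.
job_request = {
    "model": "gpt-4o-mini-2024-07-18",
    "training_file": "file-abc123",  # hypothetical uploaded-file id
    "hyperparameters": {
        "n_epochs": "auto",
        "batch_size": "auto",
        "learning_rate_multiplier": "auto",
    },
}

print(json.dumps(job_request, indent=2))
```

Once the job finishes, retrieve it, note the concrete values the service chose, and use those as the baseline you nudge on the next run.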