That’s a whole other subject. I’ll expand this over the weekend. But to give you some background, here is the mental model I often use:
People often think large language models are somewhat “smart”, but it is better to assume they are just very good at tricking us. We tend to believe they take input in, “think it over,” and produce “thoughtful” answers (completions). I think models are a sort of conditional reflex (on steroids, for sure) - they see “this” as input, and they guess the best output would be “that”. Sure, they do multi-step “thinking” internally, but at the level of abstraction we can operate on today, that is too fine-grained to be useful.
While thinking, humans operate with concepts interconnected into thoughts. And our process of thinking is a chain (or rather a tree) of thoughts that follow certain rules. So if you want to make your app “think,” you’ll have to give it the things it needs:
- Background knowledge (your embedded facts)
- Initial thought (your request)
But then you need to help it extract and understand the concepts in the background facts and the initial thought (analyze the request/facts, extract concepts and their relationships, understand the query’s intent, spot patterns in the knowledge, etc.), and show it the whole “processing” tree/chain/logic that leads to the solution (just like with kids).
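To make it concrete, here is a minimal sketch of one such “understanding” step. The prompt wording and the `call_model` parameter are hypothetical stand-ins for whatever completion API wrapper you use:

```python
from typing import Callable

def extract_concepts(request: str, facts: str,
                     call_model: Callable[[str], str]) -> str:
    """One "understanding" step: pull out concepts and relationships
    before any answering happens. call_model is a stand-in for your
    completion API wrapper (hypothetical parameter)."""
    prompt = (
        "Task: extract the key concepts and their relationships.\n"
        f"Background information: {facts}\n"
        f"User inquiry: {request}\n"
        "Concepts and relationships:"
    )
    return call_model(prompt)  # one model, one focused job
```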
Then, based on this understanding, a model (one out of many) will be able to do one step in your “thinking” process (models are good at doing a single step).
But do not ask it to do the whole thing. You’ll have to create “thinking patterns” that lead to the final result you want and teach your app (a tribe of models in this case) to do one step at a time and pass the “business” on to the next step (passing it on is often just a block of code in your app).
To sum up the idea: thoroughly analyze the ways (patterns) you solve those requests (by type), write down the steps and the input/output at each of them, and that will tell you what your core engine should do. Then start with one pattern at a time and train models for each step, with a clear definition of their input/output (and of what to do if the model fails). Many steps do not require “models” at all, just good old-style code.
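Here is a rough sketch of what such a core engine might look like. The step names, the shared context dict, and the `call_model` callable are all illustrative assumptions, not a fixed recipe:

```python
from typing import Callable

# A "thinking pattern" as an explicit pipeline: each step takes a shared
# context dict, does exactly one job, and hands the result forward.

def classify_request(ctx: dict) -> dict:
    # Good old-style code: no model needed to route by request type.
    ctx["type"] = "refund" if "refund" in ctx["request"].lower() else "general"
    return ctx

def detect_intent(ctx: dict) -> dict:
    # A model step: one focused question to one model.
    ctx["intent"] = ctx["call_model"](
        f"User inquiry: {ctx['request']}\nUser intent:"
    )
    return ctx

def draft_answer(ctx: dict) -> dict:
    ctx["answer"] = ctx["call_model"](
        "Task: answer the request.\n"
        f"Background information: {ctx['facts']}\n"
        f"User intent: {ctx['intent']}\n"
        f"User inquiry: {ctx['request']}\n"
        "Model's answer:"
    )
    return ctx

PIPELINE = [classify_request, detect_intent, draft_answer]

def run(request: str, facts: str, call_model: Callable[[str], str]) -> dict:
    ctx = {"request": request, "facts": facts, "call_model": call_model}
    for step in PIPELINE:
        ctx = step(ctx)  # "sending the business" to the next step is just a loop
    return ctx
```

Note how two of the three steps call a model, while the first one is plain code; that is the usual mix in practice.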
As for question/answer training, I would replace “question” with “request”, which frames it better. And the requests might look like this:
Task: task description.
Background information: all your needed facts.
Metadata (whatever label fits): stuff the model needs to know about the request.
Model’s previous answers to similar requests: (if applicable, a couple of examples)
Previous conversation: short summary of what happened in the chat.
User inquiry: the user input (filtered and sanitized, of course)
User intent: (from one of the previous steps)
Model’s answer (or possible answers):<|endoftext|><|reply|>
Model replies in the training data should start with a whitespace and (a little tip from me, not necessary but useful for quick tests) end with <|endofreply|><|endoftext|>
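For illustration, a single training example in that layout might look like the snippet below. The legacy prompt/completion JSONL format and all the field contents are my assumptions:

```python
import json

# One made-up training example in the legacy prompt/completion JSONL
# format (an assumption on my side); note the leading space and the
# end markers in the completion, exactly as described above.
example = {
    "prompt": (
        "Task: answer the customer's billing question.\n"
        "Background information: refunds are processed within 5 business days.\n"
        "Metadata: channel=email; locale=en\n"
        "Previous conversation: the customer asked about an unexpected charge.\n"
        "User inquiry: When will I get my refund?\n"
        "User intent: refund status\n"
        "Model's answer (or possible answers):<|endoftext|><|reply|>"
    ),
    "completion": (
        " Refunds are processed within 5 business days of approval."
        "<|endofreply|><|endoftext|>"
    ),
}
print(json.dumps(example))  # one line of your JSONL training file
```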
In API calls, use <|endoftext|> as a stop sequence and check reply completeness with a regex for <|endofreply|> at the end. Or you can check the stop reason in the API response (better for production).
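A quick sketch of both checks, assuming the legacy (pre-1.0) openai Python client; any completion API with a stop parameter and a finish reason works the same way:

```python
import re
import openai  # legacy pre-1.0 client, assumed here

def get_reply(prompt: str) -> str | None:
    resp = openai.Completion.create(
        model="your-fine-tuned-model",  # placeholder name
        prompt=prompt,
        max_tokens=256,
        stop=["<|endoftext|>"],
    )
    choice = resp["choices"][0]
    text = choice["text"]

    # Quick test: a complete reply still ends with <|endofreply|>
    # (the stop sequence itself is stripped from the returned text).
    if not re.search(r"<\|endofreply\|>\s*$", text):
        return None  # truncated: retry or flag for review

    # Better in production: also trust the API's stop reason.
    # "stop" = hit the stop sequence; "length" = cut off by max_tokens.
    if choice["finish_reason"] != "stop":
        return None
    return text
```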