Fine-tuning a model without using prompt-completion

Hello! I am developing software to perform web test automation. The way it works is that you use a graphical interface to compose the test, concatenating several atomic operations (such as “Click”, “Scroll” or “Text typing”), and saving the test produces a JSON file containing a test definition, always structured according to a specific (arbitrary) schema I decided to use.

Each atomic operation has inputs and outputs: for example, the “Click” operation has an input which is merely a string providing an XPath expression pointing to the element to click on.
There are many atomic operations. Every one of them is well-documented according to a specific standard. Specifically, we provide a description of the operation (what does it actually do), a list of inputs and outputs that specific operation can be associated with, and for each of those there’s a description as well. Something like:

"clickOperation": {
    "inputs": [
          "xpath": {
                "type": string,
                "description": "XPath expression pointing to the element to be clicked on"
    "outputs": [
        "next": {
              "type": "pointer",
              "description": "Next operation to perform"

I would like to develop a chatbot module for my software using OpenAI APIs, so that I can give the software natural language input and receive a test definition in output. The whole operation documentation is about 10k tokens, and I would like to make use of it.

However, I think that the documentation alone is not sufficient for the model to learn how to create test definitions, so I thought about providing some test examples via fine-tuning, something like {"prompt": "Define a test that opens Google and clicks on the button with XPath '/some/xpath', "completion": {"..."}}. This would take care of the “training examples” part.

But, how should I provide the documentation? For example, how can I provide the documentation above for fine-tuning? Since it’s not an example of how the model should behave, but rather just a corpus of additional information for the model to better interpret the training examples, I don’t think it’s fit for a prompt-completion pair.
Should I go for a “hybrid” approach? Use embeddings to learn the documentation, and then fine-tuning to train the model to create well-formed test definitions?

Perhaps you could fine-tune a model on examples of generating the test definition from natural language, then run the results through a ChatGPT completion step to check it that asks the question “Do these results conform to the documentation? If not, correct it.”

Doing both steps could lead to much higher quality results.

Generally, you can also improve your fine-tune results and accuracy by adding more training examples.

1 Like