Generating text using a pre-defined heuristic


I’m currently working on a project to generate text by applying a pre-defined heuristic to a set of messages.
The data was originally in an .xlsx file, and its most important features are:

MessageType: Either original or heuristicized. An original message is one that still needs to be heuristicized; a heuristicized message is one that has been changed from its original by applying a particular heuristic to it.

HeuristicName: The name of the heuristic applied to the original message.

ManualMessageScore: A score from 1 to 3 describing the quality of the heuristicized message, where 1 is the highest and 3 the lowest. It is set to NaN for original messages.

After preprocessing, I matched each heuristicized message with its original and created a dataframe with two columns: a “prompt” column containing the original messages and a “completion” column containing their heuristicized counterparts.
I then filtered the dataframe so that all heuristicized messages had been altered according to a single specific heuristic.

As a result, there is some repetition, since an original message may have several heuristicized messages and vice versa.
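The post does not say which column links a heuristicized message to its original, so the sketch below assumes a hypothetical `GroupId` key and a hypothetical `Text` column holding the message body, with toy data. It shows one way to build the prompt/completion dataframe with pandas and restrict it to a single heuristic. The inner merge also reproduces the repetition described above: one original with two variants yields two rows.

```python
import pandas as pd

# Toy data. "GroupId" (links a variant to its original) and "Text" (message body)
# are hypothetical column names; substitute whatever your spreadsheet actually uses.
df = pd.DataFrame({
    "GroupId":            [1, 1, 1, 1, 2, 2],
    "MessageType":        ["original", "heuristicized", "heuristicized",
                           "heuristicized", "original", "heuristicized"],
    "HeuristicName":      [None, "shorten", "shorten", "formalize", None, "shorten"],
    "ManualMessageScore": [float("nan"), 1, 2, 2, float("nan"), 3],
    "Text":               ["hi there", "hi", "hey", "Good day", "see you soon", "cya"],
})

def build_pairs(df: pd.DataFrame, heuristic: str) -> pd.DataFrame:
    """Return prompt/completion pairs for a single heuristic."""
    originals = df[df["MessageType"] == "original"][["GroupId", "Text"]]
    edited = df[(df["MessageType"] == "heuristicized")
                & (df["HeuristicName"] == heuristic)][["GroupId", "Text"]]
    # Inner merge: every original is paired with every variant sharing its GroupId,
    # so an original with several variants appears in several rows.
    pairs = originals.merge(edited, on="GroupId", suffixes=("_orig", "_heur"))
    pairs = pairs.rename(columns={"Text_orig": "prompt", "Text_heur": "completion"})
    return pairs[["prompt", "completion"]]

pairs = build_pairs(df, "shorten")
print(pairs)
```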

I have a few questions about the problem:

  1. Is this problem feasible to solve with GPT-3? Which model would you recommend?
  2. I understand that the data must be in JSONL format to fine-tune a model, but I’m having trouble converting it. Any help would be appreciated.
  3. Is there a way to incorporate the features from the initial dataset (such as HeuristicName and ManualMessageScore) into a fine-tuned model? I’ve only seen examples that use the ‘prompt-completion’ structure and was wondering whether the model supports anything beyond that.
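For question 2, pandas can write a dataframe directly as JSONL: `DataFrame.to_json` with `orient="records"` and `lines=True` emits one JSON object per line. This is a minimal sketch, assuming the prompt/completion dataframe already exists. The separator `\n\n###\n\n` and the stop sequence ` END` are arbitrary choices that follow the convention described in OpenAI's legacy prompt/completion fine-tuning guide (end every prompt with a fixed separator, start each completion with a space, end it with a fixed stop sequence), and `train.jsonl` is a placeholder filename.

```python
import pandas as pd

# Arbitrary separator / stop sequence (assumed convention, adjust as needed).
SEPARATOR = "\n\n###\n\n"
STOP = " END"

def to_jsonl(pairs: pd.DataFrame, path: str) -> None:
    """Write prompt/completion pairs as JSON Lines, one object per row."""
    out = pd.DataFrame({
        "prompt": pairs["prompt"] + SEPARATOR,             # fixed separator at the end of every prompt
        "completion": " " + pairs["completion"] + STOP,    # leading space + fixed stop sequence
    })
    # orient="records" gives one {"prompt": ..., "completion": ...} object per row;
    # lines=True puts each object on its own line, i.e. JSONL.
    out.to_json(path, orient="records", lines=True, force_ascii=False)

pairs = pd.DataFrame({"prompt": ["hi there", "see you soon"],
                      "completion": ["hi", "cya"]})
to_jsonl(pairs, "train.jsonl")
```

At inference time, the same separator would be appended to a new original message and ` END` passed as the stop sequence, so the model's output matches the training format.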
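For question 3, a prompt/completion fine-tune only sees text, so one common workaround (not a dedicated feature of the API) is to serialize the extra columns into the prompt itself. The hypothetical template below puts HeuristicName and ManualMessageScore in front of the message; the column names `original` and `heuristicized` and the template wording are assumptions. A side effect is that the same original paired with differently scored variants no longer produces identical prompts.

```python
import pandas as pd

def build_prompt(message: str, heuristic: str, score: int) -> str:
    # Hypothetical template: the features become plain text the model can condition on.
    return f"Heuristic: {heuristic}\nTarget score: {score}\nMessage: {message}\n\n###\n\n"

# Toy rows; "original" and "heuristicized" are assumed column names.
rows = pd.DataFrame({
    "original":           ["hi there", "hi there"],
    "heuristicized":      ["hi", "hey"],
    "HeuristicName":      ["shorten", "shorten"],
    "ManualMessageScore": [1, 2],
})

train = pd.DataFrame({
    "prompt": [build_prompt(m, h, int(s))
               for m, h, s in zip(rows["original"], rows["HeuristicName"],
                                  rows["ManualMessageScore"])],
    "completion": " " + rows["heuristicized"] + " END",
})
print(train["prompt"][0])
```

To generate text, one would send a prompt built the same way with the desired heuristic and a target score of 1 (the best rating). Whether the model actually learns to respect the score depends on how many examples there are per score, so it is worth checking empirically.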

Let me know if I can make the problem statement clearer in any way. Thanks in advance!