Using fine-tuning for operational report generation

Hey there,
For a few days now, I’ve been trying to develop an application that generates operational reports based on some specific parameters.

To do this, I scraped about 400 press reports from a publicly accessible page and used completion requests with text-davinci-003 to extract the specific parameters from each report as a JSON object. So far, so good: in my opinion this worked better than expected.
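Because the extracted parameters later become the training prompts, it may be worth checking each extracted JSON object before using it. Below is a minimal Python sketch. The parameter names `date`, `location` and `incident_type` are hypothetical placeholders, since the post doesn't list the real ones, and `parse_extraction` is an illustrative helper, not part of any library.

```python
import json

# Hypothetical parameter names; the real ones are not given in the post.
REQUIRED_KEYS = ("date", "location", "incident_type")


def parse_extraction(completion_text, required=REQUIRED_KEYS):
    """Parse the JSON object returned as completion text and check required keys."""
    try:
        data = json.loads(completion_text.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"completion is not valid JSON: {exc}") from exc
    # The model could return a list or a bare value instead of an object.
    if not isinstance(data, dict):
        raise ValueError("completion is not a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    return data
```

Reports whose extraction raises a `ValueError` could then be set aside, much like the dirty reports that were removed by hand.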

BUT then I removed some really dirty reports from the training data and kept the remaining press reports.
For every press report, I generated a prompt similar to this one:


Press Report:

I combined the prompts and the completions (the scraped press reports) into a JSONL file and created a fine-tune (davinci, 2 epochs) with this file.
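For reference, the legacy prompt/completion fine-tuning format expects one JSON object per line with a `prompt` and a `completion` field. Below is a minimal Python sketch of building such a file. The prompt template is hypothetical (the original one isn't shown), and the separator `"\n\n###\n\n"`, the leading space on the completion and the `" END"` stop sequence are assumptions based on the conventions suggested in OpenAI's legacy fine-tuning guide, not something described in this post.

```python
import json

# Assumed conventions (not from the post): a fixed separator ends every
# prompt, and every completion starts with a space and ends with a stop sequence.
SEPARATOR = "\n\n###\n\n"
STOP = " END"


def make_record(params, report):
    """Build one training example from extracted parameters and a press report."""
    # Hypothetical template: one "key: value" line per extracted parameter.
    prompt = "\n".join(f"{key}: {value}" for key, value in params.items())
    return {
        "prompt": prompt + SEPARATOR,
        "completion": " " + report.strip() + STOP,
    }


def write_jsonl(examples, path):
    """Write (params, report) pairs as a JSONL file, one record per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for params, report in examples:
            fh.write(json.dumps(make_record(params, report), ensure_ascii=False) + "\n")
```

With an explicit stop sequence in every completion, the same string could be passed as the `stop` parameter when querying the fine-tuned model, so generation has a defined end point.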

When I then tested the model, the output was total crap. It often repeated sentences endlessly (this stopped after I increased the frequency penalty), and, worst of all, it hallucinated a lot, even at a temperature of 0. For example, it wrote instructions for using the photos of a press report, among other things. This happens because the training data contains some sections that can't be derived from the input parameters.

So my final question: do you think fine-tuning is the right (or best) choice here, or should I use simple completion? And if fine-tuning is the best option, how should I improve the training?