Fine-tuning GPT model for extracting fields from Hebrew invoices

I am currently working on fine-tuning a GPT model to extract specific fields from Hebrew invoices. To build the training data, I use an OCR service to retrieve the invoice text and pair it with a JSON object containing the expected field values as the target output.
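For reference, this kind of training data is usually a JSONL file where each line pairs the OCR text (prompt) with the expected fields (completion). A minimal sketch of building one record, assuming the legacy prompt/completion fine-tuning format; the field names (invoice_number, total, currency) and separator/stop tokens are illustrative, not taken from the original post:

```python
import json

def build_training_record(ocr_text: str, expected_fields: dict) -> str:
    """Pair raw OCR text with the expected fields as one JSONL line.

    Completion-style fine-tuning conventionally ends the prompt with a
    fixed separator and the completion with a fixed stop sequence so the
    model learns where the answer begins and ends.
    """
    return json.dumps(
        {
            "prompt": ocr_text + "\n\n###\n\n",
            "completion": " " + json.dumps(expected_fields, ensure_ascii=False) + " END",
        },
        ensure_ascii=False,
    )

# Hypothetical Hebrew invoice paired with its expected extraction result
record = build_training_record(
    "חשבונית מס 12345\nסה\"כ לתשלום: 1,170.00 ₪",
    {"invoice_number": "12345", "total": "1170.00", "currency": "ILS"},
)
print(record)
```

Writing each record with a helper like this (rather than by hand) also makes it easy to validate that every completion is well-formed JSON before training.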

Although I fine-tuned the model on 1,500 examples, the results are not as accurate as I had hoped. However, I have found that the base GPT-3.5-turbo model produces much more accurate results with just a simple prompt and a single example.
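For comparison, the one-shot GPT-3.5-turbo approach can be sketched as a chat-style request: one solved example followed by the new invoice. The invoice text and field names below are made up for illustration, and the actual API call is left as a comment since it requires an API key:

```python
import json

def build_messages(ocr_text: str, example_text: str, example_fields: dict) -> list:
    """Build a one-shot chat prompt: one worked example, then the new invoice."""
    return [
        {"role": "system",
         "content": "Extract the invoice fields and reply with JSON only."},
        # The single worked example: user shows an invoice, assistant shows the JSON
        {"role": "user", "content": example_text},
        {"role": "assistant",
         "content": json.dumps(example_fields, ensure_ascii=False)},
        # The new invoice to extract
        {"role": "user", "content": ocr_text},
    ]

messages = build_messages(
    "חשבונית מס 777\nסה\"כ: 500.00",
    "חשבונית מס 12345\nסה\"כ: 1,170.00",
    {"invoice_number": "12345", "total": "1170.00"},
)
# With the openai client this would be sent as, e.g.:
# response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
```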

I’m wondering if I’m doing something wrong with my approach or if I need to train the model using more examples to improve its accuracy. Can anyone offer any advice or suggestions? Any help would be greatly appreciated.

Thank you in advance!

Which base model did you fine-tune? Did you use davinci?

1,500 examples should be enough data to get 90%+ accuracy.

Did you find a solution for this? How did you generate the training JSON file? I mean, did you create it manually or through an automated process?