Text parsing and producing the stable JSON output

Hey guys.

I would love to develop an invoice parser to convert data into JSON format. The problem is I can’t get the stable result. I extract text from PDF, and the text is formatted as is. Then I ask OpenAI to parse it and convert into JSON format. The result is not stable. Fields can be named differently every time. Is there a way to have a predictable stable result?

Welcome to the community!

Here are some things you can try:

  1. Use JSON mode to ‘instruct the model to always return a JSON object’.
  2. Use ’ Few Shot Prompting’: providing the model with a few examples of the result you expect in the system message.
  3. Describe the expected output JSON schema in the system message.
  4. Fine-tune a model with expected input and output results. (a bit more work)

Try to define specific fields you need. I’ve made a basic template to help:


Just add your invoices and adjust the fields you want to extract. This should give you more consistent JSON output. Let me know if you need any help with it.