Text parsing and producing the stable JSON output

Hey guys.

I would love to develop an invoice parser to convert data into JSON format. The problem is I can’t get the stable result. I extract text from PDF, and the text is formatted as is. Then I ask OpenAI to parse it and convert into JSON format. The result is not stable. Fields can be named differently every time. Is there a way to have a predictable stable result?

Welcome to the community!

Here are some things you can try:

  1. Use JSON mode to ‘instruct the model to always return a JSON object’.
  2. Use ’ Few Shot Prompting’: providing the model with a few examples of the result you expect in the system message.
  3. Describe the expected output JSON schema in the system message.
  4. Fine-tune a model with expected input and output results. (a bit more work)

Try to define specific fields you need. I’ve made a basic template to help:

https://tune.ac/project/01J1YDBZG2YZPBG2Y8AH7Z431M

Just add your invoices and adjust the fields you want to extract. This should give you more consistent JSON output. Let me know if you need any help with it.