Fine tunning with OCR (Assistant)

celia.stelorder · October 23, 2024, 2:24pm

Good afternoon! I am developing an OCR with the assistants (API), I want to do fine tuning to refine the response when setting taxes, rounding… etc.

I saw the documentation but it is not clear to me how I give examples of bad invoices and good invoices for anyone to see, since what fine tunning does is respond in such a way if they ask you such a thing,

What would the JSONL format be like? thx so much !

anon10827405 · October 23, 2024, 3:20pm

I wouldn’t recommend trying to fine-tune the vision model to perform work on the numbers in the document. Try to focus on a single concern for each pass.

Your best bet is to extract the relevant information and transform it into a usable format (like structured output) and then use typical programming to perform any deterministic work on the fields.

darcschnider · October 23, 2024, 3:25pm

The question about fine-tuning with OCR in relation to invoices can be approached by providing examples that include both “bad” and “good” invoices in a JSONL format or instruction format. Since you’re dealing with text extracted via OCR, the goal is to tune the model (I am not talking about using a fine tuned model but to tune your logic deciders) to understand and respond better to specific details, such as rounding, setting taxes, and handling specific invoice formats.

Here’s how you can structure your fine-tuning examples in a JSONL file. For this scenario, the format would include prompts (your OCR input, including mistakes or unclear text) and completions (the correct, expected output after fine-tuning):

Example JSONL format you could use to help validate your structures.

jsonl

{"prompt": "Invoice Total: $100.555 Tax: $5.25 Rounded Total: $105.80", "completion": "Invoice Total: $100.56 Tax: $5.25 Rounded Total: $105.81"}
{"prompt": "Invoice Total: $150.00 Tax: 0.10% Rounded Total: $150.000001", "completion": "Invoice Total: $150.00 Tax: 0.10% Rounded Total: $150.00"}
{"prompt": "Taxable Amount: $1000.45 Tax: $75.50 Total: $1075.9500000001", "completion": "Taxable Amount: $1000.45 Tax: $75.50 Total: $1075.95"}

Key Steps:

OCR Inputs (Prompt): These are the outputs you get from your OCR that may include small issues like extra decimal places or incorrect rounding.
Correct Output (Completion): This is the refined response you want from your model after fine tuning, where the values are correctly rounded or the tax is appropriately applied.

Important Notes:

Prompts: Should reflect the common mistakes that OCR might produce from invoices, such as improper rounding or decimal precision.
Completions: Should reflect the corrected and refined output you expect after applying the taxes or rounding rules.

How to Build More Data:

You can add more examples of “bad” and “good” invoices following this format. The model will learn from these examples and apply similar corrections in future interactions.

Make sure to structure each example as a JSONL line like the ones above, where the prompt is the OCR’s raw output, and the completion is the expected, corrected invoice. Fine tuning will help your assistant recognize and apply consistent rules for invoices after OCR processing.

Additionally you could just structure all OCR into a JSON format and create a filter stack that checks the structure and contents for each with understanding of what is expected in each and if there is something incorrect to flag it.

I did an invoice tax processor this way that looked for Taxes on things that were not to be taxed. it checked customers and invoices to match the ones that should be exempt.

*in a nut shell seeing this is not that clear because of the use of terms.

if you present a layered approach of ai’s that look at key sections of the JSON they can identify missing information, extra information or format issues. You can use validation ai’s as a back up filter to validate the work of the deciders. This will ensure your hit rates are much higher. if you build your stack robust enough you will get accuracy results. using Ai deciders rather than regular logic gives you that flexibility to spot the issues more dynamic in case you miss something there is a higher chance the ai will see it.

hope that helps clarify this some more.

anon10827405 · October 23, 2024, 3:30pm

This is completely hallucinated information.

I highly recommend not doing this.

There is absolutely no benefit and only massive disadvantages to fine-tuning a model to do something as basic as rounding numbers.

sps · October 23, 2024, 3:44pm

@celia.stelorder

Are you looking for vision fine-tuning where you extract data from invoice images?

darcschnider · October 23, 2024, 4:04pm

What I said works, the term fine tune I am using is not the same as fine tuning a model. it’s about ai stack deciders. it works perfectly fine with validators. I don’t use any of the fine tune for unstructured data, but build stacks.

There are many ways to do this. you can check out my kruel.ai project which works very well. Hallucinations imo are created because of poor instructions, validators, bad data handling, and temperatures.

Models also play a huge role in understanding. if you use a local model that is 3B for example you may need a lot more instructions and validation. where a model like GPT4o etc work much better and require less stacks to ensure outcomes are correct.

updated my previous post to make this clear that its not fine tuning the model. but the logic stack.

celia.stelorder · October 24, 2024, 6:38am

Hello! The thing is that the assistant is currently giving me problems when it comes to filing taxes, sometimes he doesn’t understand them and doesn’t finish deducting them correctly, I wanted something like giving the fine tuning a good/bad invoice with examples so that he can see how to deal with it, but I’m not sure how it will work

celia.stelorder · October 24, 2024, 6:40am

But if it’s not fine tuning as such, where do you put it? a json file in the assistant search file? It’s something I’m not very clear about.

celia.stelorder · October 24, 2024, 6:42am

TThe problem is not the rounding of decimals, it is that sometimes you do not understand types of taxes and apply them incorrectly and I want you to learn from what goes wrong

zakirelkheir · October 24, 2024, 9:19am

Iam not sure that I understood the situation 100% correctly, but Iam assuming the following:

you have already OCRed the invoices with high accuracy ( specially numbers ) , or you use some software or service for doing that before sending the request ( the model receive only text input ) .
Iam not familiar with how taxes work in your country , and hence didn‘t fully understand the term “applying taxes“ , so I am assuming that you want to perform some arithmetic operation depending on the types (or another attributes ) of the invoices.

Based on this understanding I can the suggest the following :

Assistent API is used manly for handling multi-term Conversations from multi-user setup with ease . It also provide tools like file_search ( for creating , searching knowledge base ) and analytic_tool (for executing python code ). But for your use case , I don‘t think your gonna need any of that . And hence you can make your code much simple by using chat_completion API.
using Structured Output . You can use the structured output mode when making request to open ai and provide it the desired response schema alongside with OCR invoice . Then chatgpt will extract the desired information from the invoices ( like amount , tax_type …etc ) , and send them back in a programatically accessible and predictable ( 100% conform to your desired schema ) JSON Format.
Alternative to Step 2 : you can also use Function Calling which work very similar to Structured output.
After that you can write your own logic for applying taxes ( whatever that mean ) and rounding numbers in any programming language of your choice .

Here is a Pseudo Code in python [generated with chatgpt 4o]


# Function to send OCRed invoice to OpenAI API
def call_openai_api(ocred_invoice_text):
    # Pseudo API call to OpenAI (returns structured invoice details)
    # The structured response may have fields like 'type', 'items', 'tax_rate', 'is_exempt'
    # Assuming this returns a structured dictionary-like output.
    
    openai_response = {
        "type": "service_invoice",   # Example of invoice type (could be 'product_invoice', etc.)
        "items": [
            {"description": "Consulting Service", "price": 1000},
            {"description": "Software License", "price": 500}
        ],
        "tax_rate": 0.15,  # 15% tax rate (as an example)
        "is_exempt": False # Is the invoice exempt from taxes?
    }

    return openai_response

# Function to apply taxes and rounding based on structured output from OpenAI
def process_invoice(ocred_invoice_text):
    # Step 1: Call OpenAI API to get structured invoice data
    structured_invoice = call_openai_api(ocred_invoice_text)

    # Step 2: Initialize total price
    total_price = 0

    # Step 3: Calculate total price before tax
    for item in structured_invoice["items"]:
        total_price += item["price"]

    # Step 4: Apply tax based on the invoice type and other attributes
    if structured_invoice["is_exempt"]:
        # If the invoice is exempt from taxes, no tax is added
        tax_amount = 0
        print("This invoice is tax-exempt.")
    else:
        # Apply tax if the invoice is not exempt
        tax_rate = structured_invoice["tax_rate"]
        tax_amount = total_price * tax_rate
        print(f"Applying tax of {tax_rate*100}% to the invoice.")

    # Step 5: Add the tax amount to the total
    final_total = total_price + tax_amount

    # Step 6: Round the final total to two decimal places (for currency)
    final_total = round(final_total, 2)

    # Step 7: Display the final result
    print(f"Total before tax: {total_price}")
    print(f"Tax amount: {tax_amount}")
    print(f"Final total after tax (rounded): {final_total}")

    return final_total

# Example OCRed invoice text (this would be the result of an OCR scan)
ocred_invoice_text = "Sample OCRed invoice text goes here"

# Process the invoice and print the result
final_invoice_total = process_invoice(ocred_invoice_text)

arata · October 24, 2024, 11:50am

This is frankly, a terrible idea to attempt using a language AI alone. You will not be able to improve its literacy in math that $100 million of training on massive data did not already instill.

What you should train it on: using an external calculator function that can perform real math, before giving any answers to users.

With vision, it is a good idea to have the AI discuss what is seen in the image before any further analysis. You can have this be an initial property field which the function accepts before the actual calculator input field. That would get a reliable record of the vision OCR extraction, which the AI can use to understand the calculation needing to be sent.

Hope this advice steers you in the right direction - you might not need fine-tune at all.

darcschnider · October 24, 2024, 11:51am

This is how I would do it, not using the ai itself to do the rounding.

Parse the JSON data** to access the relevant fields.
Search for the number** that matches the specific format you need (for example, invoice numbers, transaction IDs, etc.).
Return or perform an action based on whether the number is found and correctly formatted.

Let’s assume your invoice data looks something like this in JSON:

json

Copy code

{
  "invoices": [
    {
      "invoice_number": "INV-12345",
      "date": "2024-10-24",
      "total": 1050.00,
      "items": [
        {
          "description": "Item 1",
          "quantity": 10,
          "price": 50
        }
      ]
    },
    {
      "invoice_number": "INV-67890",
      "date": "2024-10-24",
      "total": 500.00,
      "items": [
        {
          "description": "Item 2",
          "quantity": 5,
          "price": 100
        }
      ]
    }
  ]
}

Let’s say the format you are checking for is INV-XXXXX (where XXXXX is a number).

Sample Python function:

python

import re
import json

# Example function to check for a specific number format in the JSON output
def check_invoice_number(json_data, number_format="INV-\d{5}"):
    # Parse the JSON data (if it's a JSON string)
    if isinstance(json_data, str):
        data = json.loads(json_data)
    else:
        data = json_data

    # Compile the regular expression for the specific number format
    pattern = re.compile(number_format)

    # Iterate over the invoices and check the format of the invoice numbers
    for invoice in data.get("invoices", []):
        invoice_number = invoice.get("invoice_number", "")
        if pattern.match(invoice_number):
            print(f"Invoice number {invoice_number} is valid.")
        else:
            print(f"Invoice number {invoice_number} is invalid.")

# Example usage
json_data = {
    "invoices": [
        {
            "invoice_number": "INV-12345",
            "date": "2024-10-24",
            "total": 1050.00,
            "items": [
                {
                    "description": "Item 1",
                    "quantity": 10,
                    "price": 50
                }
            ]
        },
        {
            "invoice_number": "INV-ABC12",
            "date": "2024-10-24",
            "total": 500.00,
            "items": [
                {
                    "description": "Item 2",
                    "quantity": 5,
                    "price": 100
                }
            ]
        }
    ]
}

# Call the function to check invoice numbers
check_invoice_number(json_data)

Regex Pattern: INV-\d{5} checks for the format where “INV-” is followed by exactly 5 digits.
Pattern Matching: For each invoice in the JSON, the function checks whether the invoice_number matches the pattern.
JSON Handling: The function accepts either a JSON string or a parsed JSON object.
Action: It prints whether each invoice number is valid or not, but you can customize this (e.g., raise an error or return a list of valid/invalid numbers).

Output:

Invoice number INV-12345 is valid.
Invoice number INV-ABC12 is invalid.

This approach allows you to search for numbers that follow a specific format and handle your OCR data systematically. You can easily adjust the regex pattern for other formats if needed.

The OCR data you would have to use your JSON formatting or Ai to format it for you. than build in the logic to pass each format through looking for the part that you want to check and run a rounding logic depending on your needs. you could than use validation Ai or other to check if its now valid.

In the previous I thought you were just looking for one that were valid and not valid only to flag them, so difference would be adding logic in the middle of your processing to do the rounding. Ai itself is not the best at math but it can call functions you create with understanding.

Hope that makes more sense. its more about breaking the data down into a format to process.

This is just to show a method, not your exact but you should get the idea from it.

Now if your numbers are not always 5 digits you would have to build a more complex handler.

celia.stelorder · October 24, 2024, 12:46pm

I think I did not explain myself very well, for example an invoice for 3% VAT, sometimes depending on how it is specified, the unit price of the product is confused and applies those taxes incorrectly, that is, it returns an incorrect invoice since it applied the taxes incorrectly and sometimes even the unit price depending on what the invoice is, that’s what I mean, training you with a bad invoice/good invoice, thx for answering!!

anon10827405 · October 24, 2024, 2:00pm

This is really confusing.

What kind of invoices are you processing that don’t already have the taxes and totals calculated?

In any case your first task would be to transform the document into a structured text, and THEN start to work on the document.

It’s critical to remember: You do not want a vision model to be OCRing your document AS WELL as performing numerical work on it.

This. Won’t. Work.

Or, it will kind of work but will be brutally inefficient and prone to constant edge cases.

brettsnodgrass · October 27, 2024, 8:50am

GPT was inversely accurate based on the length of the text that I wanted to OCR. My source was German articles from the early 1900s. The GPT sometimes changed the text to reflect current grammar, so I had to make the GPT proofread its output.

crownraki2010 · October 29, 2024, 2:29am

Why can’t you add validation to logic for both good and bad invoices formats?. Please find below format
Your json structure would be:
invoice_data = {

    "Inovices": [{
                    "Type": "Good",
                    "date": "2024-10-29",
                     "total": 1050.00,
      "entries": [
                             {
                              "description": "Test Item",
                              "quantity": 10,
                              "price": 50
                       }
              ]
        },

}
{
“Type”: “Bad”,
“date”: “2024-10-29”,
“total”: 10500.00,
“entries”: [
{
“description”: “Reso Item”,
“quantity”: 10,
“price”: 500
}
]
},
you can parse the josn with json module and do further processing. Reach out me on
for any help and support. Happy to help you

Topic		Replies	Views
Using gpt4o as OCR fills data with invented data API gpt-4 , gpt4o , ocr	10	467	December 20, 2024
Options preparing data for finetuning 2 Prompting api	19	229	December 12, 2024
How to create a correct JSONL for training Prompting gpt-35-turbo , api	14	12343	February 27, 2025
Using the Vision API: best practices API api , gpt-4-vision	10	2078	September 26, 2024
How to Fine-Tune a LLM for Function Calling and Conversational Style Simultaneously? API fine-tuning , gpt-4o-mini	0	171	April 16, 2025

Fine tunning with OCR (Assistant)

Example JSONL format you could use to help validate your structures.

Key Steps:

Important Notes:

How to Build More Data:

Related topics