Vision Model Fine-tuning Query

I am fine-tuning a vision model to interpret images of geometric shapes, with the results returned as JSON in this form:

{
  "ShapeAnalysis": {
    "shapes": [
      {
        "type": "Lorem Ipsum",  # my shape names are custom; a rectangle might be called "Content placeholder", for example
        "parameters": {
          "orientation": "horizontal",
          "text": "A1. Lorem Ipsum…"  # random text
        }
      },
      {
        "type": "SpecialShape2",
        "parameters": {
          "Steps": 10,
          "text": "Lorem ipsum",
          "type": "Underline"  # or "Box", "Lorem ipsum", etc.
        }
      }
    ]
  }
}
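(Since comments and unquoted values aren't legal JSON, here's a minimal Python sketch that builds the target output above and round-trips it through the `json` module to confirm it parses; the shape names and text are the placeholders from my example.)

```python
import json

# The target output above, built as a Python dict so the serialized form is
# guaranteed to be valid JSON (quoted keys, commas, no comments).
target = {
    "ShapeAnalysis": {
        "shapes": [
            {
                "type": "Content placeholder",  # custom shape name
                "parameters": {
                    "orientation": "horizontal",
                    "text": "A1. Lorem Ipsum…",  # random text
                },
            },
            {
                "type": "SpecialShape2",
                "parameters": {
                    "Steps": 10,
                    "text": "Lorem ipsum",
                    "type": "Underline",  # one of Underline / Box / ...
                },
            },
        ]
    }
}

serialized = json.dumps(target, ensure_ascii=False, indent=2)
# Round-trip: the string the model would emit must parse back to the same dict.
assert json.loads(serialized) == target
print(serialized)
```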

The thing is, stock ChatGPT could definitely interpret these shapes in its own words, but I need the output in this exact format.

Each image can have 5-10 such shapes. Each shape can have 2-3 parameters like text and whatever is relevant for that shape. There are 300 such shapes.
I'm thinking of fine-tuning on 1 million rows, keeping the shape distribution as balanced as I can.
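For reference, each of those rows would look something like this. This is a sketch assuming an OpenAI-style vision fine-tuning chat format (JSONL, one row per line); the system prompt, user text, and image URL are hypothetical placeholders, and the assistant message is the exact JSON string the model should learn to emit.

```python
import json

# One hypothetical training row: image in, strict JSON out.
row = {
    "messages": [
        {
            "role": "system",
            "content": "Return the shapes in the image as JSON only.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this diagram."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/diagram_001.png"},
                },
            ],
        },
        {
            # The completion is the serialized target JSON, not a dict,
            # so the model is trained on the literal string to reproduce.
            "role": "assistant",
            "content": json.dumps({
                "ShapeAnalysis": {
                    "shapes": [
                        {
                            "type": "Content placeholder",
                            "parameters": {
                                "orientation": "horizontal",
                                "text": "A1. Lorem Ipsum",
                            },
                        }
                    ]
                }
            }),
        },
    ]
}

# Each line of the JSONL training file is one such row.
print(json.dumps(row))
```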

Will this work? Is the data too little? Is the task too complex? Is this use case even a good fit for fine-tuning?

I would love any helpful insights:

How much data is too little, how much is enough, and how much is overkill for my use case? Is my task too complex for a vision model to learn?

Is there anything else I can optimize for?

Thanks