Which GPT model (API) can be fine-tuned on image data (to read content from images)?

Hello, I want to train a model to read text from images using the ChatGPT API. Which model can I use for this?

This is not currently possible with the models released for fine-tuning.

Image content extraction is done by integrating OCR with a GPT model. How do I build a dataset for this use case? Is there any documentation for that?

No, because you cannot currently fine-tune a model to do this, so there would be zero value in producing documentation describing how to curate a dataset for it. Again, you cannot do this.

