How to extract data from a graph image with ChatGPT-4

Hello,

I am new to ChatGPT, but I have found it very good at extracting information from images, so I thought it should also be possible to extract data from a graph. From what I have seen, all the necessary pieces are there, but the system always fails at some point (never the same one) and never completes the task.

Here is the prompt I submitted to extract the information.
“My objective is to extract the data from a graph in form of an image.Here is an image representing a graph with two curves. one curve is made of blue triangles markers connected with a blue dash line, the second one is a curve made of red circles markers connected with a red dash line. These curves represent the log(sigmaT) where sigma is the conductivity and T the temperature in kelvin, as a function of 1000/ T. First, can you identify the two compounds that correspond to these two curves. Their composition is indicated in the title of the image and in the legend with different x values for barium content. Then make me validate before going on. Second, can you identify the blue triangles markers and red circles markers in the image. Each triangle or circle should contain several hundreds of pixels. So use this information to remove small clusters. Remove also the markers that contain empty regions or that have a shape of a letter or a number. Then show me on the image the centers of those clusters consisting of triangle and circles. make me validate before going on. Finally, extract the values associated with the blue triangles markers and the red circles markers and put them in a table where the first column is the x position in th 1000/T scale, where the y position is the log(sigmaT) value. Then make a graph with the extracted value similar to the original one. put the graph close to each other to allow for a comparison.”

It detects quite correctly that there are two curves, one made of blue triangles and one made of red circles, and what they correspond to, but it never manages to complete the process correctly and fails very often. I have spent 15 days on this without reaching my final objective, which is simply to recover the data from the curves. Another problem is that the extraction procedure seems to vary from one attempt to the next, and ChatGPT misses important information in the prompt.

I don’t think the tech is there yet to be able to achieve what you want. Even a game grid was difficult…

That said, the tech will continue improving. You might wait for a newer model (Guessing you’re using GPT-4?)…

It’ll be possible eventually.

What does your prompt look like?

I put the prompt in the original message.
Maybe you did not see it because it does not look like a typical prompt. Sorry about that, I am quite new to this field.

It’s better to start with “What do you see in this image, respond with a detailed answer” (not the best prompt).

If you don’t see something close to what you are looking for after a few tries, it’s safe to assume that it cannot do it right now.

The mantra is “just wait” until the next model drops.

Why not use the LLM to extract the semantics of the graph, and then use an OCR to extract the graph markings and values via bounding boxes?

This could probably be done without GPT-4v, honestly.

Yeah, but the accuracy simply isn’t there yet.

Even if you’re able to extract the values somewhat adequately, they will still be far less accurate than the original measurements.

I highly recommend reaching out to the original author of the research and simply asking them for the data. They are usually more than happy to help.

Hello. My objective is obviously not to obtain these particular data but to explore a strategy for extracting data from graphs. As a researcher, I know that I can ask the authors, but I do not want to contact dozens of authors if I want to extract data from a significant number of articles.

Thanks anyway for your suggestion.
Regards
Guilhem

An LLM combined with OCR is fine for extracting the text information in the graph, such as material identification, activation energies for slopes, doping levels, etc.

Failure occurs when I ask for marker detection (I usually have to make additional requests to remove the legend or other detected clusters), and even more so when I ask for position calibration.

The issue with your approach (in my opinion) is that you are using a tool that doesn’t offer any ability to tune besides some prompt engineering, and it also doesn’t return the coordinates of anything like a typical OCR does.

You can use an OCR to extract the values and can even try fine-tuning it to capture the symbols. You can correlate the bounding boxes to find the values.
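To make the bounding-box idea a bit more concrete, here is a minimal sketch of how tick labels returned by an OCR could be turned into an axis calibration. It is only an illustration under several assumptions: the OCR output format `(text, x_min, y_min, x_max, y_max)`, the function names `ticks_from_boxes`, `fit_axis` and `pixel_to_data`, and all the numbers are made up. It also assumes both axes are linear in the plotted quantities (1000/T and log(sigmaT)) and that each label is centered on its tick.

```python
from typing import List, Tuple

# Hypothetical OCR result: (text, x_min, y_min, x_max, y_max), in pixels.
OcrBox = Tuple[str, float, float, float, float]


def ticks_from_boxes(boxes: List[OcrBox], axis: str) -> List[Tuple[float, float]]:
    """Pair each numeric label with the center of its box along one axis."""
    ticks = []
    for text, x0, y0, x1, y1 in boxes:
        try:
            value = float(text)
        except ValueError:
            continue  # skip non-numeric text such as the axis title
        center = (x0 + x1) / 2 if axis == "x" else (y0 + y1) / 2
        ticks.append((center, value))
    return ticks


def fit_axis(ticks: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of value = a * pixel + b from (pixel, value) pairs."""
    n = len(ticks)
    if n < 2:
        raise ValueError("need at least two tick labels to calibrate an axis")
    mean_p = sum(p for p, _ in ticks) / n
    mean_v = sum(v for _, v in ticks) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in ticks)
    if var_p == 0:
        raise ValueError("all tick labels share the same pixel position")
    a = sum((p - mean_p) * (v - mean_v) for p, v in ticks) / var_p
    return a, mean_v - a * mean_p


# Made-up boxes, as an OCR might return them for the two axes.
x_boxes = [("1.0", 90, 400, 110, 412), ("2.0", 290, 400, 310, 412),
           ("3.0", 490, 400, 510, 412), ("1000/T", 250, 430, 350, 445)]
y_boxes = [("-2", 40, 345, 55, 355), ("0", 40, 195, 55, 205),
           ("2", 40, 45, 55, 55)]

ax, bx = fit_axis(ticks_from_boxes(x_boxes, "x"))
ay, by = fit_axis(ticks_from_boxes(y_boxes, "y"))


def pixel_to_data(px: float, py: float) -> Tuple[float, float]:
    """Convert a pixel position into (1000/T, log(sigmaT)) coordinates."""
    return ax * px + bx, ay * py + by


print(pixel_to_data(200, 125))
```

Using more than two tick labels per axis lets the fit average out small OCR positioning errors, which is why a least-squares fit is used here instead of a two-point calibration.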

Then you can use simple logic to capture the symbols and correlate them on the axis for the value. You can also use an LLM to understand the semantics of the chart.
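As a rough idea of what that "simple logic" could look like, the sketch below groups pixels of a given color into connected clusters, drops the clusters that are too small, and returns the center of each remaining one. This is not a tested pipeline, just a plain-Python illustration on a tiny synthetic image: the color thresholds in `is_blue` and `is_red`, the function name `find_marker_centers`, and the `min_pixels` value are all assumptions. On a real scan the threshold would be several hundred pixels, as in the original prompt, and a dashed line of the same color touching the markers would need extra handling.

```python
from collections import deque
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def is_blue(p: Pixel) -> bool:
    # Assumed threshold; tune it for the actual image.
    return p[2] > 150 and p[0] < 100 and p[1] < 100


def is_red(p: Pixel) -> bool:
    # Assumed threshold; tune it for the actual image.
    return p[0] > 150 and p[1] < 100 and p[2] < 100


def find_marker_centers(image: List[List[Pixel]],
                        matches: Callable[[Pixel], bool],
                        min_pixels: int) -> List[Tuple[float, float]]:
    """Return (x, y) centroids of 4-connected clusters of matching pixels."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    centers = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or not matches(image[y][x]):
                continue
            # Breadth-first flood fill collects one cluster.
            queue = deque([(x, y)])
            seen[y][x] = True
            cluster = []
            while queue:
                cx, cy = queue.popleft()
                cluster.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and matches(image[ny][nx])):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(cluster) >= min_pixels:  # drop small clusters (noise)
                n = len(cluster)
                centers.append((sum(px for px, _ in cluster) / n,
                                sum(py for _, py in cluster) / n))
    return centers


# Synthetic 20x20 white image with one blue marker, one red marker,
# and a single stray blue pixel that should be filtered out.
WHITE, BLUE, RED = (255, 255, 255), (0, 0, 255), (255, 0, 0)
image = [[WHITE] * 20 for _ in range(20)]
for yy in range(2, 5):
    for xx in range(2, 5):
        image[yy][xx] = BLUE
for yy in range(5, 8):
    for xx in range(12, 15):
        image[yy][xx] = RED
image[10][10] = BLUE

blue_centers = find_marker_centers(image, is_blue, min_pixels=5)
red_centers = find_marker_centers(image, is_red, min_pixels=5)
print(blue_centers, red_centers)
```

The pixel centers found this way still have to be converted to axis values with whatever calibration you get from the tick labels.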

If you are willing to spend 15 days on this project and have a lot of these graphs, I think it would make a lot of sense to fine-tune your own OCR and/or program some logic.