I am new to ChatGPT, but I have found it very good at extracting information from images, so I decided it should also be possible to extract data from a graph. From what I have seen, all the pieces needed are there, but the system always fails at some point (never the same one) and never completes the task.
Here is the prompt I submitted to extract the information.
“My objective is to extract the data from a graph in form of an image.Here is an image representing a graph with two curves. one curve is made of blue triangles markers connected with a blue dash line, the second one is a curve made of red circles markers connected with a red dash line. These curves represent the log(sigmaT) where sigma is the conductivity and T the temperature in kelvin, as a function of 1000/ T. First, can you identify the two compounds that correspond to these two curves. Their composition is indicated in the title of the image and in the legend with different x values for barium content. Then make me validate before going on. Second, can you identify the blue triangles markers and red circles markers in the image. Each triangle or circle should contain several hundreds of pixels. So use this information to remove small clusters. Remove also the markers that contain empty regions or that have a shape of a letter or a number. Then show me on the image the centers of those clusters consisting of triangle and circles. make me validate before going on. Finally, extract the values associated with the blue triangles markers and the red circles markers and put them in a table where the first column is the x position in th 1000/T scale, where the y position is the log(sigmaT) value. Then make a graph with the extracted value similar to the original one. put the graph close to each other to allow for a comparison.”
It correctly detects that there are two curves, one made of blue triangles and one made of red circles, and what they correspond to, but it never manages to carry the process through correctly and fails very often. I have spent 15 days on this without reaching my final objective, which is simply to recover the data of the curves. The extraction procedure also seems to vary from one attempt to the next, and ChatGPT misses important information in the prompt.
I put the prompt in the original message.
Maybe you did not see it because it does not look like a typical prompt. Sorry about that, I am quite new to this field.
Hello. My objective is obviously not to obtain these data in particular but to explore a strategy for extracting data from graphs. As a researcher, I know that I can ask the authors, but I do not want to contact dozens of authors if I want to extract data from a significant number of articles.
Thanks anyway for your suggestion.
Regards
Guilhem
An LLM with OCR is fine for extracting the text information in the graph, such as material identification, activation energies for slopes, doping levels, etc.
Failure occurs when I ask for marker detection (I usually have to ask for additional steps to remove the legend or other detected clusters) and even more so for position calibration.
The issue with your approach (in my opinion) is that you are using a tool that offers no way to tune it beyond some prompt engineering, and that does not return the coordinates of anything the way a typical OCR does.
You can use an OCR to extract the values and can even try fine-tuning it to capture the symbols. You can correlate the bounding boxes to find the values.
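To make the bounding-box idea concrete, here is a minimal sketch of the calibration step in plain Python. It assumes the OCR has already returned each tick label's numeric value together with its bounding box as (left, top, right, bottom) in pixels, and that both axes are linear in the plotted quantities (1000/T and log(sigmaT)). The function names `box_center`, `fit_axis` and `pixel_to_data` are made up for this illustration and do not come from any OCR library.

```python
def box_center(box):
    """Center (x, y) of a bounding box given as (left, top, right, bottom)."""
    left, top, right, bottom = box
    return (left + right) / 2.0, (top + bottom) / 2.0


def fit_axis(pixels, values):
    """Least-squares fit of value = a * pixel + b through the tick positions."""
    n = len(pixels)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (pixel, value) tick pairs")
    mean_p = sum(pixels) / n
    mean_v = sum(values) / n
    var = sum((p - mean_p) ** 2 for p in pixels)
    if var == 0:
        raise ValueError("tick pixels must not all be identical")
    a = sum((p - mean_p) * (v - mean_v) for p, v in zip(pixels, values)) / var
    b = mean_v - a * mean_p
    return a, b


def pixel_to_data(px, py, x_axis, y_axis):
    """Convert a pixel position to data coordinates using the two axis fits."""
    ax, bx = x_axis
    ay, by = y_axis
    return ax * px + bx, ay * py + by


# Hypothetical tick positions: x ticks use the pixel column of each label,
# y ticks use the pixel row (rows grow downward in image coordinates).
x_axis = fit_axis([100, 300, 500], [1.0, 1.5, 2.0])
y_axis = fit_axis([400, 250, 100], [-4.0, -2.0, 0.0])
print(pixel_to_data(200, 325, x_axis, y_axis))
```

Fitting through all detected ticks rather than just two makes the mapping less sensitive to a single misread label. The negative slope on the y axis falls out of the fit automatically, so no separate flip is needed.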
Then you can use simple logic to capture the symbols and correlate them on the axis for the value. You can also use an LLM to understand the semantics of the chart.
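The "simple logic" for capturing the symbols could look roughly like the sketch below: threshold on color, group the matching pixels into connected clusters, keep only clusters whose area is in the expected range for a marker, and take the centroid of each. This is only one possible approach, written in plain Python for clarity. The image is assumed to be a nested list of (r, g, b) tuples, and the color thresholds in `is_red` and the area limits are illustrative guesses that would need tuning per figure.

```python
from collections import deque


def is_red(pixel):
    """Rough test for a red marker pixel; thresholds are illustrative."""
    r, g, b = pixel
    return r > 150 and g < 100 and b < 100


def color_mask(image, predicate):
    """Boolean grid marking the pixels for which predicate(pixel) is true."""
    return [[predicate(p) for p in row] for row in image]


def find_marker_centers(mask, min_area, max_area):
    """Centroids (row, col) of 4-connected clusters with area in range."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    centers = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill to collect one cluster.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop dash fragments (too small) and merged blobs (too large).
            if min_area <= len(pixels) <= max_area:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                centers.append((cy, cx))
    return centers
```

The area filter is what removes the dashes of the connecting line and small legend text, in the same spirit as the "several hundreds of pixels" hint in the original prompt. It will not separate a marker from a dash that touches it, and it does not check shape, so a legend marker of the same size would still need to be excluded, for example by ignoring the legend's bounding box.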
If you are willing to spend 15 days on this project and have a lot of these graphs, I think it would make a lot of sense to fine-tune your own OCR and/or program some logic.