Factual answers from graph data

brkrabac · February 11, 2022, 10:57pm

Has anyone explored the area of getting answers for questions from graph data?

TLDR, main challenges I’m working through:

How to consistently follow the edges for answers that require info from more than a single node
How to determine what subset of the graph is needed if the approach for getting the answer needs to fit in a 2k token limit
Can this be done without a fine-tuned model or do we need to go down that route?

I’ve found that GPT-3 does a reasonably good job of answering questions from a JSON list of nodes/edges in a completion prompt, if you provide example data + questions + answers and then pass the actual data + question. However, it doesn’t always follow the correct edges when forming answers. I’m guessing that more examples covering different cases would get it there, but I quickly run out of room within the token window.
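For anyone wanting to try the same setup, here is a minimal sketch of how such a few-shot prompt could be assembled. The `Graph:` / `Q:` / `A:` layout, the `build_prompt` helper, and the node/edge field names are all assumptions for illustration, not the exact format used above.

```python
import json

def build_prompt(examples, graph, question):
    """Assemble a few-shot completion prompt: worked examples first, real query last."""
    parts = []
    for ex in examples:
        # Compact separators keep the JSON short, which matters in a small token window.
        parts.append("Graph: " + json.dumps(ex["graph"], separators=(",", ":")))
        parts.append("Q: " + ex["question"])
        parts.append("A: " + ex["answer"])
        parts.append("")  # blank line between examples
    parts.append("Graph: " + json.dumps(graph, separators=(",", ":")))
    parts.append("Q: " + question)
    parts.append("A:")  # left open for the model to complete
    return "\n".join(parts)

# Hypothetical example data: field names are illustrative only.
examples = [{
    "graph": {
        "nodes": [{"id": "alice", "type": "person"}, {"id": "acme", "type": "company"}],
        "edges": [{"source": "alice", "relation": "works_at", "target": "acme"}],
    },
    "question": "Where does alice work?",
    "answer": "acme",
}]
graph = {
    "nodes": [{"id": "carol", "type": "person"}, {"id": "initech", "type": "company"}],
    "edges": [{"source": "carol", "relation": "works_at", "target": "initech"}],
}
prompt = build_prompt(examples, graph, "Where does carol work?")
```

Each extra worked example adds a full graph to the prompt, which is why the token window fills up so quickly.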

Is this a case where a fine-tuned model is needed, or are there other ideas to help reinforce understanding of the relationships?

The other challenge is how to get the right data from a graph that is too large to fit in a 2k token window. I built a fun little set of pre-processing steps: search the nodes to find ones relevant to the query, filter the graph by node types mapped to a classifier, and then add in the appropriate edges. It does a reasonable job of paring the graph down to a fragment that better fits within the limit, but I’m curious whether others have better ideas for approaching this.
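A rough sketch of that kind of pre-processing, under several assumptions: relevance is plain keyword overlap, `allowed_types` stands in for whatever the classifier returns, and tokens are estimated at roughly four characters each. None of these names or heuristics come from the actual implementation.

```python
import json
import re

def estimate_tokens(text):
    # Crude heuristic (assumption): roughly 4 characters per token for English text.
    return len(text) // 4 + 1

def words(text):
    return set(re.findall(r"\w+", text.lower()))

def select_subgraph(nodes, edges, query, allowed_types, max_tokens=2000):
    """Keep query-relevant nodes of allowed types, plus the edges between them."""
    terms = words(query)
    scored = []
    for node in nodes:
        if node["type"] not in allowed_types:
            continue
        score = len(terms & words(node["label"]))
        if score > 0:
            scored.append((score, node))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order

    def fragment(kept_nodes):
        ids = {n["id"] for n in kept_nodes}
        kept_edges = [e for e in edges if e["source"] in ids and e["target"] in ids]
        return {"nodes": kept_nodes, "edges": kept_edges}

    kept = []
    for _, node in scored:
        candidate = kept + [node]
        # Stop adding nodes once the serialized fragment would exceed the budget.
        if estimate_tokens(json.dumps(fragment(candidate))) > max_tokens:
            break
        kept = candidate
    return fragment(kept)

nodes = [
    {"id": "alice", "type": "person", "label": "Alice"},
    {"id": "bob", "type": "person", "label": "Bob"},
    {"id": "acme", "type": "company", "label": "Acme Corp"},
    {"id": "paris", "type": "city", "label": "Paris"},
]
edges = [
    {"source": "alice", "relation": "works_at", "target": "acme"},
    {"source": "bob", "relation": "lives_in", "target": "paris"},
    {"source": "acme", "relation": "is_based_in", "target": "paris"},
]
result = select_subgraph(nodes, edges, "Does Alice work at Acme?", {"person", "company"})
```

Here `result` keeps only the Alice and Acme nodes and the single edge between them. A real tokenizer and an embedding-based search would likely do better than these stand-ins.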

I’ve also tried converting the graph edges/nodes to NL and then asking questions directly against that output. This approach shows some promise as well, and I will continue to explore it. There are similar challenges around making sure the relationships are respected, but it seems like I could also do this in smaller batches (source node + edge + target node as a “document”) and run it all through the answers endpoint?
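As an illustration of the "source node + edge + target node as a document" idea, here is a small sketch that turns each edge into one sentence. The `edge_to_sentence` helper and the field names are hypothetical, and the sketch stops short of calling any endpoint.

```python
def edge_to_sentence(edge, nodes_by_id):
    """Render one edge as a short natural-language 'document'."""
    source = nodes_by_id[edge["source"]]["label"]
    target = nodes_by_id[edge["target"]]["label"]
    relation = edge["relation"].replace("_", " ")  # e.g. "works_at" -> "works at"
    return f"{source} {relation} {target}."

# Hypothetical graph data for illustration.
nodes_by_id = {
    "alice": {"label": "Alice"},
    "acme": {"label": "Acme Corp"},
    "paris": {"label": "Paris"},
}
edges = [
    {"source": "alice", "relation": "works_at", "target": "acme"},
    {"source": "acme", "relation": "is_based_in", "target": "paris"},
]
documents = [edge_to_sentence(e, nodes_by_id) for e in edges]
# documents == ["Alice works at Acme Corp.", "Acme Corp is based in Paris."]
```

One document per edge keeps each unit small, but a question that spans two hops (where is Alice's employer based?) still needs both documents retrieved together.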

vaibhav.garg · April 27, 2022, 4:24am

Fascinating idea.
We worked on a similar paradigm in our application BookMapp, but we are constructing the graph of relatedness, whereas you are approaching this from the other direction.

Is the graph more or less balanced in general? If it is, it makes sense to have a notion of level 1 nodes (starting from top), level 2 nodes and so forth. Assuming that can be done, would it be a possibility to perform repeated classifications at a given node level?

Example:

whether something is a person or a non-person at top level.
if it is a non-person, is it living or non-living.
if living, is it a plant or an animal.
if animal, is it a mammal, amphibian, reptile and so forth.
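A minimal sketch of how that level-by-level classification could be wired up. The `TAXONOMY` table mirrors the example above; `choose` is a hypothetical callable that would wrap a model's classification call, replaced here by a toy lookup so the sketch runs on its own.

```python
# Each key maps a level to the options offered at the next level down.
TAXONOMY = {
    "root": ["person", "non-person"],
    "non-person": ["living", "non-living"],
    "living": ["plant", "animal"],
    "animal": ["mammal", "amphibian", "reptile"],
}

def classify_path(item, choose, taxonomy=TAXONOMY, start="root"):
    """Descend the taxonomy, asking `choose` to pick one option per level."""
    path = []
    current = start
    while current in taxonomy:
        current = choose(item, taxonomy[current])
        path.append(current)
    return path

# Toy stand-in for a classifier (assumption): known labels per item.
TOY_LABELS = {"frog": {"non-person", "living", "animal", "amphibian"}}

def toy_choose(item, options):
    for option in options:
        if option in TOY_LABELS[item]:
            return option
    raise ValueError(f"no option matches {item!r}")

path = classify_path("frog", toy_choose)
# path == ["non-person", "living", "animal", "amphibian"]
```

Each step only has to pick among a handful of options, so every individual classification prompt stays small.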

I would be very excited and curious to know what comes out of your exploration.

saschamcdonald · February 19, 2023, 5:12pm

Hey, this is really interesting. My start-up is working with Gov data in the UK. If you fancy a chat about this, let me know.
