Very large Neo4j graph + LLM

Hi all,

I am building an application where I want to use an OpenAI LLM on top of my Neo4j graph. I am using the free tier of “gpt-3.5-turbo”.
My Neo4j graph is very large. The graph schema alone is 23,800 tokens.
Now whenever I invoke the GraphCypherQAChain, I get this error message:
‘message’: “This model’s maximum context length is 16385 tokens. However, your messages resulted in 23907 tokens. Please reduce the length of the messages.”

Is it not possible to build my LLM + Neo4j application using the free tier?

Can anyone help me solve this issue?

Thank you

Why do you need to share the entire graph with OpenAI?

Shouldn’t you be performing your search “locally” and only sharing the results with the LLM?

Hi,
Thank you for your response. I am trying to use the LLM to understand the user input, run the Cypher query, and return the response.
This is the code I am using.

from langchain.chat_models import ChatOpenAI
from langchain.chains import GraphCypherQAChain
from langchain.graphs import Neo4jGraph

# LLM used to generate the Cypher query and phrase the final answer
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model="gpt-3.5-turbo",
    temperature=0,
)

# Connection to the Neo4j database; the chain reads its schema from here
graph = Neo4jGraph(
    url=NEO4J_URL,
    username=NEO4J_USER,
    password=NEO4J_PASSWORD,
    database=NEO4J_DATABASE,
)

chain = GraphCypherQAChain.from_llm(
    llm, graph=graph, verbose=True,
)

chain.run("What are the cities in the graph?")

Can you please guide me in solving this issue?


You can use a model with a bigger context, e.g. gpt-4-turbo.

But I’d urge you to reevaluate your process. Is your graph really such a kitchen sink of random entities and relations? How is it used normally? What’s the manual process here?

Maybe instead of a one-shot Cypher query, you may be better off using an iterative approach.
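For example (an untested sketch, reusing the OPENAI_API_KEY and graph from your snippet): since the schema is only injected into the Cypher-generation prompt, you could use the larger-context model just for that step and keep gpt-3.5-turbo for phrasing the answer; GraphCypherQAChain accepts separate cypher_llm and qa_llm arguments, if I recall correctly.

from langchain.chat_models import ChatOpenAI
from langchain.chains import GraphCypherQAChain

# Larger context window for the step that sees the full graph schema
cypher_llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model="gpt-4-turbo",
    temperature=0,
)

# The answer-phrasing prompt only contains the query results,
# so the smaller model is usually fine here
qa_llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model="gpt-3.5-turbo",
    temperature=0,
)

chain = GraphCypherQAChain.from_llm(
    cypher_llm=cypher_llm,
    qa_llm=qa_llm,
    graph=graph,
    verbose=True,
)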


I took another look at this and can only assume the result set is too large for your model.

I’d initially assumed that internally this method was sending the graph to the LLM, but this explanation of the method suggests otherwise:
Functional difference between GraphCypherQAChain and GraphQAChain · Issue #9035 · langchain-ai/langchain · GitHub

Can you query Neo4j directly and see how big the result is?

Can you be more specific to reduce the result size?
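Something like this would give you a rough size check (a sketch using the official neo4j Python driver; the City label is just a guess based on your example question):

from neo4j import GraphDatabase

driver = GraphDatabase.driver(NEO4J_URL, auth=(NEO4J_USER, NEO4J_PASSWORD))

# Run one of your queries directly and measure the payload
# that would be handed to the LLM for summarisation
with driver.session(database=NEO4J_DATABASE) as session:
    records = session.run("MATCH (c:City) RETURN c.name AS name").data()

print(f"{len(records)} rows, ~{len(str(records))} characters")
driver.close()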

Hi,

Thank you for this post. I am going through it now.

I already have my Neo4j queries built.
The response time is ~5 seconds. Since I hard-code the entity values and the relationship types in my query, it runs very fast and returns the paths.

My goal is for the user to be able to type in an English sentence and see the results. I have already built a successful POC on a small subgraph. I am also comfortable creating prompts along with examples.

I am already telling the chain which relationship types to use for a given input question.

It seems the problem is that the schema itself is eating up all my context length.

Thanks,


Have you considered embeddings or something similar to select the appropriate subgraph first?
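Roughly (an untested sketch; the schema type names below are placeholders, and it assumes a recent langchain where GraphCypherQAChain.from_llm takes include_types to filter the schema it sends): embed each node label and relationship type once, embed the incoming question, keep only the closest types, and let the chain build its prompt from that reduced schema.

import numpy as np
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Placeholder list; in practice, pull these from your graph's schema
schema_types = ["City", "Country", "LOCATED_IN", "CONNECTED_TO"]
type_vectors = embeddings.embed_documents(schema_types)

def relevant_types(question, k=10):
    """Return the k schema types most similar to the question."""
    q = np.array(embeddings.embed_query(question))
    scores = []
    for vec in type_vectors:
        v = np.array(vec)
        scores.append(float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))))
    ranked = sorted(zip(scores, schema_types), reverse=True)
    return [t for _, t in ranked[:k]]

# Only the selected labels/relationship types end up in the prompt
chain = GraphCypherQAChain.from_llm(
    llm, graph=graph, verbose=True,
    include_types=relevant_types("What are the cities in the graph?"),
)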

You can consider using grouped tables.

Group by similarity and then join only the relevant tables.

You could also consider function calling and have all your data retrieval logic behind an API (function). This could make it easier to evolve and maintain your solution in the long run.
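For what it’s worth, a minimal sketch of that shape with the OpenAI tools API (find_paths and its parameters are hypothetical stand-ins for your hard-coded queries): the model never sees the schema, it only picks a function and fills in the arguments, and your own code runs the corresponding Cypher.

import json
from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

# Hypothetical tool backed by one of your existing parameterised queries
tools = [{
    "type": "function",
    "function": {
        "name": "find_paths",
        "description": "Find paths between two named entities in the graph",
        "parameters": {
            "type": "object",
            "properties": {
                "source": {"type": "string"},
                "target": {"type": "string"},
            },
            "required": ["source", "target"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "How is Paris connected to Berlin?"}],
    tools=tools,
)

call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
# ...run your existing Cypher with these arguments, then return the rows
# to the model in a follow-up "tool" message for the final answer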


Hello, did you find a solution to the context-length problem you were facing? I am working on the same thing and running into the same issue.