Very large Neo4j graph + LLM

Hi all,

I am building an application where I want to use an OpenAI LLM on top of my Neo4j graph. I am using the free tier of “gpt-3.5-turbo”.
My Neo4j graph is very large. The graph schema alone is 23,800 tokens.
Now whenever I invoke the GraphCypherQAChain, I get this error message:
‘message’: “This model’s maximum context length is 16385 tokens. However, your messages resulted in 23907 tokens. Please reduce the length of the messages.”

Is it not possible to build my LLM+Neo4J application using the free tier?

Can anyone help me solve this issue?

Thank you

Why do you need to share the entire graph with open ai?

Shouldn’t you be performing your search “locally” and only sharing the results with the LLM?

Hi,
Thank you for your response. I am trying to use the LLM to understand the user input, generate and run the Cypher query, and return the response.
This is the code I am using:

from langchain.chat_models import ChatOpenAI
from langchain.chains import GraphCypherQAChain
from langchain.graphs import Neo4jGraph

llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model="gpt-3.5-turbo",
    temperature=0
)

graph = Neo4jGraph(
    url=NEO4J_URL,
    username=NEO4J_USER,
    password=NEO4J_PASSWORD,
    database=NEO4J_DATABASE
)

chain = GraphCypherQAChain.from_llm(
    llm, graph=graph, verbose=True,
)

chain.run("""What are the cities in the graph?""")

Can you please guide me in solving this issue?

you can use a model with a bigger context, e.g. gpt-4-turbo.

but I’d urge you to reevaluate your process. is your graph really such a kitchen sink of random entities and relations? how is it used normally? what’s the manual process here?

maybe instead of a one-shot cypher query, you may be better off using an iterative approach.

I took another look at this and can only assume the result set is too large for your model.

I’d initially assumed that internally this method was sending the graph to the LLM but this explanation of this method suggests otherwise:
Functional difference between GraphCypherQAChain and GraphQAChain · Issue #9035 · langchain-ai/langchain · GitHub

Can you query neo4j directly and see how big the result is?

Can you be more specific to reduce the result size?
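
A rough way to see which piece of the prompt is blowing the limit is to estimate the size of each part separately. This is only a sketch: the 4-characters-per-token ratio is a crude heuristic for English text (a real tokenizer gives exact counts), and `budget_report`, `reserve`, and the sample strings are names I made up for illustration.

```python
# Rough token budgeting sketch (heuristic, not an exact tokenizer).
CONTEXT_LIMIT = 16385  # limit quoted in the error message earlier in the thread


def estimate_tokens(text):
    """Approximate token count, assuming ~4 characters per token."""
    return (len(text) + 3) // 4


def budget_report(parts, limit=CONTEXT_LIMIT, reserve=500):
    """Estimate the size of each prompt part and check it against the limit.

    `parts` maps a label (e.g. "schema") to the text sent to the model.
    `reserve` is an assumed allowance for the model's reply.
    """
    sizes = {name: estimate_tokens(text) for name, text in parts.items()}
    total = sum(sizes.values())
    return {
        "sizes": sizes,
        "total": total,
        "fits": total + reserve <= limit,
        "largest": max(sizes, key=sizes.get),
    }


if __name__ == "__main__":
    # Placeholder strings standing in for the real schema and query result.
    parts = {
        "schema": "x" * 95200,
        "question": "What are the cities in the graph?",
        "result": "Paris, Lyon, Tokyo",
    }
    print(budget_report(parts))
```

If "largest" comes back as the schema rather than the result, then narrowing the query won't help and the schema itself has to shrink.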

Hi,

Thank you for this post. I am going through that now.

I already have my Neo4j queries built.
The response time is ~5 sec. Since I hard-code the entity values and the type of relationships in my query, it runs very fast and returns the paths.

My goal is that the user can just type in an English sentence and see the results. I have already built a successful POC on a small subgraph. I am also comfortable creating prompts along with examples.

I am already telling the graph what type of relationships to use for a given input question.

It seems the problem is that the schema itself is eating up all my context length.
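
Since the relationship types for a given question are already known, one possible direction is to hand the model only the slice of the schema that touches those types. Below is a minimal sketch of that idea in plain Python. The schema layout (triples plus a label-to-properties dict), the function name `compact_schema`, and the sample labels are all assumptions for illustration, not how LangChain stores the schema internally. How the trimmed text gets into the chain depends on the LangChain version; some releases accept `include_types` / `exclude_types` in `GraphCypherQAChain.from_llm`, so it is worth checking the docs for the version in use.

```python
def compact_schema(triples, node_props, keep_rel_types):
    """Build a schema description limited to the given relationship types.

    triples:        list of (start_label, rel_type, end_label)
    node_props:     dict mapping a node label to a list of property names
    keep_rel_types: relationship types relevant to the current question
    """
    keep = set(keep_rel_types)
    kept = [t for t in triples if t[1] in keep]
    # Only keep node labels that appear in a retained relationship.
    labels = sorted({t[0] for t in kept} | {t[2] for t in kept})

    lines = ["Node properties:"]
    for label in labels:
        props = ", ".join(node_props.get(label, []))
        lines.append(f"{label} {{{props}}}")
    lines.append("Relationships:")
    for start, rel, end in kept:
        lines.append(f"(:{start})-[:{rel}]->(:{end})")
    return "\n".join(lines)


# Hypothetical sample schema, purely for illustration.
TRIPLES = [
    ("City", "LOCATED_IN", "Country"),
    ("Person", "WORKS_AT", "Company"),
    ("Company", "BASED_IN", "City"),
]
NODE_PROPS = {
    "City": ["name", "population"],
    "Country": ["name"],
    "Person": ["name"],
    "Company": ["name"],
}

if __name__ == "__main__":
    print(compact_schema(TRIPLES, NODE_PROPS, ["LOCATED_IN"]))
```

With only `LOCATED_IN` kept, the output mentions `City` and `Country` and nothing else, which is the kind of reduction I would need at full scale.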

Thanks,

have you considered embeddings or something to select the appropriate subgraph first?
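
To make that concrete, here is a toy sketch of the selection step. It uses a bag-of-words vector and cosine similarity as a stand-in for real embeddings (a proper embedding model would replace `embed`), and the fragment names and descriptions are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts. A stand-in for a real model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_fragments(question, fragments, top_k=2):
    """Return the names of the top_k schema fragments most similar to the question.

    fragments maps a fragment name to a short text description of that subgraph.
    """
    q = embed(question)
    ranked = sorted(
        fragments,
        key=lambda name: cosine(q, embed(fragments[name])),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical subgraph descriptions, purely for illustration.
FRAGMENTS = {
    "geo": "cities countries located in region",
    "people": "person works at company employee",
    "products": "product sold by company price",
}

if __name__ == "__main__":
    print(select_fragments("What are the cities in the graph?", FRAGMENTS, top_k=1))
```

only the schema for the selected fragment(s) would then go into the cypher-generation prompt.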

You can consider using grouped tables.

Group by similarity and then join only the relevant tables.

You could also consider function calling and have all your data retrieval logic behind an API (function). This could make it easier to evolve and maintain your solution in the long run.
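
A minimal sketch of what that could look like: the model only ever sees a small tool description, and your code maps the tool call onto a fixed query. The `TOOLS` entry follows the JSON-schema style used for OpenAI function calling; `find_cities`, its parameter, and the in-memory data are hypothetical placeholders for a real, parameterised Cypher query behind your API.

```python
import json


def find_cities(country=None):
    """Hypothetical retrieval function.

    In a real application this would run a fixed, parameterised Cypher query;
    here a small in-memory dict stands in for the database.
    """
    data = {"France": ["Paris", "Lyon"], "Japan": ["Tokyo"]}
    if country is None:
        return sorted(city for cities in data.values() for city in cities)
    return data.get(country, [])


# Tool description sent to the model instead of the full graph schema.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "find_cities",
            "description": "List cities in the graph, optionally filtered by country.",
            "parameters": {
                "type": "object",
                "properties": {
                    "country": {
                        "type": "string",
                        "description": "Country name to filter by.",
                    }
                },
                "required": [],
            },
        },
    }
]

# Maps tool names the model may call to local Python functions.
REGISTRY = {"find_cities": find_cities}


def dispatch(name, arguments_json):
    """Run the function the model asked for and return a JSON string result."""
    if name not in REGISTRY:
        raise ValueError(f"Unknown tool: {name}")
    args = json.loads(arguments_json or "{}")
    return json.dumps(REGISTRY[name](**args))


if __name__ == "__main__":
    # Simulates the name/arguments pair a model tool call would carry.
    print(dispatch("find_cities", '{"country": "France"}'))
```

The prompt then grows with the number of functions you expose, not with the size of the graph schema.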

Hello, did you find a solution to the context-length problem you were facing? I am working on the same thing and facing the same problem.

I’m experiencing the same problem with gpt-4-turbo.

I have over 2500 HTML pages which I’m converting and storing in Neo4j. When I only had 500 pages, GraphCypherQAChain was able to generate a Cypher query to retrieve pages from the DB. But after I ingested 600 pages, I’m no longer able to generate a Cypher query using the LLM.

Has anyone come up with a solution/workaround without sending the entire graph to LLM?