Fine‑tuning GPT‑4.1 for Text→Cypher on a Neo4j Schema

Summary

I have fine‑tuned the new GPT‑4.1 model to translate natural‑language questions into Cypher queries for a two‑node Neo4j schema. My dataset (98 examples) interleaves schema descriptions for each node with NL→Cypher pairs. I observed a final training loss of ~1.33, and I am seeking guidance on dataset size, schema‑instruction design, and best practices for uploading to and operating within the OpenAI platform using GPT‑4.1.

Use case

  • Example: “Find all restaurants in Dubai with rating > 4.”
  • System prompt enforces “Use only the provided properties; do not invent relationships or fields.”

Dataset format:
{"messages": [
  {"role": "system", "content": "Task: Generate Cypher queries to query a Neo4j graph database based on the schema and examples you have been finetuned on.\n Instructions:\n Use only the provided properties.\n Do not use any other relationship types or properties that are not provided.\n If you cannot generate a Cypher statement based on the provided schema, explain the reason to the user.\n "},
  {"role": "user", "content": "Which restaurants had more than 100 orders in 'Apr 2025'?"},
  {"role": "assistant", "content": "MATCH (n:AccountManagerReport)\nWHERE n.am_month = 'Apr 2025' AND toFloat(n.order_nos_curr) > 100\nRETURN n.name AS restaurant, toFloat(n.order_nos_curr) AS order_count\n"}
]}
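One gotcha worth checking before upload: curly "smart" quotes (like the ones a word processor inserts) make a line invalid JSON, and the file API will reject it. A small validation sketch, assuming the training file is named as shown (the path and the expected role order are illustrative):

```python
import json

# The role sequence each training record is expected to contain
REQUIRED_ROLES = ["system", "user", "assistant"]

def validate_jsonl(path):
    """Return a list of (line_number, problem) tuples for a JSONL training file."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                # Smart quotes or truncated lines end up here
                problems.append((lineno, f"invalid JSON: {e}"))
                continue
            roles = [m.get("role") for m in record.get("messages", [])]
            if roles != REQUIRED_ROLES:
                problems.append((lineno, f"unexpected roles: {roles}"))
    return problems
```

Running this over the dataset before uploading catches both malformed lines and records missing one of the three messages.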

Fine‑tuning Configuration
{
  "model": "gpt-4.1",
  "training_file": "file-XXXXXXXX",
  "hyperparameters": {
    "n_epochs": 10,
    "batch_size": 8,
    "learning_rate_multiplier": 0.1
  }
}

The first four instances contain the properties and a description of each node; the rest are all examples like the one above. Could you share any suggestions or experiences you have had with this?

Is your concern just that the fine-tuned model is underperforming? How does its accuracy compare to the base models?

Something important here is that models aren't "aware" of the information they're fine-tuned on. They don't read or recall it the same way as your input messages. So, if things like n.am_month are specific to your database, you'll almost certainly get better results placing this kind of information in your system prompt instead of attempting to fine-tune it into the model.

If base models are failing to write these queries with perfectly good context, only then would I personally consider any finetuning. Good luck!
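The advice above can be sketched concretely: render the schema into the system prompt so the model sees it at inference time rather than relying on fine-tuned weights. The schema dictionary below is hypothetical, with property names borrowed from the examples in this thread:

```python
# Hypothetical schema; labels and properties are illustrative,
# loosely based on the examples earlier in this thread.
SCHEMA = {
    "AccountManagerReport": ["name", "am_month", "order_nos_curr"],
    "Restaurant": ["name", "city", "rating"],
}

def build_system_prompt(schema):
    """Render node labels and properties into a system prompt for text-to-Cypher."""
    lines = [
        "Task: Generate Cypher queries for a Neo4j database using ONLY this schema.",
        "Node labels and properties:",
    ]
    for label, props in schema.items():
        lines.append(f"  (:{label}) {{{', '.join(props)}}}")
    lines.append("Do not use any relationship types or properties that are not listed.")
    lines.append("If you cannot answer from this schema, explain why to the user.")
    return "\n".join(lines)
```

With this approach, a schema change means regenerating one string instead of rebuilding a training set and rerunning a fine-tune.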


Thank you so much for taking the time to reply!
You are right, actually. I am now trying the system-prompt approach first, and it has been working really well. My company and I were looking for something we could truly own, so we tried fine-tuning to see how it would work, but honestly the results have not been good so far.