Query string improperly formatted for REST API call to Wikidata

I am building a Wikidata helper by using an action. When I say “list all cats”, this is how the GPT formulates the REST HTTP endpoint call:

[debug] Calling HTTP endpoint
{
  "domain": "query.wikidata.org",
  "method": "get",
  "path": "/query",
  "operation": "GetQueryResult",
  "operation_hash": "0377bd9dafa61ec388091f01fba3b83fdb5cdfc9",
  "is_consequential": false,
  "params": {
    "query": "SELECT ?cat ?catLabel WHERE {\n ?cat wdt:P31 wd:Q146 .\n SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". }\n}"
  }
}

The query is not formulated correctly. If you take the SELECT string, remove each \n, and then remove the \ in front of each ", the resulting query works in the Wikidata Query Service:

https://query.wikidata.org/

Here is the corrected SPARQL string that I’d like to be the query value:

SELECT ?cat ?catLabel WHERE {?cat wdt:P31 wd:Q146 . SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}

I removed the \n and the \ in front of ". If you copy and paste this into the query service URL, it works correctly.
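The manual cleanup described above can be sketched in Python. This is only an illustration: `flatten_sparql` and `build_query_url` are hypothetical helper names, and the `/sparql` path with `format=json` is an assumption based on the Wikidata Query Service's public SPARQL endpoint (the debug output above shows `/query`, which comes from the action's own schema).

```python
from urllib.parse import urlencode


def flatten_sparql(raw: str) -> str:
    """Collapse a multi-line SPARQL string into a single line."""
    # Assumption: the string may contain literal two-character escapes
    # (backslash + n, backslash + quote) if it was never JSON-decoded.
    # Note this would also alter an intentional \n inside a SPARQL literal.
    text = raw.replace("\\n", " ").replace('\\"', '"')
    # Collapse real newlines, tabs and repeated spaces into single spaces.
    return " ".join(text.split())


def build_query_url(raw: str) -> str:
    """Build a GET URL with the flattened query percent-encoded."""
    params = urlencode({"query": flatten_sparql(raw), "format": "json"})
    # Assumed endpoint; adjust to whatever your action schema declares.
    return f"https://query.wikidata.org/sparql?{params}"
```

Calling `flatten_sparql` on the query value from the debug output gives a one-line query equivalent to the corrected string above, apart from a few extra spaces that SPARQL ignores.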

I tried multiple ways of getting the GPT to fix this string through the Instructions, but none of them worked.

Any ideas on how to proceed? The good news is that the Wikidata helper does a great job at formulating the SPARQL text as long as I do not create an action. But an action would allow us to see the results of the query and not just the SPARQL query.

Have you tried instructing the model not to include newline characters in the SPARQL string?

Yes, I tried that, but it ignored the instruction. Perhaps I can rephrase it, but so far no luck.

The two options I see are these:

  1. Continue revising the instructions. The model is generating formatted text for the SPARQL, most likely because that’s how it appears in most of the training data it has been exposed to. One possible way to rectify this is to include a one-shot or few-shot example of a properly formatted query in your instructions, with a note that the query will be sent as a parameter in a REST API call.
  2. Create a workaround. On a computer you control (i.e. not query.wikidata.org), set up what is essentially an API proxy relay. That computer accepts the query however the model creates it, rectifies it into something the endpoint will accept, makes the request, receives the response, and forwards it back to the model. It’s an extra hop in the chain but shouldn’t add too much latency overall, plus it gives you a lot more control and lets you do things like cache API calls if the response is unlikely to change in real time.
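A minimal sketch of the relay in option 2, using only the Python standard library, might look like the following. Everything named here is an assumption for illustration: the helper names, the port, the User-Agent string, and the upstream `/sparql` URL with `format=json`. It does no caching or authentication, so treat it as a starting point rather than something ready to deploy.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request, urlopen

# Assumed upstream endpoint (Wikidata Query Service SPARQL endpoint).
UPSTREAM = "https://query.wikidata.org/sparql"


def clean_query(raw: str) -> str:
    """Drop literal \\n and \\" escapes, then collapse all whitespace."""
    return " ".join(raw.replace("\\n", " ").replace('\\"', '"').split())


def upstream_url(request_path: str) -> str:
    """Map the path the action requested onto the upstream URL."""
    params = parse_qs(urlsplit(request_path).query)
    raw = params.get("query", [""])[0]
    return UPSTREAM + "?" + urlencode({"query": clean_query(raw), "format": "json"})


class RelayHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical User-Agent; replace with one that identifies you.
        req = Request(upstream_url(self.path),
                      headers={"User-Agent": "wikidata-relay-sketch/0.1"})
        try:
            with urlopen(req, timeout=30) as resp:
                status, body = resp.status, resp.read()
        except Exception as exc:  # any upstream failure becomes a 502
            status, body = 502, json.dumps({"error": str(exc)}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the relay until interrupted."""
    HTTPServer((host, port), RelayHandler).serve_forever()
```

To try it, call `serve()` and point the action's server URL at the relay instead of query.wikidata.org. The action needs a publicly reachable HTTPS address, so in practice this would sit behind a reverse proxy or on a hosted service.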

The key thing you need to understand is that the models are stochastic: you cannot guarantee they will always follow specific instructions correctly, so it is very useful (and often necessary) to construct backstops to ensure the models cannot screw up too badly.

In time that may change, and they may get to the point where we can rely absolutely on their unfailing adherence to our instructions, but that’s not today.

Really good, detailed comments. I am continuing with #1. I tried a one-shot example, but maybe more than one is a better approach. I am getting GPT-4 usage cap messages, so I have to wait a couple of hours to continue testing.

#2 is interesting and I follow your logic, but I am unsure how to go about setting up this proxy relay.

We need more guidance from OpenAI on Actions. Or maybe I have not read enough of what they have in the way of documentation.