I’ve been getting some bad responses back when retrieving results for function calls. This takes a few forms. The one I can’t work around without severe ugliness is a new one where the arguments parameter comes back as invalid JSON, with backticks surrounding a string value instead of double quotes, which breaks JSON parsing. It seems to happen when the string the LLM is quoting itself contains double quotes, so I assume the LLM is trying to save on escaping by switching to a different outer quote character. Here’s an example:
(The example below results in: Failed to decode arguments: Expecting value: line 3 column 13 (char 41). It’s the ‘source’ argument in the ‘arguments’ object that’s causing the problem in this particular case.)
I’ve also had the LLM ignore ‘required’ fields, and it seems to get creative when I have boolean parameters. (It was sometimes passing the string ‘fail’ even though I explicitly asked it for True and False, and of course, since the field was typed as a boolean, I really shouldn’t have had to tell it in the first place.)
Is there some documentation on all this that I’m missing? Is this a known issue? Is there a standard work-around that doesn’t involve trying out different string-quoting approaches?
{
"filename": "src/hi.ts",
"source": `
// Import the necessary dependencies
import { greet } from "./hello";
// Call the greeting function
greet();
`
}
Is that anything like code your function could execute?
It seems that the AI decided to emit a backtick, as if it were enclosing the content in a preformatted text block, the way it would if it were writing markdown.
You should run function-calling AI at low top_p, like 0.2, so it has no chance of randomly choosing low-probability output tokens when it is constructing the text of a function call. And then increase the quality of that generation with better instructions and function descriptions that give the AI certainty.
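For example, a rough sketch assuming the pre-1.0 openai Python SDK and the legacy functions parameter; the model name and the write_file schema here are placeholders, not the poster’s actual function:

import openai

# Hypothetical function schema, standing in for whatever function the thread is about.
write_file_function = {
    "name": "write_file",
    "description": "Write TypeScript source code to a file in the project.",
    "parameters": {
        "type": "object",
        "properties": {
            "filename": {"type": "string", "description": "Path of the file to write."},
            "source": {"type": "string", "description": "Complete file contents, as one JSON string."},
        },
        "required": ["filename", "source"],
    },
}

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "Create src/hi.ts that imports greet from ./hello and calls it."}],
    functions=[write_file_function],
    temperature=0.2,  # keep sampling tight so low-probability tokens (like a stray backtick) are less likely
    top_p=0.2,
)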
That is indeed the expected content for the source attribute, but more importantly I was expecting valid JSON back, which this response does not contain. I suppose I had different expectations around function calling: that the API would enforce the contract in the function description. That seems not to be the case at all; rather, the LLM is being asked to make a best effort at following the constraints provided. Even with a low top_p or low temperature, it sounds like this will all arbitrarily fail sometimes, which means calling has to be quite defensive.
Also, fwiw, I just re-tested with a top_p of 0.2 with no change in output. Even with a temperature of 0.1 and a top_p of 0.1, or a temperature and top_p of 0, the LLM keeps sending me back a function call with invalid quoting. Thanks for the suggestion on where to try to improve this output, though.
Then there is something lacking in your function description and in the property names and descriptions.
You need to tell the AI it must produce executable code within the output string, output which will then be fed directly into a sandboxed code environment and run, where runtime errors are intolerable. Explain exactly what runtime and libraries are installed in the execution environment.
Then, from advanced bot engineering…
Another technique is to provide a dummy function, like, in this case, a python function whose description says “python is disabled, you must write javascript.” (See the sketch after the example prompt below.)
Then in the system prompt you must inject and emulate how functions are used:
You are a system bot. I will write more system instructions later
# tools
## functions
You have a function javascript, where javascript will be run directly in the runtime environment of the user interface and output displayed to the user, which can be used for any case where the user either needs code generated or needs to receive a calculated output. There is no json container or other formatting, you will output only executable js code as an API tool call.
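Put together, something roughly like this; the names, the sandbox details, and the wording are all illustrative, adapted to the write-a-TypeScript-file case from earlier in this thread, not a definitive schema:

functions = [
    {
        "name": "write_typescript_file",
        "description": (
            "Write a TypeScript source file. The 'source' value must be a single valid JSON string "
            "(double-quoted, with inner double quotes and newlines escaped). It is fed directly into "
            "a sandboxed Node runtime and executed; runtime errors are intolerable."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "filename": {"type": "string", "description": "Path of the file to write."},
                "source": {"type": "string", "description": "Complete, executable TypeScript source."},
            },
            "required": ["filename", "source"],
        },
    },
    {
        # Dummy function: its only job is to redirect the model away from python.
        "name": "python",
        "description": "python is disabled, you must write typescript via write_typescript_file.",
        "parameters": {"type": "object", "properties": {}},
    },
]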
The issue is not that it isn’t producing code correctly; the issue is that it’s sending invalid JSON in the arguments object. JSON doesn’t allow quoting with backticks. I’m not executing the code generated in this block, so that’s not the issue. (And the TypeScript the LLM generated is perfectly fine; that’s also not the issue.) The issue is that the ‘JSON’ produced is not JSON, and the API (at least as far as I can tell) is expected to return a valid JSON object.
Something made it produce that first backtick instead of a double-quote, and it might be out of your control.
You could do some “string cleaner-upper” that sees something containerized in backticks (or just left open), replaces them, and then looks for unescaped quotes within. Same with dangling braces. Then some more algorithm techniques until json.loads() doesn’t throw an exception, else return “invalid JSON, dumb bot, try again”.
The AI is generally confused, though, not knowing if it is writing python or json or typescript. Whether a boolean comes out capitalized is a roll of the dice.
The AI is uncertain of the type of characters to generate when emitting JSON container data to an API function. A high temperature or top_p highlights this uncertainty with data-structure errors.
Despite the amount of function-calling fine-tuning the AI was given (vs the general corpus of code knowledge the AI was pretrained on), it still makes these errors.
JSON also looks like Python data structures, but the details differ: JSON, for example, needs lowercase booleans, while Python capitalizes them.
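A two-line illustration of that last point (the key name is arbitrary):

import json

json.loads('{"overwrite": true}')   # valid JSON: lowercase boolean, parses to {'overwrite': True}
json.loads('{"overwrite": True}')   # raises json.JSONDecodeError: Python-style capitalization is not JSON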
There’s a difference between ‘escaping backticks in a quoted string’ and ‘quoting with backticks.’ One is valid JSON, and the other is not. The example above is invalid JSON emitted by the LLM while composing a response to a function call request.
As I mentioned earlier, I have also tried this with both temperature 0 and 0.2, and top_p of 0 and 0.2.
As to booleans, I didn’t show the example, but when passing a function with a declared boolean type, I would sometimes get True, sometimes true. That’s fine. I don’t mind doing a case-insensitive comparison to get truthiness. But I’d also sometimes get PASS and sometimes pass in addition to true and True and TRUE. I mistakenly took the declaration of the field types as a contract. (And also expected that when the API says you’ll get back valid JSON, that you will.)
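A small normalizer along these lines could paper over the boolean case; a rough sketch only, and note that treating ‘pass’/‘fail’ as true/false is a guess at the model’s intent, with the extra synonyms being speculative:

def coerce_bool(value):
    """Best-effort coercion of whatever the model put in a boolean-typed field."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "pass", "yes", "1"):   # 'pass' -> True is an assumption about intent
            return True
        if lowered in ("false", "fail", "no", "0"):
            return False
    raise ValueError(f"Can't interpret {value!r} as a boolean")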
This seems to have really bothered the people I’m asking about it. If the API doesn’t reliably return JSON for function calls, that’s a complication, and I was trying to ask whether that’s expected behavior. Instead, people seem to get rather bent out of shape. And I refer you to your own post, in which you cite that a string in JSON must be “wrapped with quotation marks (U+0022).” In the example I posted, one of the 4 JSON strings used as parameter values was ‘wrapped with backtick characters.’ It really sounds like you’re asserting that the JSON I posted is totally valid. You can take it into any JSON parser you like, and I think you’ll find that they all say something like, “this is invalid because I expected a quote but got a backtick.”
If I have to prompt ChatGPT to follow its own API rules, so be it. It seems that I do. Or I have to just decide to look for invalid JSON and try to wrangle it into valid JSON.
I’m good with people posting examples of valid JSON, and I appreciate people trying to help. I even know what that looks like! The point here is, the JSON being returned IN A VERY NARROW CASE defined in the API docs (as the LLM-generated response to a function call) is malformed. Unfortunately the LLM doesn’t seem to reliably return valid JSON in this case.
Most function calls work just fine, though types aren’t reliably matched and ‘required’ arguments are more like ‘suggested’. I didn’t expect that I’d have to prompt engineer to get GPT to respond to a published API spec, but if I have to, ok! It’s early days.
My explaining the symptoms you experience to someone else who is drawing wrong conclusions isn’t getting bent out of shape.
If there is anything to get bent out of shape about, it is that OpenAI clearly is treating gpt-3.5-turbo as their own playground to see how much they can degrade the model by sparsity, ablation, quantization, filling it up with denials, and making it ignore instructions and programming beyond “you are ChatGPT” while generating the minimum passable text. It is shameful that there is no stable model with any promise that your application will still work tomorrow, or that function-calling models will still perform the tasks they could six weeks ago.
Here’s some Python code I’m using to work around this, in case it helps others:
import json
import re

def process_nominal_json_string(nominal_json_string):
    try:
        return json.loads(nominal_json_string)
    except json.JSONDecodeError:
        # Define a regular expression pattern to match "key": `value` pairs
        pattern = r'"([^"]+)": `((?:[^`\\]|\\.)*)`'
        print("Correcting invalid JSON string...")

        def replace_match(match):
            key = match.group(1)
            # Escape double quotes, un-escape any escaped backticks, and escape literal newlines
            value = match.group(2).replace('"', r'\"').replace('\\`', r'`').replace('\n', r'\n')
            return f'"{key}": "{value}"'

        # Use re.sub to replace all matches in the input string
        replaced_string = re.sub(pattern, replace_match, nominal_json_string)
        try:
            # Attempt to parse the modified string as JSON
            parsed_json = json.loads(replaced_string)
            return parsed_json
        except json.JSONDecodeError:
            raise ValueError(f"Input string was invalid JSON, and we weren't able to fix it. We tried converting:\n\t{nominal_json_string}\n\n...to...\n\t{replaced_string}")
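And a usage sketch for context, assuming the pre-1.0 SDK response shape; adjust the field access to wherever the arguments string lives in your responses:

# Run the raw arguments string from the function_call response through the repair
# function before trusting it.
message = response["choices"][0]["message"]  # response from openai.ChatCompletion.create(...)
if message.get("function_call"):
    arguments = process_nominal_json_string(message["function_call"]["arguments"])
    filename = arguments["filename"]
    source = arguments["source"]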
Someone proposed a very simple solution to OpenAI to enforce a JSON schema - but it got way too little attention.
OpenAI should really pay attention to this.
Since I can’t post a link, here’s a brief summary: as the token probabilities are being calculated, only evaluate those that fit a provided JSON schema.
It’s that simple.
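That kind of enforcement has to happen server-side at decoding time, so until it exists, the closest client-side approximation is to validate the returned arguments against the same schema you sent and retry on failure. A rough sketch using the jsonschema package (the retry policy is up to you):

import json
import jsonschema

def parse_and_validate(arguments_string, parameters_schema):
    """Parse the model's arguments string and check it against the declared parameter schema."""
    arguments = json.loads(arguments_string)  # raises on malformed JSON, e.g. the backtick problem above
    # Raises jsonschema.ValidationError on missing 'required' keys or wrong types
    jsonschema.validate(instance=arguments, schema=parameters_schema)
    return arguments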
I was on the same page as you from the start of this thread, as I have the same issue. Thanks for sticking it out through a frustrating thread and sharing this workaround; I’ve now implemented it in my own code.