I’m currently testing multiple LLMs via API for their prompt robustness. I’m paraphrasing a very simple math question, setting max_tokens to 1 or 2, then storing the answers from GPT, Claude, etc. When I used GPT-3.5-0125, I only got answers in string format, but when I switched to GPT-4-Turbo-2024-04-09, I suddenly received answers in integer format. I didn’t know that the GPT API could return answers in types other than string and JSON. Is it normal to receive integers in the response?
It is not normal. The API response is in JSON.
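To see the response in its most basic form, I send the request with the Python requests library:
# headers and payload are assumed to be defined earlier
req = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,  # includes the Authorization header
    json=payload,     # request body; renamed so it does not shadow the json module
)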
and receive back:
>>> req.content
b'{\n "id": "chatcmpl-9DYcFTip6tFaFWxxxxx",\n "object": "chat.completion",\n "created": 1713018399,\n "model": "gpt-4-turbo-2024-04-09",\n "choices": [\n {\n "index": 0,\n "message": {\n "role": "assistant",\n "content": "0"\n },\n "logprobs": null,\n "finish_reason": "stop"\n }\n ],\n "usage": {\n "prompt_tokens": 29,\n "completion_tokens": 1,\n "total_tokens": 30\n },\n "system_fingerprint": "fp_76f018034d"\n}\n'
In bytes format, you see every single character received.
There is no way that the single token 0 I requested (followed by a stop token that is not returned) could arrive as anything other than "content": "0", a string in quotes. For it to arrive as a number, the endpoint code would need some new way of interpreting the AI output, and one that does not apply to the 0 or 999 I just tested.
I suggest that you log the raw API response to see if it is instead your code that is being too “smart”.
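A minimal sketch of that check, using only the standard json module. The raw value here is a shortened, hand-copied stand-in for req.content from the response above, so no API call is needed to try it:

```python
import json

# Stand-in for req.content: a shortened copy of the raw body shown above.
raw = (
    b'{"choices": [{"index": 0, '
    b'"message": {"role": "assistant", "content": "0"}, '
    b'"finish_reason": "stop"}]}'
)

# Log the bytes exactly as received, before any parsing or post-processing.
print(repr(raw))

data = json.loads(raw)
content = data["choices"][0]["message"]["content"]

# A quoted JSON value parses to a Python str, never to an int.
print(type(content).__name__, repr(content))
```

If the type printed here is str but an integer shows up later in your pipeline, the conversion is happening in your own code after the response is parsed.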
Thank you for reassuring me! I just checked, and the answer comes in as a string. Now I have to figure out which part I’ve done wrong!
This was the problem: I used df['answer'] = df['answer'].replace(' ', np.nan)
to eliminate any spaces, but it turns out that pandas automatically converts the entire column to integers if the column appears to be filled only with integer-like strings. I had to add .astype(str)
at the end to prevent this. The issue wasn’t really with OpenAI.
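For anyone chasing something similar, here is a rough sketch of how to find the step where values stop being strings. The value_types helper and the example column names are made up for illustration, and the numeric column is created on purpose with .astype(int) rather than by reproducing the exact replace() call above:

```python
import pandas as pd


def value_types(series: pd.Series) -> set:
    """Return the names of the Python types actually stored in a column."""
    return {type(value).__name__ for value in series.tolist()}


# Hypothetical stand-in for the stored answers.
df = pd.DataFrame({"answer": ["0", "999", "4"]})
print(value_types(df["answer"]))  # check right after storing the API output

# Simulate a column that has ended up numeric somewhere in the pipeline.
df["as_number"] = df["answer"].astype(int)
print(value_types(df["as_number"]))

# Casting back with .astype(str), as in the fix above, restores strings.
df["restored"] = df["as_number"].astype(str)
print(value_types(df["restored"]))
```

Calling the helper after each processing step shows which one changes the type.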