API design doesn't respect Type-Safe Languages

tolga · July 25, 2023, 10:05am

Using APIs from type-safe programming languages is becoming increasingly difficult. As an example, here are two error responses from the Chat API. In the message field, one response has a string while the other has an array of strings. Handling such cases in type-safe languages can be challenging, yet the issue could easily be resolved on the OpenAI side. Type-safe languages do exist and should be considered.

Furthermore, there is currently no documentation for the response models. This can lead to unexpected issues, such as discovering in production that messages cannot be mapped because the field can also be a string array.

{
  "error": {
    "message": [
      "Invalid schema for function 'get_n_day_weather_forecast': In context=('properties', 'format'), array schema missing items"
    ],
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

{
  "error": {
    "message": "The model `Models.` does not exist",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_found"
  }
}

udm17 · July 25, 2023, 10:48am

Was the request the same in both cases? Just curious to try it and see what might be causing the variance in generation.

_j · July 25, 2023, 10:56am

Tip 1: temperature=0.001

If you are just asking the chat-tuned AI to reply, you are going to have a hard time getting it to produce only programmatic backend structures; that needs robust language. Otherwise:

“Respond only with a properly structured strongly-typed json…”
“Sure! Here’s what that might look…”

(And you propose the AI natural-language output be monitored, that the language looks kind of like what a random user wants?)

Stronger and more conforming outputs will be those that are written as required function return parameters.

And still, with function parameters, you can tell the API endpoint a type integer, boolean, number, have it schema-validated (vs a rejected “float”) - and you still get an AI-generated float if that’s what you asked for in the description (except not encapsulated in quotes as if you specified “string”).

With functions, you can also return a descriptive error, and the AI will try to rewrite its output and call again.

So it’s all about language. Powerful robust language instructions resistant to the distraction of large context data.

tolga · July 31, 2023, 4:47pm

@_j It would be helpful to mention that this comment was generated by AI. I had to spend a considerable amount of time trying to understand if you were actually trying to convey a message, which was quite annoying.

tolga · July 31, 2023, 4:48pm

@udm17 The endpoint is the same, but the request body was different.

_j · July 31, 2023, 4:53pm

My comment was not generated by AI. It was ticky-tacky typed in by me. I’m sorry if my robust lexicon managed to baffle you into disbelief with its impenetrable preciseness and accuracy above, a staccato point-by-point recitation of the steps to consider in improving generated content conformant to specifications.

jwatte · July 31, 2023, 5:12pm

What’s hard about this in a strongly typed language?

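import Data.Aeson (FromJSON (..), Value (..))
import Data.Text (Text)

-- Constructors renamed so they do not clash with aeson's Value constructors.
data StringOrArray = Single Text | Many [Text] deriving (Show)

instance FromJSON StringOrArray where
  parseJSON v@(Array _) = Many <$> parseJSON v
  parseJSON (String s)  = pure (Single s)
  parseJSON _           = fail "expected a string or an array of strings"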

tolga · July 31, 2023, 5:14pm

@_j I apologize. It was entirely my fault. Upon rereading your message, I believe I understand what caused the misunderstanding. The responses I shared were error responses from the OpenAI API itself, not AI-generated output. It appears that you mistook them for responses generated by GPT-3 (function feature).

tolga · July 31, 2023, 5:25pm

@jwatte, I don’t recall encountering a type named “StringOrArray” in C, C++, or C#. Additionally, even the naming convention of “String OR Array” implies to me that this is not type-safe.

jwatte · July 31, 2023, 5:37pm

You are conflating “type safe,” which means “for all possible input data, there is a well defined behavior,” with some particular type schema, such as “for all possible JSON payloads, the value of each attribute has only one possible type.”

I agree that sum types are uncomfortable to express in C++, which is why C++ is only a moderately strongly typed language. (That, and the fact that the type system is generally unsound, as well as easy to defeat.)
You end up with something like:

typedef std::variant<std::string, std::vector<JsonValue>> StringOrArray;

Which is fairly unergonomic, but is how C++ has chosen to express this kind of type definition.
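As a rough sketch of how such a variant might be consumed: the snippet below assumes the array elements are plain strings (so it uses std::vector<std::string> instead of JsonValue), and to_messages is a hypothetical helper name, not part of any library. It collapses both shapes into a list so the rest of the code only deals with one type.

```cpp
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Assumption: every array element is a plain string, as in the sample payload.
using StringOrArray = std::variant<std::string, std::vector<std::string>>;

// Hypothetical helper: collapse both shapes into a list of messages.
std::vector<std::string> to_messages(const StringOrArray& value) {
    return std::visit(
        [](const auto& v) -> std::vector<std::string> {
            using T = std::decay_t<decltype(v)>;
            if constexpr (std::is_same_v<T, std::string>) {
                return {v};  // a single message becomes a one-element list
            } else {
                return v;    // already a list of messages
            }
        },
        value);
}
```

Because std::visit requires the lambda to compile for every alternative, adding a third alternative to the variant later would be checked at compile time rather than silently ignored.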

I agree that a more thoroughly documented schema for each of the response types would be super handy!
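In the absence of a documented schema, a client-side model has to be inferred from observed payloads. Here is a minimal sketch based only on the two sample errors in this thread; the ApiError struct and describe function are hypothetical names, and the real API may return fields or shapes not covered here. The message field is stored as a list so that both the string and the string-array form fit.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical model inferred only from the two sample payloads in this
// thread; it is not an official schema.
struct ApiError {
    std::vector<std::string> messages;  // "message": string or string array
    std::string type;                   // e.g. "invalid_request_error"
    std::optional<std::string> param;   // null in both samples
    std::optional<std::string> code;    // null or e.g. "model_not_found"
};

// Render the error as one line for logs, tolerating the nullable fields.
std::string describe(const ApiError& err) {
    std::string out = err.type;
    if (err.code) {
        out += " (" + *err.code + ")";
    }
    for (const auto& m : err.messages) {
        out += ": " + m;
    }
    return out;
}
```

Using std::optional for param and code keeps the JSON null case explicit in the type instead of relying on empty strings.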
