Sending Multiple Messages in a Single API Call for Cost Optimization and Structured Output

Hello OpenAI Community,

I’m working on a project where I need to process multiple text messages, extracting structured data from each one (e.g., origin city, destination city, freight type, etc.). Currently, I’m making one OpenAI API call per message to get the structured output.

To optimize costs, I’d like to bundle multiple messages (within the token limit) into a single API call and receive a structured response for each message. However, I’m unsure how to best structure the input JSON and process the output in a way that keeps the responses matched to the corresponding input messages.

Here’s what I’m trying to achieve:

  • Input: A JSON file containing multiple messages (up to the token limit) sent in a single API request.
  • Desired Output: A structured JSON output where each message’s extracted information (e.g., origin city, destination city, freight type, etc.) is separately identifiable.

Has anyone worked on a similar use case, and can you share any guidance on how to structure the input and handle the response efficiently? Any advice on managing large files within token limits while maintaining structured output would be greatly appreciated.

Thank you in advance for your help!


Hi @hamraev and welcome to the community!

Yes, this is possible. At some point, i.e. beyond a certain number of messages in a single call, you may start to experience hallucinations, but the only way to know for sure is to test it!

So in your system prompt you would simply state something like:

You are an expert at parsing freight-related messages. Your job is to extract and return structured information for each message, as per the enclosed schema.

In the user prompt you would simply list the messages, clearly separating them, e.g.

Message ID 1
<INSERT_CONTENTS_HERE>
Message ID 2
<INSERT_CONTENTS_HERE>

Message ID N
<INSERT_CONTENTS_HERE>
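
For illustration, a small helper along these lines could assemble that user prompt from your parsed JSON file (the build_user_prompt name and the dict keys are assumptions based on the sample input later in this thread):

def build_user_prompt(messages: list[dict]) -> str:
    # Concatenate all messages into one prompt, keeping each one clearly
    # labelled so the model can tie its output back to the right input.
    parts = []
    for msg in messages:
        parts.append(f"Message ID {msg['id']} ({msg['timestamp']})")
        parts.append(msg["content"])
        parts.append("")  # blank line between messages
    return "\n".join(parts)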

And then you define a JSON schema and include it in your call, as per the Structured Outputs guide.

The schema (if using Python / Pydantic) would be something like:

from typing import List
from pydantic import BaseModel

class MessageInfo(BaseModel):
    # One extracted record per input message
    timestamp: str
    origin_city: str
    destination_city: str
    freight_type: str

class Messages(BaseModel):
    # Top-level container: one MessageInfo per input message
    messages: List[MessageInfo]

You supply Messages in your response_format.
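
For reference, a minimal sketch of the call itself, assuming the openai Python SDK's beta parse helper (the model name is just a placeholder):

from openai import OpenAI

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # placeholder: any model that supports Structured Outputs
    messages=[
        {"role": "system", "content": "You are an expert at parsing freight-related messages. ..."},
        {"role": "user", "content": user_prompt},  # the concatenated "Message ID 1..N" block
    ],
    response_format=Messages,
)

parsed = completion.choices[0].message.parsed  # a Messages instance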

One tip for reducing hallucinations: for discrete fields like freight_type, use an enum.
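
For example, something along these lines (the category values are assumptions and should reflect your actual data):

from enum import Enum

class FreightType(str, Enum):
    steel = "steel"
    electronics = "electronics"
    textiles = "textiles"
    machinery = "machinery"
    other = "other"

class MessageInfo(BaseModel):
    timestamp: str
    origin_city: str
    destination_city: str
    freight_type: FreightType  # constrained to the enum values above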


Thank you so much for your response, @platypus ! I applied your suggestions, and it worked perfectly. I was able to send multiple messages in a single API call and retrieve structured data just as I needed—it’s been a huge help for optimizing costs!

I do have a follow-up question: sometimes a single message contains more than one freight route, for example, listing different origin and destination cities or multiple freight types in the same message. In these cases, I’d like to extract each route separately. How would you suggest handling this scenario to ensure that I can still receive structured output for each individual route, even when multiple routes are present within a single message?

Here’s a sample message for context:

{
  "messages": [
    {
      "id": 1,
      "timestamp": "2024-10-01T09:00:00",
      "content": "We have a shipment from Berlin to Paris, 20 tons of steel."
    },
    {
      "id": 2,
      "timestamp": "2024-10-01T10:00:00",
      "content": "We have a shipment from New York to Los Angeles, 10 tons of electronics. Additionally, there's a load going from Miami to Houston, 5 tons of textiles. We're also looking to ship from Chicago to San Francisco with 8 tons of machinery."
    }
  ]
}

And I would like the expected output to look similar to this:

{
  "messages": [
    {
      "id": 1,
      "timestamp": "2024-10-01T09:00:00",
      "routes": [
        {
          "origin_city": "Berlin",
          "destination_city": "Paris",
          "freight_type": "steel",
          "weight": "20 tons"
        }
      ]
    },
    {
      "id": 2,
      "timestamp": "2024-10-01T10:00:00",
      "routes": [
        {
          "origin_city": "New York",
          "destination_city": "Los Angeles",
          "freight_type": "electronics",
          "weight": "10 tons"
        },
        {
          "origin_city": "Miami",
          "destination_city": "Houston",
          "freight_type": "textiles",
          "weight": "5 tons"
        },
        {
          "origin_city": "Chicago",
          "destination_city": "San Francisco",
          "freight_type": "machinery",
          "weight": "8 tons"
        }
      ]
    }
  ]
}


Glad it works!

It should be no problem to handle multiple routes. So I would just update the Pydantic model as follows:

from typing import List
from pydantic import BaseModel, Field

class Route(BaseModel):
    origin_city: str = Field(description="Originating city/location")
    destination_city: str = Field(description="Destination city/location")
    freight_type: str = Field(description="Type of goods transported, e.g. textiles, machinery, etc")
    weight: str = Field(description="Weight of goods, e.g. 8 tons, 1200 lbs, etc")

class Message(BaseModel):
    timestamp: str = Field(description="Timestamp in ISO 8601 format")
    routes: List[Route]  # one entry per route mentioned in the message

class Messages(BaseModel):
    messages: List[Message]
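
With that schema in place, iterating over the parsed result is straightforward; a quick sketch using the field names above:

parsed = completion.choices[0].message.parsed  # a Messages instance
for msg in parsed.messages:
    for route in msg.routes:
        print(msg.timestamp, route.origin_city, "->", route.destination_city,
              route.freight_type, route.weight)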

Hi, would you mind sharing how you generated the multiple outputs? I made the messages a list, but it only generates a single output. Thanks!

You have two options. If immediate results are necessary, you should implement asynchronous calls to the API. If not, use the Batch API.
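
A rough sketch of the asynchronous approach, assuming the openai Python SDK's AsyncOpenAI client and a Messages Pydantic model like the ones earlier in this thread:

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def extract(content: str):
    completion = await client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",  # placeholder
        messages=[
            {"role": "system", "content": "Extract structured freight information from the message."},
            {"role": "user", "content": content},
        ],
        response_format=Messages,
    )
    return completion.choices[0].message.parsed

async def main(contents: list[str]):
    # One request per input, all issued concurrently.
    return await asyncio.gather(*(extract(c) for c in contents))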

Thanks for the reply. But I think this thread shows multiple messages being sent successfully in a single API call, with a separate response for each?

What was happening above: OP concatenated the messages into a single prompt and then used Structured Outputs to return an array of objects in a single inference. This will work for some jobs, but not all. For example, you probably wouldn’t want to concatenate multiple different legal docs into a single prompt. In a case like that you would need to limit the context window to the text of a single document and perform a single data extraction. To accomplish this in a single API call, you would need to create a JSONL file with the jobs and submit it to the Batch API, which can take up to 24 hours to turn around.
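
For the Batch API route, a minimal sketch of building the JSONL file (one extraction job per document; custom_id is whatever lets you match results back to inputs, and documents is an assumed list of {"id": ..., "content": ...} dicts):

import json

with open("batch_jobs.jsonl", "w") as f:
    for doc in documents:
        request = {
            "custom_id": f"doc-{doc['id']}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-2024-08-06",  # placeholder
                "messages": [
                    {"role": "system", "content": "Extract structured freight information from the document."},
                    {"role": "user", "content": doc["content"]},
                ],
                # Add a response_format with your JSON schema here if you want
                # Structured Outputs in batch mode.
            },
        }
        f.write(json.dumps(request) + "\n")

The file is then uploaded via the Files API and submitted as a batch job.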