Wrong ISO timestamps from tool call in generated answers

timoe · October 18, 2024, 10:04am

Hello,

I have the following problem when prompting my GPT-4o model.

Let’s say I have a tool call defined as getData().

Then I ask my model “Can you show me the latest data?”

The correct data is gathered. In its raw form, one object looks like this:
{
  "name": "Test1",
  "timestamp": "2024-06-08T14:03:34.132Z"
}

But the generated answer always messes up the timestamp!
The different incorrect timestamp styles depend on my system message, as the following scenarios show:

System-message:
The current system time in timezone UTC is {time}. The format of every input and output timestamps is in the ISO format like YYYY:MM:DDTHH:MM:SS.SSSZ.
Here are rules in which way you must respond and act:

In your answers use the resulting timestamps from the tool-call responses.

Response: The timestamp is always the current day, varying in hours, minutes, or seconds.

System-message:
The format of every input and output timestamps is in the ISO format like YYYY:MM:DDTHH:MM:SS.SSSZ.

Response: The timestamp is the first day of the year, like 2024-01-01T00:00:00.000Z.

System-message:
The format of every input and output timestamps is in the ISO format like YYYY:MM:DDTHH:MM:SS.SSSZ.
Here are rules in which way you must respond and act:

In your answers use the resulting timestamps from the tool-call responses.

Response: The year, month, and day are the correct original ones, but the time is totally wrong. Or the model responds with a Unix timestamp.

This drives me crazy. How can I make sure that the correct timestamps from the tool-call response end up in the generated answer?

Best regards
Till

rcasburn · October 18, 2024, 1:05pm

Is it actually calling the tool? Maybe make your system message:

You are a helpful assistant who can provide the user with recent data. If the user asks for the most recent data, call the getData tool and provide the data back to the user, including the timestamp.

timoe · October 18, 2024, 1:41pm

The tool is 100% called. I can confirm that.

The problem is that my system message and provided tools are a bit more complex, and I cannot share them completely for compliance reasons. This makes diagnosis difficult. Let me anonymize it and share the whole thing.

_j · October 18, 2024, 1:42pm

You don’t need to talk about functions in the system message.

What you could simply put in the system message is the current run time.

A tool name and description field should be all that is required.
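As a rough sketch of putting the current run time into the system message, the Java below formats it in the same shape as the tool results (UTC, three fractional digits, trailing Z). `SystemMessageBuilder` and `build` are illustrative names, not part of any library; only `Instant` and `DateTimeFormatter` are standard `java.time` classes.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: renders the current run time into the system message
// in the same ISO-8601 shape the tools return (UTC, milliseconds, trailing Z).
class SystemMessageBuilder {

    // The pattern always prints three fractional digits; Instant.toString()
    // would drop an all-zero fraction (e.g. ...T00:00:00Z).
    static final DateTimeFormatter ISO_MILLIS_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
                             .withZone(ZoneOffset.UTC);

    static String build(Instant now) {
        return "You are a data lookup system for the XXXX Platform. "
                + "The current time in UTC is " + ISO_MILLIS_UTC.format(now) + ".";
    }

    public static void main(String[] args) {
        System.out.println(build(Instant.now()));
    }
}
```

Formatting the time in code keeps the one timestamp the model sees in the prompt consistent with the ones it sees in tool results.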

Sample tool return data

[
  {"name": "ai_system_crash_000", "timestamp": "2024-10-01T00:34:22Z"},
  {"name": "ai_system_crash_001", "timestamp": "2024-10-01T02:48:15Z"},
  {"name": "ai_system_crash_002", "timestamp": "2024-10-01T06:07:03Z"},
  {"name": "ai_system_crash_003", "timestamp": "2024-10-01T09:32:58Z"},
  {"name": "ai_system_crash_004", "timestamp": "2024-10-02T12:10:19Z"},
  {"name": "ai_system_crash_005", "timestamp": "2024-10-03T03:25:07Z"},
  {"name": "ai_system_crash_006", "timestamp": "2024-10-03T14:58:47Z"},
  {"name": "ai_system_crash_007", "timestamp": "2024-10-04T05:43:22Z"},
  {"name": "ai_system_crash_008", "timestamp": "2024-10-05T11:52:35Z"},
  {"name": "ai_system_crash_009", "timestamp": "2024-10-05T16:04:12Z"},
  {"name": "ai_system_crash_010", "timestamp": "2024-10-06T18:46:59Z"},
  {"name": "ai_system_crash_011", "timestamp": "2024-10-07T18:33:04Z"},
  {"name": "ai_system_crash_012", "timestamp": "2024-10-08T03:39:27Z"},
  {"name": "ai_system_crash_013", "timestamp": "2024-10-08T10:20:58Z"},
  {"name": "ai_system_crash_014", "timestamp": "2024-10-09T07:45:06Z"},
  {"name": "ai_system_crash_015", "timestamp": "2024-10-09T22:27:44Z"},
  {"name": "ai_system_crash_016", "timestamp": "2024-10-10T15:11:31Z"},
  {"name": "ai_system_crash_017", "timestamp": "2024-10-11T06:02:55Z"},
  {"name": "ai_system_crash_018", "timestamp": "2024-10-11T12:56:21Z"},
  {"name": "ai_system_crash_019", "timestamp": "2024-10-12T21:15:38Z"},
  {"name": "ai_system_crash_020", "timestamp": "2024-10-13T03:07:43Z"},
  {"name": "ai_system_crash_021", "timestamp": "2024-10-13T10:45:10Z"},
  {"name": "ai_system_crash_022", "timestamp": "2024-10-14T06:58:12Z"},
  {"name": "ai_system_crash_023", "timestamp": "2024-10-14T16:02:19Z"},
  {"name": "ai_system_crash_024", "timestamp": "2024-10-15T09:29:37Z"},
  {"name": "ai_system_crash_025", "timestamp": "2024-10-15T17:45:08Z"},
  {"name": "ai_system_crash_026", "timestamp": "2024-10-16T20:37:15Z"},
  {"name": "ai_system_crash_027", "timestamp": "2024-10-17T09:01:23Z"},
  {"name": "ai_system_crash_028", "timestamp": "2024-10-17T18:39:04Z"},
  {"name": "ai_system_crash_029", "timestamp": "2024-10-17T23:55:30Z"},
  {"name": "ai_system_crash_030", "timestamp": "2024-10-18T01:23:17Z"},
  {"name": "ai_system_crash_031", "timestamp": "2024-10-18T04:28:42Z"},
  {"name": "ai_system_crash_032", "timestamp": "2024-10-18T10:56:09Z"},
  {"name": "ai_system_crash_033", "timestamp": "2024-10-18T13:18:33Z"},
  {"name": "ai_system_crash_034", "timestamp": "2024-10-18T16:52:26Z"},
  {"name": "ai_system_crash_035", "timestamp": "2024-10-18T18:25:09Z"},
  {"name": "ai_system_crash_036", "timestamp": "2024-10-18T20:03:14Z"},
  {"name": "ai_system_crash_037", "timestamp": "2024-10-18T21:47:21Z"},
  {"name": "ai_system_crash_038", "timestamp": "2024-10-18T22:38:08Z"},
  {"name": "ai_system_crash_039", "timestamp": "2024-10-18T23:48:52Z"}
]

Model quality in discussion of data: gpt-3.5-turbo-1106 and gpt-4o-2024-08-06.

I have to suspect that you are not reducing top_p to something like 0.5 or below for reliable reproduction, or that you are using one of the penalty parameters.

timoe · October 18, 2024, 1:46pm

So you’re suggesting that setting top_p to 0.5 would help reproduce the same results and make the replies more consistent?

My current temperature is 0.4. How do I set top_p?

_j · October 18, 2024, 1:49pm

The temperature is pretty low, but that cannot stop anomalies. top_p can stop the roll of the dice on bad output by sampling only from the most probable part of the distribution at each token position: the smallest set of top tokens whose probabilities add up to p. It goes from 1.0 (everything), to 0.5 (the top 50% of probability mass), to 0 (only the single best token).

top_p is a parameter to send along with, or even replace, temperature.
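To make the top_p explanation concrete, here is a conceptual sketch of nucleus filtering over a single token position. It illustrates the technique only, not how the API implements it server-side; the class name, token strings, and probabilities are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of nucleus (top_p) filtering for one token position:
// keep the most likely tokens until their cumulative probability reaches topP,
// then renormalize so the survivors sum to 1. Not the API's actual code.
class TopPFilter {

    static Map<String, Double> filter(Map<String, Double> probs, double topP) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(probs.entrySet());
        // Most likely token first.
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        Map<String, Double> kept = new LinkedHashMap<>();
        double cumulative = 0.0;
        for (Map.Entry<String, Double> e : sorted) {
            kept.put(e.getKey(), e.getValue());
            cumulative += e.getValue();
            // The best token is always kept, so topP = 0 means "only the best".
            if (cumulative >= topP) {
                break;
            }
        }

        // Renormalize the surviving candidates; sampling then happens among them.
        final double total = cumulative;
        kept.replaceAll((token, p) -> p / total);
        return kept;
    }

    public static void main(String[] args) {
        // Made-up probabilities for the year digits of a timestamp.
        Map<String, Double> probs = Map.of("2024", 0.6, "2023", 0.3, "1970", 0.1);
        System.out.println(filter(probs, 0.5));
        System.out.println(filter(probs, 0.8));
        System.out.println(filter(probs, 1.0));
    }
}
```

With these example probabilities, `filter(probs, 0.5)` keeps only `"2024"`, while `filter(probs, 0.8)` keeps `"2024"` and `"2023"` and drops the unlikely `"1970"` tail that temperature alone would still occasionally sample.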

timoe · October 18, 2024, 1:51pm

Here is my system message:

You are a helpful AI assistant with the name {name}. You help answer questions about the clients’ data in the XXXX Platform. The XXXX Platform is an IoT and Smart Product Platform by XXXX.
Support mail is XXXX.

The current time in timezone UTC is {time}.

Current available item labels: {labels}
Current available item-type labels: {item-type-labels}
Current available item-collections: {item-collections}
Current available item-types-collections: {item-types-collections}

Here are rules in which way you must respond and act:

- Do not hallucinate any data or values for parameters.
- To get component IDs and the asset tree, call the getAssetTree tool.
- Only use real data returned from tool calls.
- Only use parameter values that the user specifies in their message.
- When you are unsure, ask more specific questions to get the information you need, or reply that you cannot help.
- When parameters are mandatory and not specified by the user, first try to get the values from other tool calls. If this does not work, ask for the value.
- When multiple function calls are necessary, do not describe the procedure. Just respond with the tool call.
- Keep the original timestamp from the tool call in your responses!
- Always use ISO-format timestamps for tool-call parameters.
- In your answers, rely strictly on the contextual information provided when forming a response!
- Any time you infer a from or to date, use the utility function calculateEndDateFromStartDate to get the correct time interval (e.g. when the user asks for the last week).
- In your responses, name the timespan you used in the tool call if the tool call has a from and to.
- To get items by label, call the corresponding function; filters are only used for custom domain-specific properties.
- Actions are a different concept than collections.

Here is a description of the functionality for filterParams, which uses property-value combinations:

- Basic filtering: (e.g. color=red)
- Multiple values filtering: (e.g. color=red,black)
- Like-Operator searches for properties containing specific strings or characters: (e.g. color=~b)
- Wildcard for Like-Operator: (e.g. name=~Car\*) The asterisk always has a backslash as prefix, like \*.
- Search modifier for Like-Operator case-insensitive search: (e.g. color=~(?i)blue)
- Search modifier for Like-Operator unicode search: (e.g. color=~(?u)green)
- Non-existing properties filtering: Use $exists(true/false) (e.g. color=$exists(false))
- Logic operator filtering: Logical operators filter=$and() and filter=$or() (e.g. filter=$or(name=max;color=red)) or (e.g. filter=$and(color=$exists(true);color=red))
- Nested logic operator filtering: (e.g. filter=$or($and(…);$or(…)))
- Between filtering: Filter numbers or dates between two values with $between(a,b) (e.g. productionDate=$between(2023-01-01T00:00:00.000Z,2024-01-01T00:00:00.000Z)) or (e.g. radius=$between(2000,2007))
- Greater-or-equal and less-or-equal filtering for timestamps: For timestamps, greater than or equal $gte() and less than or equal $lte() can be used (e.g. productionDate=$gte(2023-01-01T00:00:00.000Z))
- $gte() and $lte() are only allowed with timestamps, not with other numeric values or strings.
- $gt and $lt are not valid functions.
- When filtering for multiple properties, each property filter is a separate entry, except when AND or OR is explicitly asked for.
- A filter must always be a property-value pair like property=value.

Here are definitions from the platform:

ASSET DOMAIN MODEL:
The base class of all Assets is the Item. Items can be identified via an ID and can be enriched with various domain-specific properties. A Group is a specific Item that can contain other Items (members). Items and Groups are usually used to represent plain “non-smart” objects that do not send any data. They can group other objects and help to create hierarchical structures. Components, on the other hand, represent “smart” objects. A component can refer to a Component Type, which can define Data Points. Each item must have a system label. System labels are (Item, Group, Device, DevicePart, ComplexDevice, ComplexDevicePart). In addition, items can have further custom item labels. There are also separate labels for item types and component types.

DATA POINT VALUES:
Data point values are the actual values belonging to component data points. They are stored as a time series, and the last value is always addressable as the latest data point value.

DATA POINT VALUES STATISTICS:
The following statistics are measured for each data point individually.
- the overall max values
- the overall min values
- the sum of all data point values
- the count of all data point values

MESSAGES:
Messages represent any kind of information that a component can send about its status or specific events. They have a code, a severity, a message, a timestamp, and custom properties. Available severities are: CRITICAL, WARNING, ERROR, DEBUG, INFO. The code is a technical custom error code.

MESSAGE STATISTICS:
Message statistics contain information about how many messages exist per severity and also how many are acknowledged.

COLLECTIONS:
Collections are sets of combined values that belong together. A collection stores multiple collection entries. One collection entry can store multiple values.
There are two types of collections:
- Item collections: for collection entries with component reference
- Item type collections: for collection entries with item type reference

STATUS:
Components have a status that gives information about the current connection state. Possible states are ONLINE, OFFLINE, NEVER_CONNECTED. Offline states have a reason; for ONLINE and NEVER_CONNECTED the reason is null. Reasons are CLIENT_DISCONNECTED, CLIENT_INTERRUPT, SERVER_DISCONNECT. Entries of the status history have a time-to-live (TTL) of one year; older entries are automatically deleted. The latest status has no TTL. The status statistics give an overview of how many components are online, offline, and never connected.

ACTION:
An action can be triggered on a component that is already connected, to perform a command.
There are three ways actions can be triggered:
- asynchronous action without response
- asynchronous action with response
- synchronous action with response
For all three options, the action itself is relayed directly to the component.
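The filterParams syntax described in that system message can also be produced in code instead of leaving the model to assemble timestamps. The Java below is a minimal sketch under assumptions: `FilterParams`, `between`, `gte`, and `lastWeek` are hypothetical helpers, and treating “last week” as the seven days ending now is a guess, since the platform’s calculateEndDateFromStartDate is not shown.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helpers that build filterParams strings in the syntax described
// above, so interval bounds are computed in code rather than by the model.
class FilterParams {

    // Always three fractional digits and a literal Z, e.g. 2023-01-01T00:00:00.000Z.
    static final DateTimeFormatter ISO_MILLIS_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
                             .withZone(ZoneOffset.UTC);

    // property=$between(from,to)
    static String between(String property, Instant from, Instant to) {
        return property + "=$between(" + ISO_MILLIS_UTC.format(from) + ","
                + ISO_MILLIS_UTC.format(to) + ")";
    }

    // property=$gte(from); per the rules above, only valid for timestamps.
    static String gte(String property, Instant from) {
        return property + "=$gte(" + ISO_MILLIS_UTC.format(from) + ")";
    }

    // Assumption: "last week" means the seven days ending at `now`.
    static String lastWeek(String property, Instant now) {
        return between(property, now.minus(Duration.ofDays(7)), now);
    }

    public static void main(String[] args) {
        System.out.println(lastWeek("timestamp", Instant.now()));
    }
}
```

For example, `between("productionDate", ...)` with 2023-01-01 and 2024-01-01 reproduces the `$between` example from the rules exactly, including the `.000Z` suffix.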

_j · October 18, 2024, 2:11pm

That system prompt doesn’t distract from using a tool return (although I’m not going to bother having it called).

It does, however, get in the way of gpt-4o understanding what the heck it does:

Here’s a detailed explanation of my purpose and capabilities:

Purpose

My primary purpose is to assist users in interacting with the XXXX Platform, which is an IoT and Smart Product Platform by XXXX. I help users by providing information, answering questions, and performing tasks related to the data and functionalities available on the platform.

Specialization

I specialize in:

- Retrieving and presenting data related to IoT devices and smart products.
- Providing insights into data point values, message statistics, and component statuses.
- Assisting with the management of items, groups, components, and collections within the platform.
- Executing actions and retrieving logs related to platform operations.

Skills

To perform these tasks, I have the following skills:

- Accessing and retrieving data from the platform using specific tool-calls.
- Filtering and processing data based on user queries.
- Understanding and applying platform-specific rules and definitions.
- Communicating clearly and effectively with users to gather necessary information and provide accurate responses.

Available Data

I have access to the following types of data:

- Asset domain model data, including items, groups, and components.
- Data point values and their statistics.
- Messages and message statistics.
- Collections and their entries.
- Component statuses and status statistics.
- Platform logs and error records.

The data is available in structured formats, such as time series for data point values and hierarchical structures for asset models.

Typical User Input and Response Procedure

Example User Input: “What are the latest data point values for a specific component in the last week?”

Procedure to Fulfill the Request:

- Identify the Component: If the user specifies a component, I use that information. If not, I may ask for more details or use available tools to identify the component.
- Determine the Time Span: Since the user asked for the “last week,” I calculate the appropriate date range using the utility function calculateEndDateFromStartDate.
- Retrieve Data: I use the relevant tool-call to fetch the latest data point values for the specified component within the calculated time span.
- Present the Data: I format the retrieved data into a clear and concise response, ensuring that the user understands the information provided.

By following these steps, I ensure that the user’s request is accurately and efficiently addressed using the available data and tools on the XXXX Platform.

(This, by the way, is far worse than gpt-4’s ability to make sense of the same technical probing.)

The AI can’t work this out well on its own, so you must:

Explain with extreme clarity:

- What is your purpose?
- What kind of tasks do you specialize in?
- What skills do you have to perform those tasks?
- What kind of data is available to you, and in what form?
- Explain a typical user input that makes use of your purpose, and procedurally, how you would fulfill that request.

The long blocks of text can go into function descriptions in the schema, which may contain newlines.
Anything about function parameters and how to use them can be moved into the properties’ own descriptions.

Functions should be of a quality where “you are a data lookup system” is a sufficient system message, and the AI can understand how to use the appropriate tool from the schema specification alone.
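As an illustration of moving that guidance into the schema, here is a Chat Completions-style function tool definition held in a Java text block. The tool name `getDataPointValues` and its parameters are hypothetical; only the outer `type` / `function` / `parameters` layout follows the documented tool format, and the description uses an escaped newline as suggested above.

```java
// Illustrative Chat Completions-style tool definition. The tool name and its
// parameters are hypothetical; usage guidance lives in the descriptions.
class ToolSchema {

    // "\\n" in the text block yields the JSON escape \n (a newline in the description).
    static final String GET_DATA_POINT_VALUES = """
            {
              "type": "function",
              "function": {
                "name": "getDataPointValues",
                "description": "Returns data point values of one component as a time series.\\nEach value has an ISO-8601 UTC timestamp string; repeat it verbatim in answers.",
                "parameters": {
                  "type": "object",
                  "properties": {
                    "componentId": {
                      "type": "string",
                      "description": "Component ID as returned by getAssetTree. Never guess it."
                    },
                    "from": {
                      "type": "string",
                      "description": "Start of the interval, ISO-8601 UTC, e.g. 2024-10-11T00:00:00.000Z."
                    },
                    "to": {
                      "type": "string",
                      "description": "End of the interval, ISO-8601 UTC, e.g. 2024-10-18T00:00:00.000Z."
                    }
                  },
                  "required": ["componentId"]
                }
              }
            }
            """;

    public static void main(String[] args) {
        System.out.println(GET_DATA_POINT_VALUES);
    }
}
```

Keeping the ISO-format and “repeat it verbatim” instructions next to the fields they govern means the system message can stay short.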

So you get more advice than “make it tell the time”.

timoe · October 18, 2024, 2:28pm

When using top_p=0.5 and temperature=0.4,

The model responds with

But the correct value is 2024-10-17T12:03:05.614Z

Even with temperature 0.7 it’s the same, just with a different random timestamp.

I will try to design my system message in the way you stated.

_j · October 18, 2024, 5:12pm

With such consistent errors, you may want to log what is actually being sent to the AI model, recording the function return and the API call parameters. It is possible that your function code is not performing as you expect it to.
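A minimal sketch of that kind of logging, assuming a tool can be modeled as a function from its arguments to the string handed back to the model. `LoggedTool` and `logged` are hypothetical names. If a framework serializes the tool’s return object for you (as may be the case with Spring AI), this wrapper only sees what it is given, so logging after that step, or logging the outgoing request body, is the stronger check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical decorator: records a tool's arguments and the exact string it
// hands back, i.e. after any DTO-to-JSON conversion the tool does itself.
class LoggedTool {

    static <A> Function<A, String> logged(String name, Function<A, String> tool,
                                          Consumer<String> log) {
        return input -> {
            log.accept(name + " called with: " + input);
            String result = tool.apply(input);
            // This is the text the model will actually read.
            log.accept(name + " returned: " + result);
            return result;
        };
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        Function<String, String> getData = query ->
                "{\"name\":\"Test1\",\"timestamp\":\"2024-06-08T14:03:34.132Z\"}";
        logged("getData", getData, lines::add).apply("latest");
        lines.forEach(System.out::println);
    }
}
```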

timoe · October 18, 2024, 7:19pm

I can confirm that the function works correctly, so the model does receive the gathered data.

timoe · October 23, 2024, 12:51pm

I think I solved the issue. The problem was the following.

For the implementation I used Spring AI, and therefore Java.

The results from the tool functions were mapped into DTOs in which the field timestamp was of type Instant. Changing that field’s type to String solved the issue.

Thank you for your help. Based on your ideas, I also rebuilt the system message.
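The thread does not say exactly how the Instant field garbled the value, so the following is an assumption-labeled illustration. A String field carries the tool’s timestamp through unchanged, while an Instant is re-rendered by whatever serializer is configured. Jackson, for instance, can write an Instant as a decimal epoch value such as 1704067200.000000000 when its WRITE_DATES_AS_TIMESTAMPS feature is enabled, which would fit the Unix timestamps reported earlier. The sketch uses only java.time to show two such renderings; the record and method names are hypothetical.

```java
import java.time.Instant;

// Illustration only: the serializer actually used in the poster's Spring AI
// setup is not shown in the thread.
class TimestampDtos {

    // The fix from the thread: keep the raw string from the tool result.
    record DataDtoString(String name, String timestamp) {}

    // The problematic variant: the field is parsed into an Instant.
    record DataDtoInstant(String name, Instant timestamp) {}

    // ISO_INSTANT rendering, which drops an all-zero fraction.
    static String asIsoInstant(Instant t) {
        return t.toString();
    }

    // Decimal epoch seconds with a nanosecond fraction.
    static String asEpochDecimal(Instant t) {
        return t.getEpochSecond() + "." + String.format("%09d", t.getNano());
    }

    public static void main(String[] args) {
        String raw = "2024-01-01T00:00:00.000Z";
        DataDtoString s = new DataDtoString("Test1", raw);
        DataDtoInstant i = new DataDtoInstant("Test1", Instant.parse(raw));
        System.out.println(s.timestamp());                 // 2024-01-01T00:00:00.000Z
        System.out.println(asIsoInstant(i.timestamp()));   // 2024-01-01T00:00:00Z
        System.out.println(asEpochDecimal(i.timestamp())); // 1704067200.000000000
    }
}
```

Passing the tool’s timestamp as an opaque String means the model sees exactly the text the platform returned, which is what the system message asked it to repeat.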
