Hi all,
I work for an AI powered app for sport coaching and I also faced dates issues.
The problem is the following: from a query in natural language, return all dates that are mentioned.
In my context, the user enter various time references:
- The universal ones: on 8th October, tomorrow, next Monday, 04/12…
- The ones that need internal data, referring to the athlete’s schedule: two days after my next race, until the end of the season…
In addition, I have to handle different dates format: 04/05 can be the 4th of May or the 5th of April, depending on the athlete’s locale.
Finally, some queries must be understood as a continuous period of time, with a start date and an end date: “until monday”.
Other queries are just a list of dates: “today and monday”
I did several tests using parsedatetime, PySpark NLP and scacy nlp, regex: all failed in some cases.
Also giving some explicit dates to chatGPT 3.5 did not really help as it does not compute anything. In some cases “in two days” outputs a wrong date, even with today’s date as a context.
So I wanted to share the technique I finally chose to achieve the task.
The first step is to use the LLM to transform the query into a structured output that expresses the time references as a simple arithmetic formula that involves supposedly known events. See the examples below to understand.
The second step is to process this output after replacing all events by the real dates.
STEP 1
Here is the prompt I use:
"""
Transform the sentence into a JSON that I can use to compute the exact dates.
You can use any of the following time references that I know the dates of:
- TODAY: current date
- TOMORROW: tomorrow's date
- END_OF_MONTH: the last day of the current month
- MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY: the days of the week
- BASE_START, BUILD_START, PEAK_START, TAPER_START, RECOVERY_START: the start date of the base, build, peak, taper and recovery phases respectively
- BASE_END, BUILD_END, PEAK_END, TAPER_END, RECOVERY_END: the end date of the base, build, peak, taper and recovery phases respectively
- RACE, GOAL: the date of the next race or goal respectively
- END_OF_SEASON: the day the current sport season ends
If the sentence mentions a period of consecutive days, the output must be of the form:
{"start": start date, "end": end date}
If the sentence mentions one or several dates, the output must be of the form:
{"list": the list of dates}
The dates you give MUST respect the date format "YYYY-MM-DD".
If you can't compute the dates, express them with the simplest arithmetic formula involving the previous time references.
For your calculations, the unit is one day. Also, consider that a week contains 7 days, and a month contains 30 days.
Some dates may be explicitly given in the __MESSAGE_DATE_FORMAT__.
If no event are explicitly mentioned or if you can't extract any date, return
{"error": true}
Today's date: __TODAY__.
Examples:
- "new workout" -> {"error": true}
- "new workout today" -> {"list": ["TODAY"]}
- "add workout in 4 days" -> {"list": ["TODAY + 4"]}
- "replan until the end of the week" -> {"start": "TODAY", "end": "SUNDAY"}
- "on April 10th and also 15th" -> {"list": ["2024-04-10", "2024-04-15"]}
- "the next two weeks" -> {"start": "TODAY", "end": "TODAY + 14"]}
- "until 4 days after next monday" -> {"start": "TODAY", "end": "MONDAY + 4"}
- "replan on 10 07 2024 and 20th of July" -> {"list": ["__EXAMPLE_2024-07-10__", "2024-07-20"]}
- "replan until 10 07 2024" -> {"start": "TODAY", "end": "__EXAMPLE_2024-07-10__"}
- "replan until 10 07" -> {"start": "TODAY", "end": "__EXAMPLE_2024-07-10__"}
- "days off on 30th of April" -> {"list": ["2024-04-30"]}
- "from sunday to next wednesday" -> {"start": "SUNDAY", "end": "WEDNESDAY + 7"}
- "i need to take days off" -> {"error": true}
- "2 days after my next build phase" -> {"list": ["BUILD + 2"]}
- "one week before my next race" -> {"list": ["RACE - 7"]}
Context:
__CONTEXT__
The sentence comes from someone living in __ORIGIN__.
"""
Then replace the following keywords:
- __MESSAGE_DATE_FORMAT__: "US date format" if the user is from the US, else "European date format" (to avoid confusion for the 04/05 example).
- __TODAY__: today's date, with the name of the day
- __EXAMPLE_2024-07-10__: "2024-10-07" if the user is from the US, else "2024-07-10" (this is to make the LLM understand how such date must be understood)
- __ORIGIN__: "the US", for US user, else "Europe".
-__CONTEXT__: some other context if you have one.
STEP 2
Get the output from STEP 1 and replace all keywords (TODAY, RACE, …) with the real dates in the format YYYY-MM-DD.
Then call the LLM with the result and the following prompt (ask also to return JSON output):
"""Here is a simple aritmetic formula to compute a date. All given dates are in the format YYYY-MM-DD. Return a valid JSON of the form {"date": the computed date in the format YYYY-MM-DD}"""