Improving accuracy of Calendar prompt understanding

Hi everyone. I am building an application and want to use gpt as a calendar assistant so the user can type questions and retrieve the relevant events from their calendar. Here is my process so far:

  1. Calendar events have embeddings created and stored in Pinecone on an ongoing basis. For now, the text I use for embeddings is something like “date: 2023-04-03. Event name: Haircut. Time: 17:00.”
  2. User types a question which is sent to the server (AWS Lambda). For example: “When is my next haircut?” or “What do I have this weekend?”
  3. Call /chat/completions with gpt-3.5-turbo model to infer the action (“Add event”, “find event”, etc.). This is working great so far.
  4. Assuming it is a “Find Event” action, embed the user question and query Pinecone to get the top 20 results.
  5. Look up the resulting calendar event records in my database.
  6. Feed the results into a gpt (/chat/completions with gpt-3.5-turbo model) prompt to get an answer to send back to the user. The prompt I use is You are a helpful calendar assistant, and you format dates like MMM DD, YYYY, and today is ${today}. Calendar events:\n${events}\nQuestion: ${question};
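For reference, the embedding text from step 1 and the final prompt from step 6 could be assembled like this (a sketch; the field names on the event object are hypothetical):

```javascript
// Hypothetical helpers for steps 1 and 6: build the text that gets
// embedded for each event, and the final prompt sent to gpt-3.5-turbo.
function eventToEmbeddingText(event) {
  // Mirrors the format described above: "date: ... Event name: ... Time: ..."
  return `date: ${event.date}. Event name: ${event.name}. Time: ${event.time}.`;
}

function buildAnswerPrompt(today, events, question) {
  const eventLines = events.map(eventToEmbeddingText).join("\n");
  return (
    `You are a helpful calendar assistant, and you format dates like ` +
    `MMM DD, YYYY, and today is ${today}. Calendar events:\n` +
    `${eventLines}\nQuestion: ${question}`
  );
}
```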

I’m observing a couple issues with this.
First, the embedding lookup is not that reliable in finding the right events. For example, for the haircut question, it returns a bunch of other seemingly unrelated events and doesn’t return the Haircut event at all, even though the keyword is right in there. So, I researched a bit and tried the hypothetical approach where I create a hypothetical event that answers the user question and feed that to the Pinecone query, but it didn’t help. Next, I tried asking gpt to infer any specific dates and a subject from the user question, so I could separate those out in my Pinecone query. The inference worked, but overall it didn’t improve the embedding query. Finally, I am using the inferred dates and subject to do a keyword and date search of the calendar events directly from my database and then combining those results with the Pinecone results. This finally at least gets the right list of calendar events to feed to gpt, but at the cost of a lot more overhead and processing time.
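The combining step at the end of that experiment might look something like this (a sketch; the event shape and the DB-results-first ordering are assumptions):

```javascript
// Hypothetical merge of the two candidate sets: keyword/date hits from
// the regular DB first (they tend to be more precise), then semantic
// matches from Pinecone, deduplicated by event id.
function mergeResults(dbEvents, pineconeEvents) {
  const seen = new Set();
  const merged = [];
  for (const event of [...dbEvents, ...pineconeEvents]) {
    if (!seen.has(event.id)) {
      seen.add(event.id);
      merged.push(event);
    }
  }
  return merged;
}
```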

Second, I am now seeing that the gpt answer in the last step does not always understand the timeframe being asked about. For example, with the question “What do I have this weekend?”, the intermediary gpt call used to search for events parses out the correct dates. But in the final answer, after the calendar events are fed into the prompt, it does not pick out the events corresponding to “this weekend” and instead says there are no events for this weekend (sometimes it works, sometimes it doesn’t).


So far I am testing simple questions and calendar events, and at this point I’m worried this functionality may not be consistent enough to continue building out and releasing to customers unless the accuracy can be improved.

Do I need to explore fine tuning at this point to do more training about common calendar/date range requests? From what I’ve read so far on the forums here and other places it seems there’s a general attitude that fine-tuning is not worth the trade-off in terms of using a less powerful model as well as the added cost (and effort required to do the tuning - I am just me for this app and don’t have a team or anything).

Are there other approaches I can use to improve accuracy for this type of application?

Thanks for any insight you all can give!


I think using embeddings might not be the best fit for this kind of structured data. You might want to try a traditional database on the server to store and retrieve the data, and have ChatGPT come up with useful queries based on the user request. Essentially, your code is a REST interface to the database, and ChatGPT is a translator that turns user requests from natural language into database queries. Would that solve your problem?
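A sketch of that idea (the table schema and prompt wording are made up, and any SQL the model generates should be validated before it is run against a real database):

```javascript
// Hypothetical prompt asking the model to translate a user request into
// a query against a simple events table. The schema below is invented;
// in practice, validate or allow-list the generated SQL before executing it.
function buildQueryTranslationPrompt(userRequest, today) {
  return (
    `Today is ${today}. You translate calendar questions into SQL ` +
    `against this table:\n` +
    `events(id, name, start_time, end_time)\n` +
    `Return only the SQL query, nothing else.\n` +
    `Request: ${userRequest}`
  );
}
```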


@creatiCode, but wouldn’t a similarity search still be needed? One column would have free text, e.g. “get a hair cut”. A plain keyword search might suffice for simple queries, but for more complex ones where the user doesn’t use the right keywords, a similarity search would be needed. What do you think?

That’s true. So if the user queries “when is my next haircut?”, you need to make sure ChatGPT writes a query for all future tasks. I assume there won’t be too many. Then you can feed all of this data back to ChatGPT and ask it to figure out which one has the correct date.
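Sketched in code, the “all future tasks” part can be a plain date filter before anything is sent to ChatGPT (hypothetical event shape with ISO 8601 start strings):

```javascript
// Sketch of "query for all future tasks": keep events starting on or
// after today, sorted soonest-first, so the model only sees candidates.
function futureEvents(events, today) {
  return events
    .filter((e) => e.start >= today) // ISO 8601 strings compare correctly
    .sort((a, b) => a.start.localeCompare(b.start));
}
```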

For example, this is what I just tried:

[screenshot of the ChatGPT conversation omitted]

If there are indeed too many tasks returned from the database query to pass all of them to ChatGPT directly, you can still use semantic search to filter down these tasks first.

Oops, I just saw this post.

I’m seeing much the same results as you.

In fact, I sometimes struggle to get coherent results out of gpt-3.5-turbo at all. How did your bot fare in this situation?

I was going to respond back here once I had it figured out for myself, but although I am getting closer I feel like every time I test a new use case I find something it fails at. My general approach has been refined to be the following steps now (thanks for the advice @joao.b and @CreatiCode!):

  1. Ask gpt to analyze the given input prompt for metadata that I then use to query my database. The current iteration of my prompt for this is:
const prompt = `Today is ${today} and timezone is ${tz}. Analyze the question and return a json object only, with structure:
{
 "action": "ADD EVENT", "REMIND", "TODO", "MODIFY EVENT", "DELETE EVENT", "CANCEL EVENT", "COPY EVENT", "FIND EVENT", or "UNKNOWN",
 "isoDates": [],
 "newIsoDates": [],
 "subject": string,
 "isFutureEvents": true/false,
 "isPastEvents": true/false
}

List every possible date in the range implied by the question needed to find the referenced event(s), or an empty array.
Question about new or existing calendar events: ${input}`

I find I am constantly tweaking this text as I try new use cases and find things it doesn’t handle well.

  2. Embed the analyzed “subject” terms and query Pinecone for a semantic search. At the same time, construct a query against my own regular DB using the dates and timeframes from the output of step 1, and perform a keyword search through each event returned.
  3. Once I have a list of possible events from step 2, run another gpt chat completion with the following prompt:
const prompt = `You are a helpful calendar assistant, and you format dates like MMM DD, YYYY, and today is ${today}, and use relative dates/times in your answer if applicable. Respond in json format { "response": text, "eventIds": [] }. You have already done a search for calendar events finding these:\n${occurrences}\nQuestion: ${input}`
  4. Return the response text to the user, and display the list of events from step 2 as well.
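Since both prompts above ask for JSON back, it can help to parse the reply defensively; models sometimes wrap the object in prose or code fences. A small sketch (not from the original post):

```javascript
// Defensive parse of a model reply that is supposed to be a JSON object.
// Strips anything before the first "{" and after the last "}", which
// covers code fences and stray prose around the object.
function parseModelJson(reply) {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end === -1 || end < start) {
    throw new Error("No JSON object found in model reply");
  }
  return JSON.parse(reply.slice(start, end + 1));
}
```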

While it is starting to get more things right, I feel like there are too many times where it is flat-out wrong (or sometimes makes up events out of thin air). Pre-filtering the calendar events by date or keyword before feeding them into the final question has helped the most, although the trade-off is that my responses come back after about 8 seconds, which I feel is too long. Each gpt or embed call averages about 2 seconds, and then the queries to my own database are another few seconds - so it all adds up. I have doubts this is feasible as production functionality, but I am forging ahead regardless, trying to solve each problem as it comes. When I do release to my users, I think I will need to caveat it as “beta” and set expectations so that users don’t expect it to work great every time (and if it doesn’t work every time, who would want to use it??).
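On the latency point: the Pinecone lookup and the regular DB query in step 2 don’t depend on each other, so one option is running them concurrently rather than back-to-back. A sketch with stand-in functions (both are placeholders, not real API calls):

```javascript
// Sketch of overlapping the two independent searches in step 2.
// Both functions below are stand-ins for the real calls.
async function semanticSearch(question) {
  // placeholder for: embed(question) -> pinecone.query(...)
  return [{ id: "evt-1", source: "pinecone" }];
}

async function keywordDateSearch(subject, isoDates) {
  // placeholder for: SELECT ... FROM events WHERE ...
  return [{ id: "evt-2", source: "db" }];
}

async function findCandidateEvents(question, subject, isoDates) {
  // Promise.all runs both lookups concurrently, so total wait time is
  // roughly the slower of the two instead of their sum.
  const [semantic, keyword] = await Promise.all([
    semanticSearch(question),
    keywordDateSearch(subject, isoDates),
  ]);
  return [...keyword, ...semantic];
}
```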

@pchan, my final prompt for your scenario works as expected:

You are a helpful calendar assistant, and you format dates like MMM DD, YYYY, and today is 2023-04-23T20:09:49.850+00:00, and use relative dates/times in your answer if applicable. Respond in json format { "response": text, "eventIds": [] }. You have already done a search for calendar events finding these:
Meet with Sam on 2023-04-25T00:00:00 with id 2023-04-25T00:00:00::bcf2a650-e039-11ed-b4fb-9530538bff0c, Tomatos on 2023-04-29T00:00:00 with id 2023-04-29T00:00:00::09bf7c40-e154-11ed-9639-47da7ae1cd1c
Question: What’s on my calendar for next week?

response:

{
    "response": "You have 2 events next week:\n- Meet with Sam on Apr 25, 2023\n- Tomatos on Apr 29, 2023",
    "eventIds": [
        "2023-04-25T00:00:00::bcf2a650-e039-11ed-b4fb-9530538bff0c",
        "2023-04-29T00:00:00::09bf7c40-e154-11ed-9639-47da7ae1cd1c"
    ]
}

Also note, I have not even looked at optimizing my calls for tokens/cost yet, so not even sure all this will be cost effective once it is working consistently enough. One step at a time!


@duggster Thanks for sharing your progress. This is indeed a complex task that appears simple.

In step 2, it is not clear to me why you need to query both Pinecone and the regular DB. If I understand correctly, suppose the user question is “When is my next haircut?”; the “subject” would be “haircut”, right? So you would only need to search the regular DB for any entry that mentions haircut within the date range?

Can you give an example of why that’s needed?

@CreatiCode well, that’s a good question. I started down the road with embeddings and Pinecone after seeing that approach recommended for other use cases around “how to use chat gpt with my own data?” My intention was to use it for semantic search, so like if the user used the word “barber” instead of “haircut”. With my simpler testing so far, I admit I haven’t seen the Pinecone query hit that often, and usually the keyword query returns the most relevant events. So at the moment I’m assuming the semantic search just helps cover more bases and brings more possible events into play that gpt can pick from, based on inexact queries from the user.

I do see several downsides (or assumptions of downsides? Still learning) to using Pinecone for this:

  • Event names are typically very short, and I’m wondering if the similarity won’t get picked up on as much with so few words
  • Every time a user adds, modifies, or deletes an event, I would need to re-embed all events for that user. This can happen frequently, so I’m a little worried about the efficiency of doing it on an ongoing basis. I assumed I might re-embed just the modified events in real time, and then re-embed all events for the user once a day or so. I’m not sure yet what the best approach to this would be.
  • Introducing Pinecone into my architecture is a whole other component to deal with. Although Pinecone has a generous free tier and there’s no infrastructure to manage which is nice, I’m still paying OpenAI for all the embeddings calls which I believe will probably add up quickly in a production environment. It would certainly simplify my architecture and reduce cost/time to not use it.
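On the re-embedding point: if each event is embedded from its own text alone, a modification usually only invalidates that one event’s vector. A sketch of picking out which events actually need re-embedding (the timestamp fields are hypothetical):

```javascript
// Sketch: only events whose text changed since they were last embedded
// need a new embedding + Pinecone upsert; the rest can be left alone.
// Timestamps are ISO 8601 strings, so string comparison is chronological.
function eventsNeedingReembed(events) {
  return events.filter((e) => !e.embeddedAt || e.updatedAt > e.embeddedAt);
}
```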

What other points could help me understand if Pinecone/semantic search is recommended/not recommended for my use case?

Thanks for the explanation. It would certainly simplify the process if you didn’t use Pinecone. The question is: if you don’t use Pinecone at all, which user inputs are not handled correctly? Could you list some examples?

I would like to share my experience with including dates in a specific format when creating an embedding. By spelling the date out in the embedded text, as in “Haircut on 1 March 2023,” the date words become part of the text that gets embedded. This lets relevant records be pulled for search queries about specific events, such as “Find Haircut events on March 2023,” “Find Haircut events on 1 March 2023,” or “Find Haircut events in 2023.” As a result, a separate database query for a date range can be avoided.
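A sketch of that formatting idea (the month names and output format here are my own assumptions, not from the post above):

```javascript
// Build embedding text that spells the date out in words, e.g.
// "Haircut on 1 March 2023", so queries like "Find Haircut events in
// March 2023" share vocabulary with the stored text.
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

function eventTextWithSpelledDate(name, isoDate) {
  const [year, month, day] = isoDate.split("-").map(Number);
  return `${name} on ${day} ${MONTHS[month - 1]} ${year}`;
}
```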

That sounds promising. Would that also be able to solve issues with, say, “Find Hair cut events close to March 2023” when the event is in April?