Hi everyone. I am building an application and want to use gpt as a calendar assistant so the user can type questions and retrieve the relevant events from their calendar. Here is my process so far:
- Calendar events have embeddings created and stored in Pinecone on an ongoing basis. For now, the text I use for embeddings is something like “date: 2023-04-03. Event name: Haircut. Time: 17:00.”
- User types a question which is sent to the server (AWS Lambda). For example: “When is my next haircut?” or “What do I have this weekend?”
- Call /chat/completions with the gpt-3.5-turbo model to infer the action (“Add Event”, “Find Event”, etc.). This is working great so far.
- Assuming it is a “Find Event” action, embed the user question and query Pinecone to get the top 20 results.
- Look up the resulting calendar event records in my database.
- Feed the results into a gpt prompt (/chat/completions with the gpt-3.5-turbo model) to get an answer to send back to the user. The prompt I use is:
`You are a helpful calendar assistant, and you format dates like MMM DD, YYYY, and today is ${today}. Calendar events:\n${events}\nQuestion: ${question}`
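For concreteness, the string-building parts of the flow above look roughly like this (the event shape and helper names are simplified stand-ins for my actual code):

```typescript
// Simplified shape of my calendar event records (illustrative, not my exact schema).
interface CalendarEvent {
  name: string;
  date: string; // ISO date, e.g. "2023-04-03"
  time: string; // 24-hour time, e.g. "17:00"
}

// Builds the text that gets embedded and stored in Pinecone (first step above).
function embeddingText(event: CalendarEvent): string {
  return `date: ${event.date}. Event name: ${event.name}. Time: ${event.time}.`;
}

// Assembles the final prompt sent to gpt-3.5-turbo (last step above).
function buildPrompt(today: string, events: CalendarEvent[], question: string): string {
  const eventLines = events.map(embeddingText).join("\n");
  return (
    `You are a helpful calendar assistant, and you format dates like MMM DD, YYYY, ` +
    `and today is ${today}. Calendar events:\n${eventLines}\nQuestion: ${question}`
  );
}
```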
I’m observing a couple of issues with this.
First, the embedding lookup is not that reliable at finding the right events. For the haircut question, for example, it returns a bunch of seemingly unrelated events and doesn’t return the Haircut event at all, even though the keyword is right there in the embedded text. So I researched a bit and tried a few things:
- The hypothetical-document approach: I generate a hypothetical event that would answer the user question and feed that to the Pinecone query. It didn’t help.
- Asking gpt to infer any specific dates and a subject from the user question, so I could separate those out in my Pinecone query. The inference worked, but overall it didn’t improve the embedding query.
- Using the inferred dates and subject to do a keyword and date search of the calendar events directly in my database, then combining those results with the Pinecone results. This finally gets the right list of calendar events to feed to gpt, but at the cost of a lot more overhead and processing time.
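The combining step at the end is essentially a merge with de-duplication by event id, something like this (record shape simplified):

```typescript
// Minimal event record for illustration; only the id matters for de-duplication.
interface EventRecord {
  id: string;
  name: string;
  date: string;
}

// Merge keyword/date search hits from my database with the Pinecone hits,
// keeping the first occurrence of each event id. DB hits go first, since the
// keyword/date match has been more precise for this use case.
function mergeResults(dbHits: EventRecord[], pineconeHits: EventRecord[]): EventRecord[] {
  const seen = new Set<string>();
  const merged: EventRecord[] = [];
  for (const event of [...dbHits, ...pineconeHits]) {
    if (!seen.has(event.id)) {
      seen.add(event.id);
      merged.push(event);
    }
  }
  return merged;
}
```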
Second, I am now seeing that the gpt answer in the last step does not always understand the timeframe being asked about. For example, with the question “What do I have this weekend?”, the intermediary gpt call used to search for events parses out the correct dates, but the final answer, after the calendar events are fed into the prompt, does not pick out the events corresponding to “this weekend” and instead says there are no events that weekend (sometimes it works, sometimes it doesn’t).
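For reference, here is roughly how I’d resolve “this weekend” to explicit dates in code, in case passing an explicit range into the final prompt instead of the relative phrase helps. This is only a sketch: it uses UTC, defines the weekend as Saturday and Sunday, and for simplicity resolves a Sunday “today” to the next weekend.

```typescript
// Resolve "this weekend" to explicit ISO dates relative to today.
// Sketch only: UTC-based, weekend = Saturday and Sunday, and a Sunday
// "today" resolves to the NEXT weekend for simplicity.
function thisWeekend(today: Date): { saturday: string; sunday: string } {
  const day = today.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  const daysUntilSaturday = (6 - day) % 7; // 0 if today is Saturday
  const saturday = new Date(today);
  saturday.setUTCDate(today.getUTCDate() + daysUntilSaturday);
  const sunday = new Date(saturday);
  sunday.setUTCDate(saturday.getUTCDate() + 1);
  const iso = (d: Date) => d.toISOString().slice(0, 10);
  return { saturday: iso(saturday), sunday: iso(sunday) };
}
```

The final prompt could then say something like “this weekend (Apr 08 and Apr 09)” rather than leaving the model to interpret “this weekend” on its own.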
So far I am testing simple questions and calendar events, and at this point I’m worried this functionality may not be consistent enough to continue building out and releasing to customers unless the accuracy can be improved.
Do I need to explore fine-tuning at this point to train on common calendar/date-range requests? From what I’ve read so far on these forums and elsewhere, the general attitude seems to be that fine-tuning isn’t worth the trade-off: you end up on a less powerful model, plus the added cost and the effort the tuning requires (it’s just me building this app, no team or anything).
Are there other approaches I can use to improve accuracy for this type of application?
Thanks for any insight you all can give!