Hi Davis,
I recently shared an update on my project, detailing my experiences and results. You can find the full post here.
In summary, I primarily used function calling. Precision was 90% with GPT-4 Turbo and 91% with GPT-3.5 Turbo, but recall was lower: 60% for GPT-4 Turbo and 45% for GPT-3.5 Turbo. In other words, GPT-4 Turbo retrieved 60% of the information that could ideally be found, and 90% of what it retrieved was correct. Given the complexity of the task, I consider these results quite satisfactory.
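For anyone unsure how the two metrics relate, here is a minimal sketch of the calculation. It is not the evaluation code from the project: the item names and counts are made up purely so that the numbers come out to 90% precision and 60% recall, and it assumes extracted items can be compared to the expected ones by exact match.

```python
def precision_recall(extracted, expected):
    """Return (precision, recall) for a set of extracted items against the expected set."""
    extracted, expected = set(extracted), set(expected)
    true_positives = len(extracted & expected)
    # Precision: share of what was extracted that is actually correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of what should have been found that was found.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical data: 15 items that could ideally be found.
expected = {f"item_{i}" for i in range(15)}
# The model returns 10 items: 9 correct ones and 1 that is wrong.
extracted = {f"item_{i}" for i in range(9)} | {"wrong_item"}

precision, recall = precision_recall(extracted, expected)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

High precision with lower recall means the model rarely returns something wrong but misses a fair share of what is there.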
A key factor in improving the model’s performance was breaking down the problem into simpler, more manageable functions. If I were to start again, I would focus even more on this approach.
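As a rough sketch of what "simpler, more manageable functions" can look like, the snippet below defines several narrow tool definitions instead of one large one. The function names, descriptions, and fields are hypothetical and are not the ones from the project; the dictionary layout follows the JSON-schema style commonly used for function calling, and no API call is made here.

```python
def make_tool(name, description, properties):
    """Build one tool definition with a JSON-schema parameter object."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
            },
        },
    }


# Hypothetical split: each function asks for exactly one kind of information,
# rather than a single function with many loosely related fields.
string_list = {"type": "array", "items": {"type": "string"}}
tools = [
    make_tool("extract_dates", "List every date mentioned in the text.",
              {"dates": string_list}),
    make_tool("extract_people", "List every person named in the text.",
              {"people": string_list}),
    make_tool("extract_amounts", "List every monetary amount in the text.",
              {"amounts": string_list}),
]

for tool in tools:
    print(tool["function"]["name"])
```

Each definition could then be sent in its own request, so the model only has to handle one narrow question at a time.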
I hope this summary is helpful. Best of luck with your project!
Cheers!