I read the docs on evals and graders, and IMHO parts of them look deprecated or inaccurate for the Responses API.
I find the idea of evals very useful for my use case, but I have these problems/questions and couldn't find answers after two days of experimenting and searching:
- Is there a way to pass a vector_store_id to a Responses API eval run and then check (with a grader) whether the file_search tool was called? (See the first sketch after this list for what I'm trying to express.)
- The docs say you can use `{{ sample.output_json }}` or `{{ sample.output_tools }}` to access the JSON response or the tools used by the model, but that seems impossible due to UI limitations when creating Evaluations from the Dashboard: you can only use `{{ sample.output_text }}`.
- Is there a way to inspect the contents of the `{{ sample }}` item? That would be really helpful for debugging. (See the second sketch below for how I imagine doing it.)
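
For context, here's a minimal sketch (Python, openai SDK) of what I'm trying to express for the first question. The eval id, dataset file id, vector store id, and model are all placeholders, and I genuinely don't know whether `sampling_params` accepts a `tools` entry for Responses API runs; that uncertainty is the core of my question:

```python
from openai import OpenAI

client = OpenAI()

# All ids below are placeholders. Whether a Responses API eval run accepts
# a file_search tool in sampling_params is exactly what I'm asking about.
run = client.evals.runs.create(
    "eval_abc123",  # placeholder: id of an existing eval
    name="file-search-check",
    data_source={
        "type": "responses",
        "source": {"type": "file_id", "id": "file-abc123"},  # placeholder dataset
        "input_messages": {
            "type": "template",
            "template": [{"role": "user", "content": "{{ item.question }}"}],
        },
        "model": "gpt-4o-mini",
        "sampling_params": {
            # Unverified assumption: that a tools list with vector_store_ids
            # can be passed here at all.
            "tools": [{"type": "file_search", "vector_store_ids": ["vs_abc123"]}],
        },
    },
)
print(run.id, run.status)
```

If this shape is wrong, I'd love to know what the right one looks like.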
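As for the debugging question, the closest thing I've found is listing a finished run's output items and printing the recorded sample, though I'm not sure this shows everything the `{{ sample }}` template can actually see. Again, the ids are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder ids. Each output item carries the per-row sample that the grader
# templates are rendered against, so dumping it reveals the available fields.
items = client.evals.runs.output_items.list(
    "evalrun_abc123",       # placeholder: run id
    eval_id="eval_abc123",  # placeholder: parent eval id
)
for item in items:
    print(item.sample)
```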
Any information on this would be very helpful; I've been testing for two days and I feel blocked.
Thanks!