Can we expect visual grounding for references cited by the Assistants API, or for visual grounding more generally?