I’m building an Apps SDK widget that renders a multi-step form. The form is initially pre-filled from the tool call arguments (as defined by the tool’s inputSchema). Users can then progressively fill in the remaining fields, either by editing them directly in the widget UI or by typing in chat (which re-triggers the tool call with updated pre-filled data).
To preserve and expose UI-entered values to the model across minimize/restore and follow-up turns, the widget writes the current form state with window.openai.setWidgetState(...). On each tool call, I try to accumulate state by merging the latest user message with any existing state (including values the user entered in the UI). I also include guidance in the tool description telling the model to merge state across turns rather than overwriting.
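To make the merge concrete, here is a minimal sketch of the kind of accumulation described above, not my exact code. It assumes a flat form state in which empty values mean "not provided"; the names FormState and mergeFormState and the sample fields are hypothetical, and the setWidgetState call is left as a comment so the snippet runs outside the host.

```typescript
// Hypothetical shape of the form state; field names are illustrative only.
type FormState = Record<string, string | number | boolean | null | undefined>;

// Merge incoming values (e.g. arguments of a new tool call) over existing ones
// (e.g. values previously saved from the widget UI).
// undefined, null, and "" in `incoming` are treated as "not provided",
// so they never erase a value that already exists.
function mergeFormState(existing: FormState, incoming: FormState): FormState {
  const merged: FormState = { ...existing };
  for (const [key, value] of Object.entries(incoming)) {
    if (value === undefined || value === null || value === "") continue;
    merged[key] = value;
  }
  return merged;
}

// Example: the user typed name and email in the UI, then mentioned a city in chat.
const widgetState: FormState = { name: "Ada", email: "ada@example.com" };
const toolInput: FormState = { name: "", city: "London" };
const next = mergeFormState(widgetState, toolInput);

// Inside the widget, the merged result would then be persisted, e.g.:
// window.openai.setWidgetState(next);
```

With this rule, a tool call that omits a field or sends it empty leaves the UI-entered value intact, while a non-empty value from chat overrides it. The merge only helps if the existing state actually reaches the place where it runs, which is the part that seems unreliable for me.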
In practice, I sometimes see the model re-call the tool without incorporating the latest UI-entered updates, even though the docs indicate widget state should be accessible to the model.
One complication is that I don’t have user authentication, so if I try to persist “draft form state” in the backend, I’m not sure how to reliably associate a draft with the correct conversation/session across tool calls and new message turns.
What’s the recommended best-practice architecture for managing multi-step form state when users can fill it through both UI interactions and chat messages, while keeping state accumulation reliable across follow-up turns?
Also, is there any reliable conversation/session identifier exposed by the Apps SDK / host that I can use (without auth) to consistently match backend drafts to the right conversation?