Best practice for multi-step forms with mixed UI + chat input? widgetState not always picked up by model

I’m building an Apps SDK widget that renders a multi-step form. The form is pre-filled via the tool inputSchema. Then, users can progressively fill the rest of the fields either by directly editing fields in the widget UI or by typing in chat (which re-triggers the tool call with updated pre-filled data).

To preserve and expose UI-entered values to the model across minimize/restore and follow-up turns, the widget writes the current form state with window.openai.setWidgetState(...). On each tool call, I try to accumulate state by merging the latest user message with any existing state (including values the user entered in the UI). I also include guidance in the tool description telling the model to merge state across turns rather than overwriting.
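
For anyone following along, here is a minimal sketch of the kind of field-level merge described above. It is not an Apps SDK API: the `FieldEntry` and `FormState` types, the `mergeFormState` function, and the per-field `updatedAt` timestamp are all hypothetical names chosen for illustration. The policy it assumes is that a blank value never overwrites a filled one and that, on conflict, the more recently edited value wins.

```typescript
// Hypothetical shape: each field stores its value and when it was last edited.
type FieldEntry = { value: string; updatedAt: number };
type FormState = Record<string, FieldEntry>;

// Merge incoming values into existing ones, field by field.
// Assumed policy: blank incoming values are ignored, and on conflict
// the entry with the newer (or equal) timestamp wins.
function mergeFormState(existing: FormState, incoming: FormState): FormState {
  const merged: FormState = { ...existing };
  for (const name of Object.keys(incoming)) {
    const entry = incoming[name];
    if (entry === undefined || entry.value.trim() === "") continue;
    const current = merged[name];
    if (current === undefined || entry.updatedAt >= current.updatedAt) {
      merged[name] = entry;
    }
  }
  return merged;
}

// Values the user typed into the widget UI (illustrative data).
const fromWidget: FormState = {
  email: { value: "a@example.com", updatedAt: 200 },
  city: { value: "Lyon", updatedAt: 200 },
};

// Values the model passed in a later tool call (illustrative data).
const fromChat: FormState = {
  email: { value: "", updatedAt: 300 },     // blank: must not clobber the UI value
  city: { value: "Paris", updatedAt: 100 }, // older than the UI edit: ignored
  name: { value: "Ada", updatedAt: 300 },   // new field: added
};

const merged = mergeFormState(fromWidget, fromChat);
```

Keeping the merge as a pure function means the same logic can run in the widget before calling window.openai.setWidgetState(...) and on the server when a tool call arrives, assuming both sides can attach comparable timestamps.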

In practice, I sometimes see the model re-call the tool without incorporating the latest UI-entered updates, even though the docs indicate widget state should be accessible to the model.

One complication is that I don’t have user authentication, so if I try to persist “draft form state” in the backend, I’m not sure how to reliably associate a draft with the correct conversation/session across tool calls and new message turns.

What’s the recommended best-practice architecture for managing multi-step form state when users can fill it through both UI interactions and chat messages, while keeping state accumulation reliable across follow-up turns?

Also, is there any reliable conversation/session identifier exposed by the Apps SDK / host that I can use (without auth) to consistently match backend drafts to the right conversation?

Have you tried using widgetSessionId for association (from Build your ChatGPT UI)?

I have a similar situation - I’m working on an app that essentially has a list view, then a multi-step submission process.

I’m not doing conversational “updates” of my widget state, so right now the conversation only sets the initial state, which is then augmented by what the user does in the UI. I haven’t added further updates from chat because I don’t think that’s how it’ll go… but if it does, I’ll need to wire up some clobber/merge logic like the one you’re dealing with.

I looked into widgetSessionId, but unfortunately it won’t help for server-side association. The widgetSessionId is regenerated, and therefore different, for each new widget rendered from a new tool call.