Computer use / Operator input

I am using the computer-use preview model in a browser context in conjunction with playwright.

I observed two use-cases which cannot be covered well with the current computer use model api.

  1. If the action (e.g. clicking playwright locator) fails, it would be great to provide an additional text output in the ComputerCallOutput object, similar to the FunctionCallOutput. This would also enable other enrichments like providing the current url after an action and more.

  2. I am using the computer ComputerTool in combination with other Tools. For example for additional actions like creating assertions on a website. This works well, but there are cases where I want to disable the computer use tool because I want to force it to use one of the other tools next.
    This however invalidates the existing message stack since the computer use model, without the computer use tool, can not process computer call items or even images in its message stack. It would be great to be able to use the same message stack when “removing” the computer-preview tool.