API call to generate an image, using an input image and text (no mask)?

I am trying to recreate, using the API, the following prompt:

When I inspect the network request, it appears to be a normal /conversation/ request, however when I use the API to do this it will return only text, rather than generating an image.

I also tried using the Image Generation API, which accepts an image, but only with a mask applied to it, which wont create the same output as seen in this image.

Does anyone know if its possible in the API to replicate this request + response?