Questions on a two-agent chain

Wondering if any of the smart folks here have some pointers on something I'm trying to do. I wanted to see if I could implement a two-agent chain where GPT is one agent and BLIP-2 is the other. The task is essentially multimodal visual design. GPT can ask questions about a picture, and BLIP-2 can answer them. GPT can then make suggestions, the image changes, and then BLIP-2 describes the new image. This iterates until GPT is satisfied that some goal has been achieved.
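To make the loop concrete, here is a rough sketch of the control flow I have in mind. It is only a skeleton: `run_design_loop`, the `planner`/`describer`/`editor` callables, and the `("ask" | "edit" | "done", text)` action format are all names I made up for illustration. The toy functions stand in for the real models, so a GPT call would replace `toy_planner`, a BLIP-2 VQA/captioning call would replace `toy_describer`, and whatever actually changes the image would replace `toy_editor`.

```python
from typing import Any, Callable, List, Tuple

Transcript = List[Tuple[str, str]]   # (speaker, text) pairs
Action = Tuple[str, str]             # (kind, text); kind is "ask", "edit", or "done"


def run_design_loop(
    goal: str,
    image: Any,
    planner: Callable[[str, Transcript], Action],   # GPT role (hypothetical interface)
    describer: Callable[[Any, str], str],           # BLIP-2 role (hypothetical interface)
    editor: Callable[[Any, str], Any],              # whatever applies the suggested change
    max_rounds: int = 10,
) -> Tuple[Any, Transcript]:
    """Alternate between the planner and the describer until the planner says "done"."""
    transcript: Transcript = []
    for _ in range(max_rounds):
        kind, text = planner(goal, transcript)
        transcript.append(("planner", f"{kind}: {text}"))
        if kind == "done":
            break
        if kind == "ask":
            # The planner asked a question about the current image.
            transcript.append(("describer", describer(image, text)))
        elif kind == "edit":
            # The planner suggested a change; apply it, then describe the result.
            image = editor(image, text)
            transcript.append(("describer", describer(image, "Describe the image.")))
        else:
            raise ValueError(f"unknown action kind: {kind!r}")
    return image, transcript


# Toy stand-ins so the skeleton runs without any model. The "image" is just a dict.
def toy_describer(image: dict, question: str) -> str:
    return f"The background is {image['background']}."


def toy_editor(image: dict, instruction: str) -> dict:
    return {**image, "background": instruction}


def toy_planner(goal: str, transcript: Transcript) -> Action:
    answers = [text for who, text in transcript if who == "describer"]
    if not answers:
        return ("ask", "What color is the background?")
    if goal in answers[-1]:
        return ("done", "")
    return ("edit", goal)


if __name__ == "__main__":
    final_image, log = run_design_loop(
        "red", {"background": "blue"}, toy_planner, toy_describer, toy_editor
    )
    for speaker, text in log:
        print(f"{speaker}: {text}")
    print(final_image)
```

The `max_rounds` cap is there because nothing guarantees GPT ever declares itself satisfied, and passing the whole transcript back to the planner each round is the simplest way I could think of to give it memory of BLIP-2's earlier answers.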

Any examples of chaining these two things together?