Anyone else experimenting with LLM chaining or fallback flows?

Not sure if this is too niche, but I’ve been messing around with combining outputs from different models—like using GPT-4o for initial context building and switching to Claude for longer-form coherence.
Mostly just a personal project right now, but I’m curious if others have tried something like this?
Like… how do you handle consistency across model responses? Feels like prompt bleed (context and style from one model leaking into the next model's prompt) becomes an issue pretty fast.
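For anyone curious, here's roughly the shape of what I mean. This is just a sketch with stand-in callables instead of real API clients (`chain`, `with_fallback`, and the model functions are all placeholder names I made up) — you'd swap in actual OpenAI/Anthropic SDK calls for real use:

```python
# Sketch of a two-stage chain with a fallback wrapper.
# "Models" are plain callables (prompt -> text) so the flow is testable
# without network calls. All function names here are hypothetical.

from typing import Callable

ModelFn = Callable[[str], str]

def chain(prompt: str, context_model: ModelFn, writer_model: ModelFn) -> str:
    """Stage 1 builds context; stage 2 writes the long-form answer."""
    context = context_model(prompt)
    # Only the distilled context goes forward, which is one way I've been
    # trying to limit prompt bleed: the second model never sees the
    # first model's raw instructions.
    return writer_model(f"Context:\n{context}\n\nTask:\n{prompt}")

def with_fallback(primary: ModelFn, backup: ModelFn) -> ModelFn:
    """Wrap a model so errors fall through to a backup model."""
    def call(prompt: str) -> str:
        try:
            return primary(prompt)
        except Exception:
            return backup(prompt)
    return call
```

Then the whole thing composes like `chain(prompt, with_fallback(gpt4o, claude), claude)`. Not claiming this is the right design — just the minimal version of the flow I'm poking at.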

Also wondering if anyone’s tested Grok in this kind of setup yet—does it play nice with external wrappers or is it more rigid?

Anyway, not building anything serious yet—mostly weekend tinkering. Just wanted to hear other people’s thoughts or pitfalls you’ve run into.