Seeking Efficient Method for Debugging Programs Using GPT API Outputs

I’m currently working on a Python project that integrates the GPT completion API, but I’m facing a challenge with the debugging process. I’m primarily using a library like Chainlit, and the nature of my project makes it unsuitable for a Jupyter Notebook environment.

Issue: Each time I debug the program, I need to start from the beginning, which involves waiting for new outputs from the GPT API. This not only consumes time but also incurs additional costs. A significant concern is the variability in GPT’s outputs, making it difficult to reproduce and pinpoint bugs consistently.

Question: Is there an elegant method or a Python library that allows me to switch between real GPT API outputs and a mock API? Specifically, I’m looking for a way to replay former outputs from the GPT API during the debugging process. This functionality would greatly streamline my debugging workflow and help in consistently reproducing issues.
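To make the question concrete, the behaviour I have in mind is roughly the sketch below. It is only an illustration, not an existing library: `ReplayCache`, its `mode` argument, and the `fake_gpt` stand-in are all hypothetical names, and the wrapped function is assumed to take a prompt plus keyword parameters and return text. Responses are stored in a JSON file keyed by a hash of the request, so the same request can be replayed without calling the API again.

```python
import hashlib
import json
import os
import tempfile


class ReplayCache:
    """Record outputs of a completion function and replay them later (hypothetical helper)."""

    def __init__(self, complete_fn, path, mode="auto"):
        # complete_fn: assumed callable (prompt, **params) -> str, e.g. a thin wrapper
        #              around the real API call.
        # mode: "record" always calls complete_fn, "replay" never calls it,
        #       "auto" calls it only when no recorded output exists.
        if mode not in ("record", "replay", "auto"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.complete_fn = complete_fn
        self.path = path
        self.mode = mode
        self.store = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.store = json.load(f)

    def _key(self, prompt, params):
        # Hash the full request so different prompts or parameters never collide.
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, prompt, **params):
        key = self._key(prompt, params)
        if self.mode != "record" and key in self.store:
            return self.store[key]  # replay a former output
        if self.mode == "replay":
            raise KeyError(f"no recorded output for prompt: {prompt!r}")
        result = self.complete_fn(prompt, **params)  # real (or stand-in) call
        self.store[key] = result
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.store, f, indent=2)
        return result


if __name__ == "__main__":
    def fake_gpt(prompt, **params):
        # Stand-in for the real API call, so the sketch runs without a key.
        return f"echo: {prompt}"

    cache_file = os.path.join(tempfile.mkdtemp(), "gpt_cache.json")
    live = ReplayCache(fake_gpt, cache_file, mode="auto")
    print(live.complete("Hello", temperature=0.7))    # calls fake_gpt and records

    replay = ReplayCache(fake_gpt, cache_file, mode="replay")
    print(replay.complete("Hello", temperature=0.7))  # served from the JSON file
```

Switching between the real API and the recorded outputs would then be a matter of changing `mode`, and the JSON file could be kept alongside a bug report so the failing run can be reproduced exactly.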

I’m actually trying to solve exactly this problem with this library: TonySimonovsky/AIConversationFlow on GitHub ("AI Conversation Flow provides a framework for managing complex non-linear LLM conversation flows, that are composable, controllable and easily testable"). I approach it from a slightly different perspective, though: by making it possible to test individual steps of the process separately.

Thanks for your help. Although this may not be exactly what I want, it could be very helpful for keeping a program modular and well-structured. I may give this library a try.

Right now it is a constructor of macroflows, but within the next few days I plan to add methods for automated testing of individual microflows.