Standard logging/recording for API requests

I’m in the process of coding logging/recording for OpenAI requests so that I can store requests, responses, and associated metadata. I want to track the results of my experiments, and I don’t want to pay twice for the same request or wait again for requests I’ve already made. Is there an existing library or framework that does this, so that I’m not reinventing the wheel?

I asked Copilot X, which said that I could use the Gym monitor, but I think that might be a hallucination. Gym is now Gymnasium, and it was/is all about recording reinforcement-learning episodes.

github → Farama-Foundation/Gymnasium

GPT-4 says it doesn’t know of anything (although I haven’t asked with the Bing plugin).

The code I’m developing (in case my objective isn’t clear) currently looks like this:

    with open(f"{path}/{record_id}/request.json", "w") as f:
        f.write(json.dumps(request, indent=4, sort_keys=True))
    with open(f"{path}/{record_id}/result.json", "w") as f:
        f.write(json.dumps(result, indent=4, sort_keys=True))
    with open(f"{path}/{record_id}/text.json", "w") as f:
        f.write("{\n" + result["choices"][0]["text"])

Naturally I can code this out myself, but I’m imagining a framework that dumps this sort of thing out, lets you refresh old requests, gives options on which metadata to output, and so on.
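
Roughly, I’m picturing a cache-or-call helper along these lines (just a minimal sketch; the function name, directory layout, and hash-derived record id are things I’m making up for illustration):

    import hashlib
    import json
    import os

    import openai

    def cached_completion(request: dict, path: str = "records", refresh: bool = False) -> dict:
        # key the record on the full request, so any change to prompt, model,
        # temperature, etc. produces a new record instead of a stale cache hit
        record_id = hashlib.sha256(
            json.dumps(request, sort_keys=True).encode()
        ).hexdigest()[:16]
        record_dir = os.path.join(path, record_id)
        result_file = os.path.join(record_dir, "result.json")

        # replay a stored result unless an explicit refresh is requested
        if not refresh and os.path.exists(result_file):
            with open(result_file) as f:
                return json.load(f)

        result = openai.Completion.create(**request)
        os.makedirs(record_dir, exist_ok=True)
        with open(os.path.join(record_dir, "request.json"), "w") as f:
            json.dump(request, f, indent=4, sort_keys=True)
        with open(result_file, "w") as f:
            json.dump(result, f, indent=4, sort_keys=True)
        return result

The idea being that `refresh=True` re-issues the request and overwrites the stored record, while a normal call just replays the saved result.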

Even if there is no existing lib, does having something like this make sense? Or could we use something like VCR.py, or another Python logging library for APIs?
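
For reference, VCR.py records HTTP interactions to a local “cassette” and replays them on later runs, so something like this might at least cover the don’t-pay-twice part (a rough sketch, assuming the openai client’s HTTPS calls are something VCR can intercept; the cassette path is arbitrary):

    import openai
    import vcr

    # the first run hits the API and records to the cassette; later identical
    # requests are replayed from disk instead of being re-sent
    with vcr.use_cassette(
        "cassettes/completion.yaml",
        record_mode="new_episodes",          # record anything not already on the cassette
        match_on=["method", "uri", "body"],  # all calls hit the same endpoint, so match bodies too
        filter_headers=["authorization"],    # keep the API key out of the recording
    ):
        result = openai.Completion.create(
            model="text-davinci-003",
            prompt="Say hello",
            max_tokens=5,
        )

Though that is replay at the HTTP level; it doesn’t by itself give the per-record metadata or refresh control described above.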

One thing to note about caching model calls (which it sounds like you’re trying to do): given both the stochastic nature of the models and the fact that OpenAI updates them over time, you can’t be guaranteed the model will return the same response on the next call.

Additionally, if your prompt changes even slightly context-wise, you’ll likely get a different response. For example, asking “what do you think?” in the middle of a conversation will produce a radically different answer than asking “what do you think?” at the start of one.

Just things to keep in mind. If your experiments are super controlled and you’re just looking to save some cost, then I say cache away :slight_smile:
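
A quick illustration of that context point, assuming a content-hash cache key like the sketch above: the same question asked in a different conversation hashes to a different key, so it becomes a cache miss rather than a wrong reuse.

    import hashlib
    import json

    def cache_key(request: dict) -> str:
        return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

    a = {"model": "gpt-3.5-turbo", "messages": [
        {"role": "user", "content": "what do you think?"},
    ]}
    b = {"model": "gpt-3.5-turbo", "messages": [
        {"role": "user", "content": "Here is my plan: cache every request to disk."},
        {"role": "assistant", "content": "Sounds reasonable."},
        {"role": "user", "content": "what do you think?"},
    ]}

    # different conversation context -> different key -> separate cached record
    assert cache_key(a) != cache_key(b)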

Hi Sam,
I think you are looking for a tool like MLflow, which can do experiment tracking while you try out different combinations of hyperparameters and prompts, so that you can identify the leading technique. When I did some digging a while back, I found that tools like MLflow are not yet equipped to handle LLM experiments (they may be in the future), so I had to write the entire logging and tracking framework myself in Python.
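
For reference, even without LLM-specific features, MLflow’s core logging APIs can capture the basics of a run; a minimal sketch (the run name, parameters, and stand-in prompt/response values are just examples):

    import mlflow

    prompt = "Summarize the experiment results."  # example prompt
    result = {                                    # stand-in for an OpenAI response
        "choices": [{"text": "..."}],
        "usage": {"completion_tokens": 42},
    }

    with mlflow.start_run(run_name="prompt-experiment"):
        mlflow.log_param("model", "text-davinci-003")
        mlflow.log_param("temperature", 0.7)
        mlflow.log_text(prompt, "prompt.txt")   # prompt stored as a run artifact
        mlflow.log_dict(result, "result.json")  # full response stored as a run artifact
        mlflow.log_metric("completion_tokens", result["usage"]["completion_tokens"])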

Thanks for the tips, folks. I just found github → preset-io/promptimize, which might be what I was looking for.