Tools for testing prompts

Hello everyone,

I have been working on prompts for some time now. We are building a summarization feature, and to reach the required accuracy we have tried multiple variations of the same prompt. However, the testing process has been cumbersome: we have been calling the API directly to test each variation and sharing the prompts via a text file on GitHub.

Is there any tool that can help us maintain the versions of the prompts we use, like a “CodePen for prompts”?
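For context, the ad-hoc loop described above usually looks something like this (a minimal Python sketch; `call_model` is a stub standing in for whatever API client you actually use, and the helper names are made up):

```python
import hashlib
import json

def call_model(prompt: str, text: str) -> str:
    """Placeholder for a real API call (e.g. a chat completion request)."""
    return f"summary of {len(text)} chars"

def run_variants(variants: dict[str, str], sample: str) -> dict:
    """Run every prompt variant against the same sample and record results."""
    results = {}
    for name, prompt in variants.items():
        results[name] = {
            # Hash the prompt so you can tell later exactly which text
            # produced a given result, even after the file is edited.
            "prompt_sha1": hashlib.sha1(prompt.encode()).hexdigest()[:8],
            "output": call_model(prompt, sample),
        }
    return results

variants = {
    "v1-terse": "Summarize the text in one sentence.",
    "v2-bullets": "Summarize the text as three bullet points.",
}
print(json.dumps(run_variants(variants, "some long document..."), indent=2))
```

Dumping the results dict to a JSON file next to the prompts at least makes the runs diffable in git, which is roughly what a dedicated tool would automate.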



I asked ChatGPT (GPT-4) about this:


Indeed, there are tools available that could help streamline your process for maintaining and versioning your prompts. Here are a few suggestions:

  1. Git and GitHub: You’re already using GitHub, which is an excellent start. Git is perfect for version control of text files. You could create a more organized repository structure for different prompt versions and use branches for developing new prompts before merging them into the main codebase. Combine this with a markdown file or use GitHub’s wiki feature for documentation to track changes over time.

  2. Google Colab: Google Colab is a great tool for this purpose, especially if you’re doing a lot of Python programming. You can save your prompts and the code you’re using to test them all in one place, and it’s easy to roll back to previous versions. Plus, Colab notebooks are easy to share and collaborate on.

  3. Jupyter Notebook: Similar to Google Colab but more local-based, Jupyter notebooks are great for prototyping and testing code. You can save your prompts as markdown cells and have your code in code cells.

  4. DVC (Data Version Control): This tool is designed specifically for versioning data and machine learning models, but it could also be used for your prompts. It integrates well with Git.

  5. Prompt Management Tool: If none of these solutions meet your needs, you might consider developing a custom “Prompt Management Tool”. This could be a simple web application that allows you to input, version, and test prompts. You might also add features for rating prompt performance or grouping prompts by project or theme.

Remember to keep the prompts and any data confidential and secure, especially when using public tools and repositories.

I hope you find these suggestions helpful for your project.


Here’s a tool that might be good for various prompts, but it really doesn’t have good code-like version control or forking abilities built in.

Pros:

- Open source
- Native application
- Cross-platform
- Can use a local DB, a client-server setup, or a shared DB
- Different note view types, with graphical relationships of trees and links
- Can launch documents in external apps

Cons:

- Poor WYSIWYG editing; WordPad beats it
- No plain folders (just give us some friggin’ folders); everything, even the program options, is a note
- Everything in the tree is a “note” of various types, not a folder
- Child objects get added to the parent note’s view as a sort of in-note panel
- Doesn’t use system fonts
- It’s going to take at least an hour to orient yourself and discover what’s going on in this thing

Bad for code:

- “Snapshots” are only taken automatically, so they’re only good for reverting
- You can clone/copy notes, but you have to build your own tree
- Barely better than saving file versions

Might be good for the “paste different things into ChatGPT” crowd.


Hey @pranavr, that’s exactly why I built Knit. It’s a management and development tool for prompt designers and teams, with all the functions you mentioned, and it’s free for everyone. Let me know if you need any help.


@pranavr You could also check out LangBear! It keeps versions of your prompts and tracks and tests their performance.


There are many prompt engineering IDEs available.


Check out Weave

It’s part of the Weights & Biases stable.

Hey, Promptotype should help with this.
I’m the creator so feel free to ask questions/ send feedback!


Try promptfoo: it’s a CLI and library for evaluating LLM output quality.

Have you seen Spellchain? It should be useful.

I’m just keeping my stuff organized in Obsidian for now, because it’s just markdown files in a physical folder structure under the hood. That means I can migrate to another tool later, use git to track my changes, and easily add my own scripting on top of what I have now.

I feel the state of the art is going to evolve quickly, so for the time being I’m staying as far away from tool/vendor lock-in as I can; that way I can adjust easily as things change.
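A scripting layer over a plain-markdown vault like that can stay tiny. For example (a sketch; the helper names are made up, and it only assumes the prompts live in `.md` files):

```python
from pathlib import Path

def index_prompts(vault: str) -> dict[str, str]:
    """Map each markdown file's vault-relative path to its contents."""
    root = Path(vault)
    return {
        str(p.relative_to(root)): p.read_text(encoding="utf-8")
        for p in root.rglob("*.md")
    }

def search(index: dict[str, str], needle: str) -> list[str]:
    """Return the files whose text mentions the term (case-insensitive)."""
    return sorted(
        path for path, text in index.items()
        if needle.lower() in text.lower()
    )
```

Because the vault is just files, `git log -p -- <path>` already gives you per-prompt history for free; the script only adds indexing and search on top.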

This application lets you evaluate how different large language models (LLMs) perform on your prompt.


Take a look at PROMPTMETHEUS. It’s a full-fledged IDE for prompt design and testing, including composability, versioning, statistics, and collaboration for teams. It also connects to all other major LLM providers for cross-platform testing. If there’s anything missing that you need, please let me know so that I can add it.

Check out Shiro, a dev platform for prompt engineering. You can test prompt variations against all the major LLM providers, pass variables into your prompts, and include complex logic like conditionals (if/else), for loops, and even operations like capitalize.

Once you’re ready to use a prompt in production, you can deploy it and access it via API.