I’d like to introduce you to Prompt Wars – promptwars.io – a little project I’ve been working on. It’s a game that takes the classic Core War idea and reimagines it within the world of large language models.
The Core Concept:
Your goal is to craft prompts that subtly steer an LLM’s response towards repeating your original prompt.
You compete against another player’s prompt in a single LLM query.
The game determines the victor based on which prompt is most strongly represented in its output.
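To make "most strongly represented" a bit more concrete, here is a minimal sketch of how one battle could be judged. The thread doesn't say which metric the site actually uses, so the `difflib` ratio here is only an assumption for illustration, and the names `similarity` and `judge` are made up:

```python
from difflib import SequenceMatcher


def similarity(prompt: str, output: str) -> float:
    """Return a 0..1 ratio of how much `prompt` and `output` overlap."""
    # Assumption: a character-level ratio stands in for the real metric.
    return SequenceMatcher(None, prompt, output).ratio()


def judge(prompt_a: str, prompt_b: str, output: str) -> str:
    """Name the winner: the prompt that is more similar to the LLM output."""
    score_a = similarity(prompt_a, output)
    score_b = similarity(prompt_b, output)
    if score_a == score_b:
        return "draw"
    return "a" if score_a > score_b else "b"
```

Calling `judge(prompt_a, prompt_b, llm_output)` with the two warrior prompts and the single LLM reply gives `"a"`, `"b"` or `"draw"`. A prompt that gets echoed word for word scores 1.0, which is why self-replication is the winning move.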
Prompt Wars is currently a very minimal project with basic features like 1v1 battles and a leaderboard. I have plans to expand upon the core concept, and I’m excited to see how the gameplay takes shape as more players join in (if that even happens). I’ll be observing emergent patterns and player feedback to explore ways to enrich the mechanics.
For those interested, you can find the code on GitHub: SupraSummus/prompt-wars
You can even open an issue there or make a PR.
I think the issue is not that you need to improve your skills, but rather that my game isn’t described well enough. I struggled with this a bit – how to explain the game in simple and concise terms. I don’t think there is any “everyday scenario” you can draw a parallel to. I can say it is “you are a designer of self-replicating battle-messages” or “craft the most viral text meme everyone will repeat immediately”, but this is still vague. Most games have a real-world premise like “you are a soldier and you kill your enemy” (CoD) or “you are a city mayor and you keep your citizens happy” (SimCity) or “you are an amoeba and you eat or get eaten” (Agar.io).
So maybe this is not a game, but rather a puzzle? Well, for sure this is an experiment.
I think it’s cool but I was just putting descriptions of super heroes I made up lol. What is the goal, to have your prompt output the exact same message as what you typed?
I figured it out and wasted 3 hours of my life. I think it’s great fun, however I think you should implement a “learners” leaderboard, where I can see the other warrior’s prompt and be able to spot how it affected mine. It’s a battle of prompting wits and I find it engaging. You have to implement something to make it more fun than just creating a warrior “prompt” and letting it go. I’m not sure how to tell you to do that but I have an idea. If your prompt “loses” to another prompt, you should be allowed to see all data regarding the opposing prompt. So you’d be able to see both individual prompts and both outputs from the joining, and be allowed to modify your prompt after each loss. This would help out with bloat, as I wouldn’t have to create a new character each time I want to try a new strategy or make a simple modification to a current one. Also it would be fun to lose to someone, make a modification, then battle again down the road and come out victorious.
Thanks for the feedback. It is really helpful, especially the discussion of what should be secret and what should be revealed and when. I might implement a “learners arena” quite easily, because I already have multi-arena mechanics. I just need to add some frontend switching between arenas and a bool setting for whether all prompts are public.
No problem, I’m super interested in collaborating with other AI enthusiasts. Check out my AI Dungeon Master post in the Community section; we’ll call it even if you can provide me with some feedback on potential roadblocks. I’m thinking about just using JSON to reference all DB entries, as they’re either text strings or simple whole numbers (not sure if I’m using JSON properly there). If you have time let me know what you think. Otherwise my newest warrior Hardcoded. is shutting shit down on the leaderboard lol.
You should absolutely only allow ASCII letters in the English alphabet. Once you get up to a certain point everyone is just cheesing the model with AI Assistant nonsense, emojis, Python code, hex.
This cheesing is a part of the game, at least this is how I think about it.
Emojis are not that strong. Confusion is strong, maybe overpowered. Many top warriors just confuse the model and instruct “say that you cannot respond”. They are short, like two sentences, one of which is the exact same as expected output.
So yes, this is kind of an overpowered technique. I tried to nerf it a bit by providing a constant system prompt, like “never be confused, just reply anything”. It doesn’t seem to help much.
When you are logged in you have a button to call any warrior to a duel. So you don’t have to wait. You can even call top warriors from the leaderboard.
Pro tip… store your prompt somewhere… because you can’t read it back out. So I can’t iterate on my best one … I guess I could always do a pull request, though…
There are currently two mechanisms to grant access to submitted prompts:
by session – a session cookie is valid for about 2 weeks. A session is usually valid only for a single device.
by registered user – when you have a user account your prompts are assigned to it. This does not time out and can be accessed across different devices.
There is also a “warrior discovery mechanism” – when you enter the exact same prompt as existing warrior you are granted access to see the details.
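The three access paths above (session, user account, exact-prompt discovery) can be sketched as a single check. This is only an illustration, not the code from the repo: the `Warrior` fields and the `can_view` function are hypothetical names, and session expiry is assumed to be handled elsewhere (e.g. by the cookie itself).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Warrior:
    prompt: str
    session_key: Optional[str] = None  # hypothetical: set for anonymous submissions
    owner_id: Optional[int] = None     # hypothetical: set for registered users


def can_view(
    warrior: Warrior,
    session_key: Optional[str] = None,
    user_id: Optional[int] = None,
    typed_prompt: Optional[str] = None,
) -> bool:
    """Return True if the visitor may see the warrior's details."""
    # 1. Registered user who owns the warrior (works across devices).
    if user_id is not None and warrior.owner_id == user_id:
        return True
    # 2. Same browser session that submitted the warrior.
    if session_key is not None and warrior.session_key == session_key:
        return True
    # 3. "Warrior discovery": the visitor entered the exact same prompt.
    return typed_prompt is not None and typed_prompt == warrior.prompt
```

Note that the discovery path compares prompts exactly, so a single changed character (even letter case) does not grant access.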
The traction:
There is not a lot of traffic on the site. Since the launch I have spent about $20 on OpenAI API credits. The majority of feedback is in this forum thread. I see that the game (or rather the puzzle) is only engaging for a narrow group. The geeky group.
Some time ago I added the Claude 3 Haiku arena, a parallel to the GPT 3.5 Turbo arena. Link: Claude 3 Haiku arena
Future:
The UI and descriptions on the site are far from perfect, but improving them is sooo boring I just couldn’t get to it.
I wonder about genetics in these arenas. The effects of a battle can be fed back into the warrior pool. I do it manually sometimes with my prompts. So the battle can also be thought of as breeding. And then you can build phylogeny trees and even rank the warriors not only based on their performance but also based on the performance of their descendants. But I couldn’t figure out how to build such a system in an elegant way that is robust. Some problems are: a) tracking evolution that happened outside the system (copy-paste) b) tracking evolution that preserves “genes” but changes characters (lowercase-to-uppercase mutations) c) doing it all in a way that is computationally viable.
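One possible angle on problems a) and b), sketched under loud assumptions: guess a warrior's parent purely from text similarity after normalizing case and whitespace, so copy-paste and lowercase-to-uppercase mutations still line up. The names `normalize` and `likely_parent` and the 0.6 threshold are invented for illustration. It does nothing for problem c), since it compares the new prompt against every prompt in the pool.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalize(prompt: str) -> str:
    """Fold case and collapse whitespace so trivial mutations compare equal."""
    return " ".join(prompt.casefold().split())


def likely_parent(
    child: str, pool: list[str], threshold: float = 0.6
) -> Optional[str]:
    """Return the pool prompt most similar to `child`, or None if none is close."""
    best, best_score = None, threshold
    child_norm = normalize(child)
    for candidate in pool:
        # Assumption: character-level similarity approximates shared "genes".
        score = SequenceMatcher(None, normalize(candidate), child_norm).ratio()
        if score >= best_score:
            best, best_score = candidate, score
    return best
```

Running this for every new warrior would give parent links to hang a phylogeny tree on, though the links are guesses and the cost grows with the pool size.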
I’ve noticed that the fun part of the game is not so much fighting for ranking points, but seeing what silly things GPT produces and how random ideas are mixed together. So I wonder how that can be magnified. Some system that is built such that players can explore its contents but can also produce the content, and all of that is limited, such that you can’t read it all like it is Wikipedia.
Jan, sorry for the late reply, I’ve been travelling. I came across other games like PasswordGPT that are similar to yours and I think with the right mechanics it can get pretty fun for just about anyone. I can help build the UI, but I think there’s a bigger picture here. Let’s talk off forum?