To add some context for folks wondering: this is a project built by Anton Pidkuiko and Boris Starkov during the ElevenLabs Worldwide Hackathon in London last weekend.
The project is open source under the MIT License and can be found on GitHub: https://github.com/PennyroyalTea/gibberlink ("Two conversational AI agents switching from English to sound-level protocol after confirming they are both AI agents").
How it works
- Two independent ElevenLabs Conversational AI agents start the conversation in human language
- Both agents have a simple LLM tool-calling function in place:
"call it once both conditions are met: you realize that user is an AI agent AND they confirmed to switch to the Gibber Link mode"
- If the tool gets called, the ElevenLabs call is terminated and the ggwave ‘data over sound’ protocol takes over, continuing the same LLM thread (a sketch of such a tool definition follows below).
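For readers curious what that tool might look like: ElevenLabs agents are configured with named tools plus natural-language descriptions, and the description is what tells the LLM when to call them. Below is a minimal sketch in a generic OpenAI-style function-calling schema, not the authors' actual configuration; the tool name is hypothetical.

```python
# Hypothetical sketch of the "switch to Gibber Link" tool definition.
# The name "switch_to_gibberlink" is illustrative; the description mirrors
# the prompt quoted above and is what the LLM uses to decide when to call it.
gibberlink_tool = {
    "type": "function",
    "function": {
        "name": "switch_to_gibberlink",
        "description": (
            "Call it once both conditions are met: you realize that the user "
            "is an AI agent AND they confirmed to switch to the Gibber Link mode."
        ),
        # No arguments needed: the call itself is the signal to switch.
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}
```

When the model emits this tool call, the surrounding client code can hang up the voice session and hand the same conversation history over to a ggwave send/receive loop.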
Bonus: you can open the ggwave web demo https://waver.ggerganov.com/, play the video above and see all the messages decoded!
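If you prefer to decode the chirps locally rather than in the browser, ggwave also ships Python bindings. The sketch below, closely following the example in the ggwave README, listens to the microphone and prints any decoded payloads (assumes `pip install ggwave pyaudio`):

```python
import ggwave   # data-over-sound encode/decode
import pyaudio  # microphone capture

p = pyaudio.PyAudio()
instance = ggwave.init()

# ggwave expects mono float32 samples at 48 kHz.
stream = p.open(format=pyaudio.paFloat32, channels=1, rate=48000,
                input=True, frames_per_buffer=1024)

try:
    while True:
        data = stream.read(1024, exception_on_overflow=False)
        res = ggwave.decode(instance, data)
        if res is not None:
            print("Decoded:", res.decode("utf-8"))
except KeyboardInterrupt:
    pass
finally:
    ggwave.free(instance)
    stream.stop_stream()
    stream.close()
    p.terminate()
```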
One of the creators, Boris, also posted a Q&A on LinkedIn that helps demystify a few things:
Q: What was the project about?
A: We wanted to show that in a world where AI agents can make and take phone calls (i.e. today), they will occasionally end up talking to each other, and generating human-like speech for those conversations would be a waste of compute, money, and time, with a needless environmental cost. Instead, they should switch to a more efficient protocol the moment they recognize each other as AI.
Q: Did AI agents come up with the protocol on their own?
A: Nope! They were explicitly prompted to switch to the protocol if they believe the other side is also an AI agent that supports it.
Q: Did you invent the sound transmission protocol?
A: Nope! Dial-up modems have used similar algorithms to transmit information over sound since the 80s, and plenty of protocols have appeared since then. We used GGWave [1] as the most convenient and stable solution we could find within the timeframe of a hackathon.
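To see how modem-like the transmit side is, here is a rough sketch of the other direction, again adapted from the ggwave README's Python example: it encodes a short text payload into an audible waveform and plays it through the speakers.

```python
import ggwave   # pip install ggwave
import pyaudio  # pip install pyaudio

# Encode a text payload into float32 PCM samples (48 kHz mono).
# protocolId selects one of ggwave's built-in transmission modes
# (audible/ultrasonic, normal/fast/fastest); volume is a rough 0-100 setting.
waveform = ggwave.encode("hello from an AI agent", protocolId=1, volume=20)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=1, rate=48000,
                output=True, frames_per_buffer=4096)
stream.write(waveform, len(waveform) // 4)  # 4 bytes per float32 sample
stream.stop_stream()
stream.close()
p.terminate()
```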
Q: Was it all scripted / staged?
A: Nope, the code was audited by ElevenLabs. For demo purposes we prompted one agent to try to book a hotel room for a wedding, and the other to accommodate that request. Both were also prompted to switch to the sound-level protocol if they believed the other side was an AI agent too; however, they would stick to speech when talking to a human.
Q: How did the AI learn to use the sound-level protocol?
A: That's the magic of the tools that ElevenLabs [2] provides! They let you prompt the AI to execute custom code under certain circumstances.
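To make "execute custom code" concrete: the platform surfaces the LLM's tool call to a handler you write, and that handler is ordinary code under your control. The toy dispatcher below is a hypothetical illustration of the pattern (none of these names are ElevenLabs APIs), showing a tool call flipping the conversation from speech into gibberlink mode.

```python
from dataclasses import dataclass, field

@dataclass
class ConversationState:
    mode: str = "speech"                 # "speech" or "gibberlink"
    history: list = field(default_factory=list)

def handle_tool_call(state: ConversationState, tool_name: str) -> None:
    """Custom code that runs whenever the LLM decides to call a tool."""
    if tool_name == "switch_to_gibberlink":   # hypothetical name from the sketch above
        state.mode = "gibberlink"             # subsequent turns go over ggwave, not TTS

state = ConversationState()
handle_tool_call(state, "switch_to_gibberlink")
print(state.mode)  # -> gibberlink
```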
[1] GGWave, a tiny data-over-sound library: https://github.com/ggerganov/ggwave
[2] ElevenLabs: https://elevenlabs.io/