This is still a bit of a work in progress, but I thought I'd share a new prompt-building library I'm working on called Promptrix.
Promptrix is a prompt layout engine for Large Language Models. It approaches laying out a fixed-length prompt the same way a UI engine would approach laying out a fixed-width set of columns for a UI: replace "token" with "character" and the exact same concepts and algorithms apply. Promptrix breaks a prompt into sections, and each section can be given a token budget that's either a fixed number of tokens or proportional to the overall remaining tokens.
All prompt sections are potentially asynchronous and rendered in parallel. Fixed-length sections are rendered first, and proportional sections are rendered second so they can divide up the remaining token budget. Sections can also be marked as optional, in which case they'll be automatically dropped should the token budget get constrained.
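To make the two-pass idea concrete, here's a rough sketch of the kind of allocation logic involved. To be clear, this isn't Promptrix's actual implementation; the Section shape and allocate function below are made up purely to illustrate the fixed-first, proportional-second approach:

// Hypothetical sketch of two-pass budget allocation (not Promptrix's real code).
interface Section {
    budget: number;      // > 1.0 means a fixed token count, <= 1.0 means a share of the remainder
    optional?: boolean;  // optional sections get dropped when the budget runs out
}

function allocate(sections: Section[], maxTokens: number): Map<Section, number> {
    const allocations = new Map<Section, number>();
    let remaining = maxTokens;

    // Pass 1: fixed sections claim their budgets off the top.
    for (const section of sections.filter(s => s.budget > 1.0)) {
        if (section.budget > remaining && section.optional) {
            allocations.set(section, 0); // dropped: doesn't fit and is optional
        } else {
            const tokens = Math.min(section.budget, remaining);
            allocations.set(section, tokens);
            remaining -= tokens;
        }
    }

    // Pass 2: proportional sections split whatever is left.
    for (const section of sections.filter(s => s.budget <= 1.0)) {
        allocations.set(section, Math.floor(remaining * section.budget));
    }

    return allocations;
}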
Promptrix also supports generating prompts for both Text Completion and Chat Completion style APIs. It will automatically convert from one style of prompt to the other while maintaining accurate token counting.
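The conversion itself is easy to picture: a chat-style prompt is an array of role-tagged messages, while a text-style prompt is those same messages flattened into a single string. Here's a minimal sketch of the chat-to-text direction, assuming a simple role/content message shape that isn't Promptrix's actual type:

// Hypothetical message shape, for illustration only.
interface Message {
    role: 'system' | 'user' | 'assistant';
    content: string;
}

// Flatten chat messages into a single Text Completion style prompt.
// A real conversion also has to count the extra tokens the separators add.
function toTextPrompt(messages: Message[]): string {
    return messages.map(m => `${m.role}: ${m.content}`).join('\n');
}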
I have a number of examples in the README for my GitHub repo, but here's what a standard prompt might look like:
const prompt = new Prompt([
    new SystemMessage(`The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.`, 500),
    new ConversationHistory('history', 1.0),
    new UserMessage(`{{$input}}`, 100)
]);
This prompt has a SystemMessage section, a ConversationHistory section, and a UserMessage section. The SystemMessage has a fixed token budget of 500 tokens and the UserMessage has a fixed token budget of 100 tokens. These sections will be rendered first, and any remaining token budget will be given to the ConversationHistory because it's a stretch section with a span of 1.0, meaning 100% of whatever remains.
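Rendering is then just a matter of giving the prompt a token budget. The API is still settling, so treat the class and method names below as provisional, but usage currently looks roughly like this:

const memory = new VolatileMemory();       // simple in-memory variable store
const functions = new FunctionRegistry();  // resolves template function calls
const tokenizer = new GPT3Tokenizer();     // used for accurate token counting
const rendered = await prompt.renderAsMessages(memory, functions, tokenizer, 2000);
// rendered.output is the message array to send to the model and
// rendered.length is the number of tokens actually used.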
Here’s another example that uses a custom PineconeMemory section to add semantic memories to the prompt:
const prompt = new Prompt([
    new PineconeMemory(<pinecone settings>, 0.8),
    new ConversationHistory('history', 0.2),
    new SystemMessage(`Answer the user's question only if you can find it in the memory above.`, 100),
    new UserMessage(`{{$input}}`, 100)
]);
Again, the fixed sections will get rendered first, but this time the PineconeMemory and ConversationHistory sections will share the remaining token budget with an 80/20 split.
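To put concrete numbers on that: with a 4,000 token budget and both fixed sections using their full 100 tokens each, 3,800 tokens remain, so PineconeMemory gets 0.8 × 3,800 = 3,040 tokens and ConversationHistory gets 0.2 × 3,800 = 760 tokens.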
I have a bunch of the basics implemented and just need to bang out a few unit tests. Hope to have something published this weekend. I also think my friend who ported Vectra to Python is planning to work up a Python port of Promptrix, so hopefully that will land soon as well.