Hi, new to all this so trying to get my head round it - I think I get the idea but one thing I don’t understand is how the prompt reads the metadata to see each piece as part of a whole?
Ask the model to output markdown.
That way you don’t have a “hard” time with the regex.
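As a rough illustration of why markdown helps: if the system message asks the model to answer under a fixed set of `##` headings, one pattern is enough to split the reply into sections. The heading names and the `splitMarkdownSections` helper below are made up for this example and are not from anyone's code in this thread.

```php
<?php
// Hypothetical helper: split a markdown reply into [heading => body] pairs.
// Assumes the model was asked to use level-2 headings ("## Heading").
function splitMarkdownSections(string $markdown): array
{
    $sections = [];
    // Capture each "## Heading" line and everything up to the next one (or the end).
    $pattern = '/^##[ \t]+([^\r\n]+?)[ \t]*\R(.*?)(?=^##[ \t]|\z)/ms';
    if (preg_match_all($pattern, $markdown, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $sections[trim($match[1])] = trim($match[2]);
        }
    }
    return $sections;
}

// Made-up reply text, only to exercise the helper.
$reply = "## Answer\nThe event is on 12 May.\n\n## Sources\n- https://example.com/events\n";
$sections = splitMarkdownSections($reply);
echo $sections['Answer'], "\n";
```

Without an agreed structure like this, the pattern has to guess where the answer ends and the sources begin, which is where the regex tends to get painful.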
I include the metadata in my context documents.
My metadata is stored in Weaviate with the vectors. So, when the cosine similarity search is done and I’ve got my matching results, I construct each context document to include content, title, URL, and so on: whatever I believe the LLM needs to bring back the best answer.
I also optionally include a summary of the source document, which can be included in each chunk. Using this and the title helps the LLM see each individual chunk as part of the whole.
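To make that concrete, here is a minimal sketch of preparing chunks before they are embedded and stored, so that every chunk carries its parent document's title, URL and summary. The function name, the fixed-size word splitting and the field names are assumptions for illustration only, not the exact schema described above, and the actual upload to the vector store is left out.

```php
<?php
// Hypothetical sketch: split a source document into chunks and copy the
// parent document's metadata onto each one before embedding/storing.
function buildChunkObjects(array $document, int $wordsPerChunk = 200): array
{
    // Naive word-based splitting; real chunking is usually smarter than this.
    $words = preg_split('/\s+/', trim($document['content']), -1, PREG_SPLIT_NO_EMPTY);
    $objects = [];
    foreach (array_chunk($words, $wordsPerChunk) as $i => $chunkWords) {
        $objects[] = [
            'content'    => implode(' ', $chunkWords), // the text that gets vectorized
            'title'      => $document['title'],        // same for every chunk of this document
            'url'        => $document['url'],
            'summary'    => $document['summary'],      // summary of the whole source document
            'chunkIndex' => $i,
        ];
    }
    return $objects;
}

// Made-up document, only to show the shape of the result.
$chunks = buildChunkObjects([
    'title'   => 'Spring Events',
    'url'     => 'https://example.com/events',
    'summary' => 'Overview of events planned for spring.',
    'content' => 'one two three four five',
], 2);
echo count($chunks), " chunks\n";
```

Only `content` needs to be turned into a vector; the other fields ride along as stored properties and come back with each search result.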
How do you mark that within the context you upload? I upload plain text, which is converted to a vector, rather than JSON.
This is one way.
I use this code to turn the returned results into context docs:
    // Iterate over the results array to extract the relevant elements
    foreach ($results as $index => $result) {
        // Extract the content and metadata fields for each row (missing fields default to '')
        $contextDocument = $result['content'];
        $documentTitle = $result['title'] ?? '';
        $documentSummary = $result['summary'] ?? '';
        $documentDate = $result['date'] ?? '';
        $documentGroups = isset($result['groups']) ? implode(', ', $result['groups']) : '';
        $documentTaxonomy = isset($result['taxonomy']) ? implode(', ', $result['taxonomy']) : '';
        $documentURL = $result['url'] ?? '';
        $documentQuestions = $result['questions'] ?? '';

        // Construct the context document string with labeled elements
        $documentString = "Document Title: '{$documentTitle}'\n";
        $documentString .= "Content: {$contextDocument}\n";
        if ($this->includeSummary === true) {
            $documentString .= "Source document summary: {$documentSummary}\n";
        }
        $documentString .= "Event Date: {$documentDate}\n";
        $documentString .= "Document Groups: {$documentGroups}\n";
        $documentString .= "Document Taxonomy/Tags: {$documentTaxonomy}\n";
        $documentString .= "URL: {$documentURL}\n";
        if ($this->includeQuestions === true) {
            $documentString .= "Questions that this document answers: {$documentQuestions}\n";
        }

        # Debug
        # $this->newOutput .= "Document Title: {$documentTitle}. " . "<br>";

        // Append the context document string to the prompt content
        $promptContent .= "Context document " . ($index + 1) . ": {$documentString}\n";
        $promptContent .= "-----\n"; // Delimiter to separate context documents
    }
Then,
    // Build the prompt containing question and context documents
    $prompt = $this->solrai_createPromptContent($question, $context);

    # Debug
    # print_r($context) . "<br>";
    # print_r($prompt) . "<br>";

    // Initialize the $messages array with the system message
    $messages = array(
        array("role" => "system", "content" => $systemMessage)
    );

    // Define the new user message (question + context docs)
    $newUserMessage = array("role" => "user", "content" => $prompt);

    // Append the new user message to the end of the $messages array
    $messages[] = $newUserMessage;

    // Get the chat completion with history from the LLM
    $result = $this->solrai_getChatCompletion($messages, $apiOpenAI);
This is PHP. If you are using Python or something else, just ask ChatGPT to translate it.
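Since the bodies of `solrai_createPromptContent()` and `solrai_getChatCompletion()` aren't shown, here is a minimal sketch of what those two steps could look like. Both functions below are hypothetical stand-ins, the instruction wording is invented, and the model name is just a placeholder. The resulting JSON is what would be POSTed to `https://api.openai.com/v1/chat/completions` with an `Authorization: Bearer <key>` header; the HTTP call itself is left out so the example runs without an API key.

```php
<?php
// Hypothetical stand-in for solrai_createPromptContent(): the real method is not shown above.
function createPromptContent(string $question, string $contextDocs): string
{
    return "Answer the question using only the context documents below.\n\n"
        . $contextDocs
        . "\nQuestion: {$question}\n";
}

// Hypothetical stand-in for the request body that a chat completion call would send.
// The model name is a placeholder; use whichever chat model you have access to.
function buildChatRequestBody(array $messages, string $model = 'gpt-3.5-turbo'): string
{
    return json_encode(
        ['model' => $model, 'messages' => $messages],
        JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES
    );
}

// Made-up context in the same labeled format as the loop above produces.
$context = "Context document 1: Document Title: 'Spring Events'\n"
    . "Content: The fair is on 12 May.\n"
    . "-----\n";

$messages = [
    ['role' => 'system', 'content' => 'You are a helpful assistant.'],
    ['role' => 'user', 'content' => createPromptContent('When is the fair?', $context)],
];

$body = buildChatRequestBody($messages);
echo $body, "\n";
```

The point is that the metadata never gets any special treatment: it is plain labeled text inside the user message, and the model reads it like the rest of the prompt.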
Another thing you may want to consider, depending upon your use case and your users, is to give the end-user options to make choices at the prompt.
I am using WordPress, so PHP would be perfect. I haven’t built the prompt as such, though: I’m using a plugin called AI Engine, so I’m not sure whether this would work alongside it. Thank you though, really helpful.
Hey Newbs, it doesn’t sound to me like you are confused. I get the impression from reading this thread that there are some unspoken and differing assumptions about how the vectors are created and compared. Based on my own understanding of how the vectors are created and compared, however, I agree entirely with your line of questioning (and, reading between those lines, with your understanding of how the vectors work).
This seems a very thoughtful process, thanks for sharing it.
Do you have any reference code where you have tried this approach?
Thanks