Assigning versions & context to LLM resource documents

I am working on a custom chatbot to answer questions about our company's products for our internal and contract content developers. The goal is to allow them to produce raw, but accurate, content (e.g. blog posts, social media blurbs, ad copy) about our products and specific features within those products.

As I produce “blessed” source documents, I want to tell the LLM a few things about the content each contains:

  1. Product Name (e.g. “Gorilla Editor”)
  2. Product Version/build Number (e.g. v12.0.1 b2456)
  3. Named Features discussed in the document. (e.g. GorillaSpell, GorillaColor)
  4. Common Names for Named Features (e.g. Spell Check, Color Correction)
  5. Document Author: Doug Daulton
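
For illustration, here is one way that metadata might be laid out; this is a sketch assuming a simple key/value header at the top of each source document, and the field names are placeholders, not a required schema:

```
Product: Gorilla Editor
Version: v12.0.1 b2456
Features: GorillaSpell, GorillaColor
Common Names: Spell Check, Color Correction
Author: Doug Daulton
```

A consistent, machine-readable header like this also makes it easy to parse the fields out programmatically if a retrieval step ever needs them as structured metadata rather than free text.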

The goal is to be able to write a prompt like this:

Write a blog post about GorillaSpell in Gorilla Editor v18.2. Explain what makes GorillaSpell unique in the marketplace. Write in the voice of Doug Daulton.

Assuming there are ample examples of what makes GorillaSpell unique in the market, the prompt should bring forward all of the key talking points in a rough draft within the context of the requested software version.

With that said, is it enough to include a table at the top of each document with this information? If the format is consistent, will the model learn and apply that context?

Or, is there something else I need to do to teach the model specific things about the source documents in the custom library?

Happy to be pointed to videos or articles you think explain the desired solution. I’ve been digging but have not found a clear answer yet.

Thanks — Doug

If you have a relatively small set of constant rules to follow, then yes, it is perfectly acceptable to include those as part of the prompt. I imagine you will then perform a vector database retrieval on your product documentation set to get appropriate context for the model to use in generating the ad copy, blog entry, etc., and pass all of that along with the user's original request for a complete and accurate response. Sounds good to me; the 128K GPT-4-turbo model excels at these kinds of tasks.
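
As a rough sketch of that flow (a pure-Python toy, not a real vector store; the header keys and sample text are made up to match the example in the question), filtering documents on their metadata header before handing them to the model might look like:

```python
# Sketch: select "blessed" documents by their metadata header before retrieval.
# In a real pipeline, a vector store's metadata filter would replace select_docs.

def parse_header(doc: str) -> tuple[dict, str]:
    """Split a document into its metadata header and body.
    Header lines look like 'Key: Value' and end at the first blank line."""
    meta = {}
    lines = doc.splitlines()
    i = 0
    for i, line in enumerate(lines):
        if not line.strip():
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[i + 1:])
    return meta, body

def select_docs(docs: list[str], product: str, feature: str) -> list[tuple[dict, str]]:
    """Keep only documents whose header matches the requested product
    and mentions the feature by either its internal or common name."""
    out = []
    for doc in docs:
        meta, body = parse_header(doc)
        names = meta.get("Features", "") + " " + meta.get("Common Names", "")
        if meta.get("Product") == product and feature in names:
            out.append((meta, body))
    return out

sample = """Product: Gorilla Editor
Version: v18.2 b2456
Features: GorillaSpell, GorillaColor
Common Names: Spell Check, Color Correction
Author: Doug Daulton

GorillaSpell checks spelling as you type..."""

hits = select_docs([sample], "Gorilla Editor", "GorillaSpell")
```

The matched bodies, plus the constant rules, would then be packed into the prompt alongside the writer's request before calling the model.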