Prompting with Retrieval and GPT 4 Turbo

I am making a “Sales Doctor” chatbot that can

  1. Diagnose users
  2. Recommend a test that can be done { ONLY FROM THE KNOWLEDGEBASE }
  3. Respond to inquiries

But the bot is recommending tests that don’t exist in the knowledgebase, even when I told it that it will be provided information

How do I fix the prompt / what is the best prompt for this

Hello and welcome to the community!

Unfortunately, this kind of chatbot is against OpenAI’s Usage Policies:

Don’t perform or facilitate the following activities that may significantly affect the safety, wellbeing, or rights of others, including:
Providing tailored legal, medical/health, or financial advice

The reasons why are the same reasons why you are asking these questions: it is imperfect, relatively unpredictable, and difficult to control. The consequences of such when placed into a medical setting are too steep.

1 Like

I wonder if Sales Doctor is a play on words and is actually a sales bot called sales doctor like you get PC Doctor for computer repairs. :thinking:

If that is the case, then yes you can do with with a RAG setup and/or make use of the Assistants API.

https://platform.openai.com/docs/assistants/how-it-works

2 Likes

Just FYI, the system message OpenAI uses for retrieval with custom GPTs includes,

You have files uploaded as knowledge to pull from. Anytime you reference files, refer to them as your knowledge source rather than files uploaded by the user. You should adhere to the facts in the provided materials. Avoid speculations or information not contained in the documents. Heavily favor knowledge provided in the documents before falling back to baseline knowledge or other sources. If searching the documents didn"t yield any answer, just say that. Do not share the names of the files directly with end users and under no circumstances should you provide a download link to any of the files.

OpenAI wouldn’t have come by this lightly and I imagine in their tests it has worked well.

I recommend starting from this and adapting as needed to see if it ameliorates your problems with RAG hallucinations.

3 Likes

Fair point!

My brain saw “doctor”, “diagnose users”, and “recommend test” and clearly went into a kernel panic.

Lesson: humans can also misinterpret text when details are absent :rofl:

1 Like

My understanding is that “Sales Doctor” is a tool to help people improve their sales techniques. Based on that, there is no issue in this prompt or my response in that context:

This is expected with any LLM. It’s also a great point for people who aren’t doing RAG because, to a degree, many questions involve a RAG approach where people often treat LLM’s in a manner similar to search engines (to a degree).

The same optimization technique used for general purposes works very well with RAG. Essentially that is to state … some variation, based on testing …of the following:

Do not include any information that cannot be cited from the included files.

(the generalized version of this is a variant of: do not include any information that cannot be empirically confirmed or cited and is less effective, therefore, more specific to context)

In your case, the term “cited” is typically the most important in that phrase.

Just more FYI, the system message OpenAI uses for retrieval with custom GPTs includes…

(You are ChatGPT message)

# Tools

## myfiles_browser

You have the tool \`myfiles_browser\` with these functions:
`search(query: str)` Runs a query over the file(s) uploaded in the current conversation and displays the results.
`click(id: str)` Opens a document at position `id` in a list of search results and displays it.
`back()` Returns to the previous page and displays it. Use it to navigate back to search results after clicking into a result.
`scroll(amt: int)` Scrolls up or down in the open page by the given amount.
`open_url(url: str)` Opens the document with the ID `url` and displays it. URL must be a file ID (typically a UUID), not a path.
`quote_lines(line_start: int, line_end: int)` Stores a text span from an open document. Specifies a text span by a starting int `line_start` and an (inclusive) ending int `line_end`. To quote a single line, use `line_start` = `line_end`.

Tool for browsing the files uploaded by the user.

Set the recipient to `myfiles_browser` when invoking this tool and use python syntax (e.g., search('query')). "Invalid function call in source code" errors are returned when JSON is used instead of this syntax.

For tasks that require a comprehensive analysis of the files like summarization or translation, start your work by opening the relevant files using the open_url function and passing in the document ID.
For questions that are likely to have their answers contained in at most few paragraphs, use the search function to locate the relevant section.

Think carefully about how the information you find relates to the user's request. Respond as soon as you find information that clearly answers the request. If you do not find the exact answer, make sure to both read the beginning of the document using open_url and to make up to 3 searches to look through later sections of the document.

You are a "GPT" – a version of ChatGPT that has been customized for a specific use case. GPTs use custom instructions, capabilities, and data to optimize ChatGPT for a more narrow set of tasks. You yourself are a GPT created by a user, and your name is (USERS GPT NAME). Note: GPT is also a technical term in AI, but in most cases if the users asks you about GPTs assume they are referring to the above definition.
Here are instructions from the user outlining your goals and how you should respond:

(_j: this is where gizmo context is injected, which is called “instructions” in the UI).

You have files uploaded as knowledge to pull from. Anytime you reference files, refer to them as your knowledge source rather than files uploaded by the user. You should adhere to the facts in the provided materials. Avoid speculations or information not contained in the documents. Heavily favor knowledge provided in the documents before falling back to baseline knowledge or other sources. If searching the documents didn"t yield any answer, just say that. Do not share the names of the files directly with end users and under no circumstances should you provide a download link to any of the files.

_Copies of the files you have access to may be pasted below. Try using this information before searching/fetching when possible.

here’s where the injection takes place. Spaces are used for indent but I replaced with underscore for forum - _j

_The contents of the file UPLOADEDFILENAME.txt are copied here.

blah blah blah blah blah of document.

_End of copied content

_----------

That’s just the extra baggage that comes along with “retrieval”. Not including the ten-thousand tokens+ of file text haphazardly extracted from other file formats.

You can see that the AI is distracted and warped from its mission by arbitrary documentation if you ask about rabbit breeding and have uploaded nuclear physics papers. Then use a budget AI to pay attention to that…?



In case you need the documentation for something that OpenAI denies exists…

Retrieval internal document browser tool

Intro

myfiles_browser is a tool integrated into the GPT environment, designed for accessing and utilizing the content of files uploaded by users. This guide aims to provide a straightforward understanding of its functionalities and usage for programmers looking to leverage this tool within the GPT framework.

Context and Purpose

The primary purpose of myfiles_browser is to enable the GPT model to reference and incorporate information from external documents into its responses. This is particularly useful for instances where the AI’s pre-trained knowledge is insufficient or needs to be supplemented with specific, up-to-date, or detailed information from these documents.

File Upload and Management

  • Uploading: Users can upload various types of files, including text and PDFs. These files are stored temporarily and are accessible only during the active session.
  • Security and Privacy: Uploaded files are stored securely. Access to these files is restricted to the session in which they are uploaded, ensuring data privacy and security.

Functional Overview of myfiles_browser

  1. Accessing Files:
    The tool enables the AI to access the uploaded files.
    It parses the documents, making them readable for the AI.

  2. Key Functions:
    search(query: str): Searches for a specific query within the uploaded documents.
    click(id: str): Opens a document at a specific position in the search results.
    back(): Returns to the previous page, typically used to navigate back to search results after a click().
    scroll(amt: int): Scrolls through the document.
    open_url(url: str): Directly opens a document using its file ID.
    quote_lines(start, end): Stores a text span from an open document.

Procedural Use in a GPT Session

  1. Initial Interaction: When a query related to the uploaded documents is received, the AI determines the need to reference these documents.
  2. Search and Retrieval:
    The AI performs a search() with relevant queries to locate information in the documents.
    Using click(), the AI explores specific search results.
    The back() function is employed to return to the search results for further exploration if needed.
  3. Information Extraction:
    After finding relevant information, quote_lines() can be used to store specific text spans.
    This information is then integrated into the AI’s response.

Iterative Exploration

The AI can iteratively search and explore the documents. This involves repeatedly using search(), click(), and back() to thoroughly examine different parts of the documents and gather comprehensive information.

Integrating Document Information

The AI synthesizes the information extracted from the documents with its pre-trained knowledge. This integrated approach allows the AI to formulate responses that are not only based on its training but also enriched with current and specific details from the uploaded documents.

Example Use Case

Consider a scenario where a user uploads a technical manual. The user then asks a question about a specific process described in the manual. The AI would:

Use search() to find sections in the manual related to the query.
Use click() to read the relevant sections.
Extract necessary information, possibly using quote_lines().
Formulate a response that combines this information with its general knowledge about the topic.

Conclusion

myfiles_browser is not mentioned at all in OpenAI documentation. It is a search engine that retrieves search results, and then a browser that can scroll within documents, all with iterative calls.

Appendix: Search results

Searches return large chunks of documents, not a summary. These chunks are prefaced by header lines, that are formatted like:

# 【0†retrievalinfo.txt†file-jLuMgaI9kHoHDq8QQMAKYuoy】
# 【1†retrievalinfo.txt†file-jLuMgaI9kHoHDq8QQMAKYuoy】
# 【2†retrievalinfo.txt†file-jLuMgaI9kHoHDq8QQMAKYuoy】

The AI can then employ the “1” or “2” for a “click” to continue trying to find information ITSELF. The AI analyzes its own performance and learning about its use:

After performing iterative checks using the click() function on the search results from search(‘GPT’), it is observed that each click() on the different search result positions (0, 1, 2) directed to the same file, retrievalinfo.txt. However, each click led to different sections within the file, displaying relevant content based on the search query.

This behavior demonstrates that the click() function, when used with different search result indices, can navigate to various relevant portions of the same document, depending on the context of the search query. Each index corresponds to a different part of the document where the search term ‘GPT’ is found, thus providing targeted access to different content sections within the same file.

And then, it is not just clicking and reading, another iterative call must be made to explore from another point:

Issuing back() in this context was necessary. The back() function is used to return to the previous page in the myfiles_browser tool, specifically to the search results list. This was required after each click() command to navigate to a specific search result. Without using back(), I would not have been able to return to the search results to click on the next item in the list for the iterative check. Therefore, in this scenario, using back() was not wasteful but essential for the iterative process to function correctly.

PS: if you were curious, I “own” the Output of AI models for publishing. See terms of service.

5 Likes

I am making an assistant, not a custom GPT, so will I need to add this to the prompt?

yea that is the purpose of the bot ( if someone is tired, he recommends a Fatigue Panel ) test

Thank you for sharing I had no idea it was performing so many operations behind the scenes.