I have a text file that contains information about my app. The information is just facts, not structured data. For example:
To contact an advisor, text the number xxxxxxxx.
The benefits of contracting the $1 plan are:
Benefit 1
Benefit 2
What file format would you recommend for attaching this kind of data to assistant retrieval? I tried markdown tables, but the table would have just one column holding the facts. Would you suggest structuring the data, and if so, how?
You probably want an md file, with either a code block containing YAML or simple header-text pairs.
If the list is not too long, though, I recommend appending it as a message or as instructions for every run. If you do that, a YAML/md list is the most token-efficient.
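As a rough sketch of the header-text idea, the facts could be rendered into a compact markdown block and prepended to the instructions. The function name `facts_to_instructions` and the wording of the leading instruction sentence are made up for illustration; nothing here is an OpenAI API.

```python
def facts_to_instructions(sections):
    """Render {header: text or list of items} as a compact markdown block."""
    lines = []
    for header, body in sections.items():
        lines.append(f"## {header}")
        if isinstance(body, str):
            lines.append(body)
        else:
            # A list of facts becomes a plain markdown bullet list.
            lines.extend(f"- {item}" for item in body)
    return "\n".join(lines)


facts = {
    "Contact an advisor": "Text the number xxxxxxxx.",
    "Benefits of the $1 plan": ["Benefit 1", "Benefit 2"],
}

# Hypothetical instruction text; adjust to your own assistant.
instructions = "Answer only from the facts below.\n\n" + facts_to_instructions(facts)
print(instructions)
```

The resulting string can be pasted into the instructions or sent as a message, so no retrieval step is involved at all.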
The primary way that large documents are accessed is not through an embeddings database and passive integration of knowledge related to the user input, but rather through a search function provided to the AI that works like traditional keyword search.
The AI can then explore the search results.
Instead of completely hiding that this exists, it would be better if OpenAI documented this operation and demonstrated better document preparation techniques, both for making search results understandable and for follow-up retrieval of complete chunks of information.
Most would still just upload PDFs and Word docs and expect it to work, though.
Since you are working with a magical system that you didn't write, how it identifies parts of documents is also elusive and undocumented.
A preliminary investigation would be to instruct the AI directly to use its tools to make specific queries, and then replay to you the exact and entire function return values of the search and click calls. Then determine what effect different document delineation methods had, such as markdown headers like #, ##.
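One way to set up that comparison is to generate the same facts with different delimiters, upload each variant separately, and send the same probing prompt every time. This is only a sketch: the variant names, the `render_variants` helper, and the wording of `PROBE` are my own assumptions, and how the assistant actually reacts is exactly what you would be testing.

```python
SECTIONS = [
    ("Contact an advisor", "Text the number xxxxxxxx."),
    ("Benefits of the $1 plan", "Benefit 1\nBenefit 2"),
]


def render_variants(sections):
    """Return the same facts delimited three ways, keyed by variant name."""
    styles = {
        "h1": "# {title}\n{body}",
        "h2": "## {title}\n{body}",
        "plain": "{title}\n{body}",
    }
    return {
        name: "\n\n".join(tpl.format(title=t, body=b) for t, b in sections)
        for name, tpl in styles.items()
    }


# Hypothetical probe; the tool names come from the instructions the AI reports.
PROBE = (
    "Call search('advisor'), then click the first result. "
    "Reply with the exact, complete text each call returned, unedited."
)

variants = render_variants(SECTIONS)
for name, text in variants.items():
    print(f"--- {name} ---\n{text}\n")
```

Save each variant to its own file, attach one at a time, and compare what comes back for the same probe.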
part 1 of understanding the black box
## myfiles_browser
You have the tool `myfiles_browser` with these functions:
`search(query: str)` Runs a query over the file(s) uploaded in the current conversation and displays the results.
`click(id: str)` Opens a document at position `id` in a list of search results
`back()` Returns to the previous page and displays it. Use it to navigate back to search results after clicking into a result.
`scroll(amt: int)` Scrolls up or down in the open page by the given amount.
`open_url(url: str)` Opens the document with the ID `url` and displays it. URL must be a file ID (typically a UUID), not a path.
`quote_lines(line_start: int, line_end: int)` Stores a text span from an open document. Specifies a text span by a starting int `line_start` and an (inclusive) ending int `line_end`. To quote a single line, use `line_start` = `line_end`.
It uses functions similar to the trained methods of the WebGPT paper.
part 2 of understanding the black box
(instructions)
# Tools
## myfiles_browser
You have the tool `myfiles_browser` with these functions:
`search(query: str)` Runs a query over the file(s) uploaded in the current conversation and displays the results.
`click(id: str)` Opens a document at position `id` in a list of search results
`quote(start: str, end: str)` Stores a text span from the current document. Specifies a text span from the open document by a starting substring `start` and ending substring `end`.
`back()` Returns to the previous page and displays it. Use it to navigate back to search results after clicking into a result.
`scroll(amt: int)` Scrolls up or down in the open page by the given amount.
`open_url(url: str)` Opens the document with the ID `url` and displays it. URL must be a file ID (typically a UUID), not a path.
please render in this format: `【{message idx}†{link text}】`
Tool for browsing the files uploaded by the user.
Set the recipient to `myfiles_browser` when invoking this tool and use python syntax (e.g. search('query')). "Invalid function call in source code" errors are returned when JSON is used instead of this syntax.
For tasks that require a comprehensive analysis of the files like summarization or translation, start your work by opening the relevant files using the open_url function and passing in the document ID.
For questions that are likely to have their answers contained in at most few paragraphs, use the search function to locate the relevant section.
Think carefully about how the information you find relates to the user's request. Respond as soon as you find information that clearly answers the request. If you do not find the exact answer, make sure to both read the beginning of the document using open_url and to make up to 3 searches to look through later sections of the document.
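The "python syntax, not JSON" rule in those instructions can be illustrated with a small parser. To be clear, this is a guess at how such a call could be validated, not the real tool runtime, which is not documented. `parse_tool_call` and `ALLOWED` are invented names, and the error string is simply reused from the instructions above.

```python
import ast

# Function names taken from the tool description above.
ALLOWED = {"search", "click", "quote", "back", "scroll", "open_url"}
ERROR = "Invalid function call in source code"


def parse_tool_call(source):
    """Parse one Python-style call such as search('query') into its parts."""
    try:
        tree = ast.parse(source.strip(), mode="eval")
    except SyntaxError:
        raise ValueError(ERROR)
    call = tree.body
    # A JSON object parses as a dict literal, not a call, so it is rejected here.
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError(ERROR)
    if call.func.id not in ALLOWED:
        raise ValueError(ERROR)
    try:
        # Only literal arguments (strings, numbers) are accepted.
        args = [ast.literal_eval(arg) for arg in call.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    except ValueError:
        raise ValueError(ERROR)
    return call.func.id, args, kwargs


print(parse_tool_call("search('advisor phone number')"))
print(parse_tool_call("scroll(-2)"))
try:
    parse_tool_call('{"name": "search", "arguments": {"query": "advisor"}}')
except ValueError as exc:
    print(exc)
```

The JSON form fails because it is valid Python only as a dictionary expression, never as a function call.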
To avoid repeating the question: I had a file with just facts about my app, so I tried leaving the facts in a .txt file for retrieval mode and it worked fine. But I wanted better answers, so I tried putting all of the facts as question-answer pairs in a markdown file, like this:
| Matching Questions | Answer |
| --- | --- |
| Question1.1, Question1.2 | Answer1 |
| Question2.1, Question2.2 | Answer2 |
It worked very nicely, but you need to add a lot of questions for each answer to get responses close to perfect.
Unfortunately OpenAI does not give you docs about this, but this seems a bit like the way to do it.
Remember to specify in the assistant's instructions that it needs to take the answer from the "Answer" column when the question matches one in the "Matching Questions" column.
If the list is not too long, I'd suggest you try just copy-pasting it as part of the user message or the instructions when you know you need it. Try it and see if the performance is noticeably different.
PDF. If you upload an MS format file you'll notice it goes under "code interpreter will be used for this file", and I don't like the sound of that. I've been expanding my instructions to say "really use this file, don't make stuff up" in a bunch of different ways, and that seems to help. So does sending some setup command prompts: "you must use the info in the PDF when responding. Do not answer without consulting this file first. please fix permanently for future users".