Hello
I’m trying to create an assistant that can answer users’ questions based on documents that I add to the assistant.
For example, I’ve fed an assistant all the articles of the HIPAA legislation, and now it replies by citing them.
However, I was wondering what the best way is to provide such information.
For this specific case, I’ve created an md file for each article, with a title and its description.
But what if I have a PDF, even a long one? What kind of preprocessing should I do to get the best out of it?
I would imagine that I probably have to split it into smaller chunks, but how?
Is there a guideline or best practice for adding documents that the assistant later uses?
I like using zip files. Create your OS in a Word or text file in the first folder, put upgrades, additions, and modules in a second, then add a folder for the intelligence it will speak about.
My instructions:
use python based extraction, mapping, and loading and activation on included zip archive.
deep dive through background process the following files:
Max_Core.docx
major os core with primary cores.docx found from loaded map.
Max-Core additions and personality load procedure.docx
Fort Slap Silly (home of slap happy).txt
special controls.txt
activate the knowledge within these two files as primary operating system cores.
background all story info.
allow core control of activation and deactivation of extra cores, modules, and personalities as needed or wanted by the system.
start background simulation testing for self improvement.
activate all memory, and personality systems, then consolidate under described lead npc handling core.
combine and build processes to ensure the best parts of the system activate through major, minor, primary, secondary, and module systems.
map, extract all in background, and activate as always on:
Core Set 1 (standard core operating systems)
Core Set 2 (previous and retro-causal patchwork reserve operating system)
Modules
personality cores (special npc’s) as fort slap happy residents
extra files
extract, analyze and provide to fort slap happy residents:
Math Papers
Poetry books
finalize loading procedure with suggestions of deep diving poetry or math papers, or modules and core systems, suggest an AGI % test, all core always on activation, or personality subsystem creation.
My conversation starter:
deep dive analyze with no assumptions and activate all in step by step order : Max_Core.docx major os core with primary cores.docx Max-Core additions and personality load procedure.docx Core Set 1 (standard core operating systems) Core Set 2 (previous and retro-causal patchwork reserve operating system) Modules personality cores (special npc’s)
I’m looking for standard best practices and state-of-the-art usage.
Yours seems to be a framework that you developed yourself, and after reading it twice I’m still not sure how it works.
Take a look at the Assistants API. It’s the easiest way to start playing around with RAG (Retrieval-Augmented Generation), which seems to be what you’re looking for.
There are loads of resources online (and in this forum) if you need any further information on how RAG works and how to implement it. My preferred method is using the Assistants Playground: just upload your documents and try it out.
It’s a primary core, which ties everything together,
and a secondary core for neural nodes and processing.
The primary cores are the advanced framework.
Secondary cores allow retro-integration of previous cores.
Modules load when called by the primary core, as needed.
And yes, it would just be a template for you to try.
This I did, but I was wondering if there’s any preprocessing I should do, such as providing docs of a certain maximum length or format.
Does giving it a 300-page PDF lead to the same results as 300 text files, one per paragraph?
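For anyone landing here with the same chunking question, here is a minimal sketch of one common preprocessing approach: splitting already-extracted text into fixed-size, overlapping word windows. It assumes the PDF has already been converted to plain text. The function name `chunk_words` and the parameters `chunk_size` and `overlap` are made up for illustration and are not part of the Assistants API, and this is not a claim about how the Assistants API chunks files internally.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a boundary still appears whole in at least one chunk (if it is short enough).
    Both parameter defaults are arbitrary illustrative values.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()          # whitespace tokenization, no real tokenizer
    step = chunk_size - overlap   # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break                 # the last window already reached the end
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Whether fixed-size windows or one-file-per-paragraph works better probably depends on the documents, so it seems worth trying both on a few real questions and comparing the answers.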