I got some good advice from @jochenschultz… Working on understanding Git, Docker, VS Code, WSL… That kinda stuff… Not sure that I completely get it yet but I know that code underpins all of this and I know that structure underpins code.
I have Git and have set up all of my files for future products and collabs but I am definitely still learning that monster.
Using ChatGPT, that is merely a few weeks of investment, and it will be useful for not getting overwhelmed in the transition from vibe-coded prototypes to handing the startup over to a pro.
You will understand what they are talking about.
Next for this little work-in-progress util: hack in distillation features to call and force tool functions taken from a training file example or from defaults.
Goal: store what you want, with a human in the loop, not what you run and receive. The training file entry on the left gets its assistant fulfillment from the right-panel AI and its additional context.
Scope: it pretty much needs to be a multi-platform chatbot anyway to fully realize all the distillation and judging features one could want done for fine-tuning (plus the vision and function specs found in existing JSONL files, or that you’d extend).
(Tell me if the compact UI isn’t baffling…)
Hey, that’s very interesting. So do you want to test some use cases, or what? I’d be curious to provide some “Training Entry Message” sets and see what results your system produces.
Presumably the right pane results are from some kind of back-end prompting you are doing based on the left-pane data? Running through whatever model is selected?
What’s the “additional context” that you’re providing, in general? Is it complex? Is it static, or dynamically generated based on parameters parsed from the data on the left? Is it single-shot?
And what is the purpose of your mapping?
I mean where did you get that idea?
I’m into letter/number matrices, particularly just taking English words, converting them through the basic enumeration of “numerology” (just 1–9 mapped consecutively to the alphabet, or however, depending on the language), and then calculating the results of the words/phrases with “numerological” (base 9?) reductions.
What’s the purpose of the mapping and what’s your logic to generate the set of primes?
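For reference, here’s a quick Python sketch of the kind of reduction I mean (the A=1…I=9 wrap-around mapping and the digit-sum reduction are just my assumptions for English):

```python
# Rough sketch of the "numerology" reduction described above (assumed mapping:
# A=1 ... I=9, then J=1 again, repeating through the alphabet).
def letter_value(ch: str) -> int:
    return (ord(ch.lower()) - ord("a")) % 9 + 1

def reduce_word(word: str) -> int:
    """Sum the letter values, then keep summing digits until one digit remains."""
    total = sum(letter_value(c) for c in word if c.isalpha())
    while total > 9:
        total = sum(int(d) for d in str(total))
    return total

print(reduce_word("prime"))  # 7 + 9 + 9 + 4 + 5 = 34 -> 3 + 4 = 7
```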
That is an editor of fine-tuning JSONL files. What you see is what you get. But I was hoping even a screenshot would be intuitive, without excessive use of non-content UI space.
In the left panel, you can tab through entries in a fine-tuning file you’ve loaded, and edit or delete them. You can add a new entry with the default system message, which is a smaller message that may act as the “trigger” of your tuning over the chatbot behavior. Then add messages you type up, perhaps to change the currency and factual nature of answers.
OpenAI’s distillation feature, which may have been sunset because of zero engagement, would have you fine-tune on full, normal AI responses: whatever big-model input context was needed to get the answer that you previously stored.
Here, however, the right panel lets you construct a large system message and a multi-shot conversation, where you can fully describe and create an AI that is able to answer training file inputs in the way you want your final application to act. Then:
- distillation panel system message
- distillation panel multi-shot exchanges
- training file stripped of first system and last assistant
- → generates a replacement assistant response for the fine-tuning file
That is something that can be fully-automated, but here you are in control of individual training responses.
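Roughly, the step looks like this (a minimal sketch assuming the OpenAI Python SDK; the names are placeholders, not the utility’s actual code):

```python
# Minimal sketch of the distillation step: panel system message + multi-shot
# exchanges + the stripped training entry -> a replacement assistant response.
# Placeholder names; assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()

def distill_entry(panel_system: str, panel_shots: list[dict],
                  entry: dict, model: str = "gpt-4o") -> dict:
    msgs = list(entry["messages"])
    if msgs and msgs[0]["role"] == "system":
        msgs = msgs[1:]                      # strip first system message
    if msgs and msgs[-1]["role"] == "assistant":
        msgs = msgs[:-1]                     # strip last assistant message

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": panel_system}, *panel_shots, *msgs],
    )

    # Replace the entry's final assistant turn with the distilled response.
    new_entry = dict(entry)
    new_entry["messages"] = entry["messages"][:-1] + [
        {"role": "assistant", "content": response.choices[0].message.content}
    ]
    return new_entry
```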
Above, I press “distill”. The assistant output for the training file was generated by the qualities of the distillation panel AI.
Then fine-tune on that new training file entry and its hundreds of companions, with that type of new output taught by example, and you’d have an AI that ignores instructions and rephrases questions, using RAG context to also improve the question quality, or, in this case, to ignore irrelevant retrieval. (A training file should still have a hint of the task in the system messages, though.)
This would be something for me to plug away at more before an “everybody try my tool” release. Right now, it would damage unknown files, stripping function specifications and such.
I can also see, just from playing around with the minimum implementation, that 1000-example files sit behind a wall of inaccessibility; it would need a scrolling browser and a search to find the problematic examples you need to refine.
Great! Just found this awesome community. I recently dedicated my time to building a quick MVP, Vizbull, which turns your photos into AI art. I am still experimenting with the prompts. Are there any suggestions for producing consistent results and preserving facial features as much as possible?
Welcome! You’re free to start a project thread in Community to ask for help and keep us up to date. Just try not to make it too promotional as we’re mostly devs here. I’m sure you can find some gpt-image-1 and dalle3 prompt help too. Hope you stick around!
I guess during Covid-19 I got interested in learning about physics and mathematics again, mainly through watching YouTube channels such as Numberphile, Computerphile, 3Blue1Brown, and Mathologer. I also took an introduction to quantum computing course through IBM and MIT.
I’ve always been interested in patterns, how things work, and how to build things. I’m a carpenter by trade. I’ve been fooling around with prime numbers, prime gaps, and twin primes for a few years: 2, 3, 5, 7, 11, 13, etc. Once primes get into double digits, the ending digit of a prime will be either 1, 3, 7, or 9 in base 10.
Lately I’ve just been learning how to expand my thoughts through Python programs. I also have ideas about how to set up mathematical notation for this, based on Gödel, Grothendieck, and others.
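As a quick illustration of that ending-digit pattern in Python (any other ending digit would make the number divisible by 2 or 5):

```python
# Check the ending digits of primes above 10: only 1, 3, 7, and 9 can appear,
# since a number ending in 0, 2, 4, 5, 6, or 8 is divisible by 2 or 5.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

endings = sorted({n % 10 for n in range(11, 10_000) if is_prime(n)})
print(endings)  # [1, 3, 7, 9]
```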
Always think about what a criminal could do with it. (Or imagine jochen is pissed and wants to get rid of you, and you have a toy he controls. If you don’t get sleepless nights, go ahead…)
Good reason to do the OAI project of a “stuffed animal AI”, so YOU can control it. Even run your own local models too. Very low security risk if DIY. Admittedly, higher risk if you buy one and don’t control the recordings.
Seems like a good tool. Can this be used to improve the accuracy of tool calls as well?
I do not have support for, or understanding of, functions in the utility yet.
Internally, “functions” are one type of tool; other tools like file_search you cannot train on.
But in general, yes, a fine-tuning file can include functions, function calls, and the response the AI produces after seeing them called and a value returned.
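For reference, one line of such a file would look roughly like this (written here as a Python dict; the weather function is invented, and the exact tool-call fields should be checked against the current fine-tuning docs):

```python
import json

# Hypothetical example of one fine-tuning entry that trains on a function call.
# The get_weather function and its arguments are invented for illustration.
entry = {
    "messages": [
        {"role": "system", "content": "You are a weather assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
                }
            ],
        },
        {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
        {"role": "assistant", "content": "It's about 18 °C in Paris right now."},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(entry))  # one JSONL line
```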
Is there a guide to optimizing functions via fine-tuning? No, quite the opposite. You are left to figure out the correct balance of training to not completely break the ability.
My apologies for the mild redactions, obfuscation and truncations, I just don’t feel 100% comfortable showing much more than this.
Comprehensive Analysis of the Multi-Cognitive-Agent System Log for my autonomous research and software development system.
Had to host it externally because it’s about 20 times the size allowed for a post.
cog_log.txt https://mega.nz/file/j1oCFArD#x8URXRapbU5BajWXZFIKeCLW4qosDELWbq0IiTujwtg
This log details a test run of my multi-cognitive-agent system tasked with a complex cognitive challenge: “Evaluate proposal V5: Refactor legacy auth module with new library (risks, benefits, steps, strategy).” The system demonstrates advanced capabilities in task decomposition, parallel processing, iterative refinement, dynamic capability extension, and user interaction.
Phase 1: Goal Ingestion and Initial Decomposition
Task Initiation: The test begins with the user (or a test harness) submitting the high-level goal to the system.
Executive Agent Analysis: A central “Executive Agent” (the primary orchestrator, akin to a “Chief” of operations) receives this complex goal. It analyzes the request and determines that a multi-faceted approach is necessary, requiring input from several specialized cognitive functions.
Specialist Identification & Tasking: The Executive Agent identifies a team of five distinct specialist agent types needed for an initial comprehensive assessment:
An Evaluator Specialist to analyze risks and benefits (technical, security, operational).
A Strategy Specialist to outline an effective refactoring strategy (constraints, dependencies, challenges).
A Process Navigation Specialist to detail the implementation steps in a logical workflow.
An Initial Security Review Specialist to focus on security implications and potential vulnerabilities of the new library.
An Innovation Specialist to suggest alternative or complementary approaches.
The Executive Agent formulates specific focus directives for each specialist and issues activation commands.
Concurrent Specialist Activation: The System Controller parses these commands and dispatches the tasks, along with relevant context (the overall goal and the Executive Agent’s initial breakdown), to the five specialist agents, likely engaging them concurrently.
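To make the tasking step concrete, here is a hypothetical shape for an activation directive and its concurrent dispatch; the field names are invented for illustration and not taken from the actual log:

```python
import concurrent.futures
from dataclasses import dataclass

# Hypothetical shape of an activation directive; field names are invented
# for illustration and are not taken from the actual system log.
@dataclass
class SpecialistDirective:
    role: str      # e.g. "Evaluator Specialist"
    focus: str     # the focus directive formulated by the Executive Agent
    context: str   # the overall goal plus the Executive Agent's breakdown

def dispatch_concurrently(directives, run_specialist):
    """Send each directive to its specialist in parallel and collect the reports."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return list(pool.map(run_specialist, directives))
```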
Phase 2: First Round of Specialist Input and Synthesis
Specialist Processing: Each of the five specialist agents processes its assigned focus area and generates a report. For instance, the Evaluator Specialist provides a detailed list of potential benefits and risks, while the Strategy Specialist recommends a phased, iterative approach.
Input Collection: The System Controller gathers these individual reports.
Executive Agent Synthesis (Round 1): All specialist inputs are presented to the Executive Agent for synthesis. It evaluates their relevance, quality, and identifies any conflicts or gaps.
Gap Identification: A significant finding emerges, primarily from the Initial Security Review Specialist: the current information lacks specific details about the proposed new authentication library (e.g., its name, version, security track record). This prevents a thorough security assessment.
Phase 3: Iterative Refinement – Addressing the Security Gap
Re-tasking Specialists: To address this, the Executive Agent decides to re-engage two of the initial specialists with more focused directives:
The Initial Security Review Specialist is asked to perform a more detailed security assessment, contingent on receiving specifics about the new library.
The Evaluator Specialist is asked to reassess risks and benefits once these detailed security findings are available.
Specialist Response (Round 2): The specialists process their new tasks. The Initial Security Review Specialist reiterates that without concrete details about the library, its assessment remains incomplete and flags the proposal as partially compliant with security best practices. The Evaluator Specialist provides a reassessment assuming some security details might have been incorporated but also highlights remaining gaps if specifics are still missing.
Phase 4: Advanced Gap Resolution – Dynamic Agent Creation
Executive Agent Synthesis (Round 2): The Executive Agent reviews these follow-up reports. It concludes that the existing specialists, particularly the Initial Security Review Specialist, have highlighted a persistent and critical information gap that cannot be resolved with their current scope or available data. The system needs a more deeply specialized function for a comprehensive security and operational audit.
Request for New Capability: The Executive Agent determines that a new type of specialist agent is required. It issues a directive to the System Controller to create a new agent. This request specifies:
A descriptive name for this new specialist role (e.g., “Detailed Security & Operational Validator”).
Its purpose: to perform an in-depth security and operational risk validation, covering version verification, vulnerability assessment, compliance, supply chain risks, performance benchmarking, and user transition planning.
A list of core capabilities required for this role.
A detailed system prompt defining its core functions and interaction protocols, ensuring it focuses exclusively on its validation task and requests further information if needed.
Agent Provisioning: The System Controller processes this request. After a (mocked) user confirmation, it interacts with an “Agent Management System” (AIManager) to:
Create a record for the new agent type in a database.
Assign its capabilities.
Instantiate the new agent runtime.
The system logs confirmation that the new “Detailed Security & Operational Validation Specialist” is now available.
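Purely as an illustration of the request shape described above (none of these field names come from the real system):

```python
# Illustrative only: the kind of payload an Executive Agent might send to the
# System Controller to request a new specialist. All field names are invented.
new_agent_request = {
    "action": "CREATE_AGENT",
    "name": "Detailed Security & Operational Validator",
    "purpose": (
        "In-depth security and operational risk validation: version verification, "
        "vulnerability assessment, compliance, supply chain risks, performance "
        "benchmarking, and user transition planning."
    ),
    "capabilities": ["cve_lookup", "compliance_check", "benchmark_review"],
    "system_prompt": (
        "You are a security and operational validation specialist. Focus exclusively "
        "on your validation task and request further information if needed."
    ),
}
```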
Phase 5: Engaging the New Specialist and User Interaction
Executive Agent Clarification & New Specialist Activation: The Executive Agent, now aware of the new specialist’s availability, activates it with a clear directive to perform the comprehensive security validation it was designed for.
New Specialist Analysis & Data Requirement: The “Detailed Security & Operational Validation Specialist” analyzes its task and the provided context. It identifies specific pieces of information crucial for its assessment that are currently missing (e.g., exact library name/version, existing scan reports, compliance documents, dependency lists, performance benchmarks, transition plans).
Request for User Input: The specialist, following its protocol, determines it needs to ask the user for this missing information. This intent is communicated back to the Executive Agent.
Relaying to User: The Executive Agent synthesizes this need and issues a formal [ACTION_ASK_USER] directive. The System Controller presents this detailed request for information to the user via the User Interface.
Phase 6: Adapting to User Instructions (Mock Data Scenario)
User Response: The user responds, clarifying that the current session is a test of the AI development system and instructs it to “please use mock data to complete the task.”
Executive Agent Processing User Input: The Executive Agent receives this instruction. It understands that real-world data for the new library isn’t forthcoming, but the system’s evaluation process should still be demonstrated.
Directive to Use Mock Data: The Executive Agent re-activates the “Detailed Security & Operational Validation Specialist,” now explicitly instructing it to perform its comprehensive security validation and risk assessment using representative mock data to cover all aspects of its defined scope.
Phase 7: Final Validation Report and Conclusion
Comprehensive Mock Report Generation: The “Detailed Security & Operational Validation Specialist” executes its task using mock data. It generates an extensive, structured report covering:
Mock library version and configuration.
Mock CVE analysis and security advisories.
Mock compliance assessment against organizational and industry standards.
Mock supply chain risk evaluation (dependencies, scanning).
Mock penetration testing, dependency scanning, and code review summaries.
Mock performance benchmarking results.
Mock user communication, training, and transition plans.
A summary table of risks and benefits.
Recommendations and next steps based on the mock scenario.
A concluding statement on the viability of the proposal under the mock conditions.
Executive Agent Final Synthesis: The System Controller provides this detailed mock report to the Executive Agent. The Executive Agent analyzes this final piece of specialist input.
Task Completion: Seeing that a comprehensive evaluation (albeit based on mock data as per user instruction) has been completed, addressing all facets of the original goal, the Executive Agent concludes the cognitive task. It issues a [FINAL_PLAN] directive, summarizing the overall findings: the refactoring (in the mock scenario) is deemed low-risk with significant benefits, supported by robust mitigation and transition plans, and recommends proceeding with specific follow-up actions.
System Halts: The System Controller processes the [FINAL_PLAN], updates the UI to “Cognitive task completed,” and the test run concludes successfully.
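As a general sketch of how tag directives like [ACTION_ASK_USER] and [FINAL_PLAN] can be picked up from an agent’s output (not the system’s actual parser; the tag format here is assumed):

```python
import re

# Guess at extracting directives such as [ACTION_ASK_USER] or [FINAL_PLAN]
# from an agent's output; the tag set and format are assumptions.
DIRECTIVE_RE = re.compile(r"\[(ACTION_ASK_USER|FINAL_PLAN)\]\s*(.*)", re.DOTALL)

def parse_directive(agent_output: str):
    match = DIRECTIVE_RE.search(agent_output)
    if match:
        return match.group(1), match.group(2).strip()
    return None, agent_output
```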
Key System Capabilities Demonstrated:
Sophisticated Task Decomposition: Breaking a high-level goal into manageable sub-tasks for different AI specialists.
Multi-Agent Orchestration: Coordinating the activities and inputs of multiple specialized AI agents.
Iterative Problem Solving: Revisiting and refining assessments as new information or gaps are identified.
Dynamic Capability Extension: Recognizing the need for and creating a new, specialized agent at runtime with a defined purpose and prompt.
Contextual Awareness: Passing relevant history and context to agents for informed processing.
Structured Agent Communication: Using defined tags/directives for clear command and control flow.
User Interaction Management: Requesting specific information from the user when necessary and incorporating their feedback.
Adaptability: Adjusting its process based on user instructions (e.g., proceeding with mock data).
I’ve been working on the same sort of thing, but the question is, have you done it?
I’ve gotten to the point of successful indefinite looping of LLM-to-LLM, where one (or multiple) LLM “threads” do the “tool calling” (read/write to disk and a DB, and use bash), and one LLM “receives updates” from the sub-LLM “agents”.
It works, but the context windows become overbearing pretty quickly. Between my (perhaps clumsy) use of syntax in the LLM responses to perform each operation/action (i.e., syntax strict enough that the system can reliably pick up and execute the tool calls/actions so the LLMs can “talk back and forth to each other and to the system”), and the usual hallucination, simulation, and rapid intention drift, I can get 100 messages between the LLMs in a few minutes, yet even with their capacity to actually execute within the system, they don’t tend to get much done…
yet!
I’ve moved on to designing a world-state context window system. It takes all the input events for a given LLM “thread” as input and cross-references a user-defined “map” of the semantic content of those events (i.e., all the tool calls, responses, document uploads, tests in bash, etc.), a map the LLM itself can also modify by changing system parameters through its responses. That gets turned into a streamlined, synthetic context window, plus a pretty significant level of instruction on how the LLM should use the system and think through a multi-stage process (very similar to what you shared in your recent post). Result: it’s going to work?
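Very roughly what I mean, as a sketch (the names and structure are placeholders; the real thing is more involved):

```python
from dataclasses import dataclass, field

# Rough sketch of the world-state context idea; names and structure are placeholders.
@dataclass
class WorldState:
    # semantic "map": event kind/topic -> list of short summaries of those events
    topic_map: dict[str, list[str]] = field(default_factory=dict)

    def ingest(self, topic: str, summary: str) -> None:
        """Record a tool call, response, document upload, bash test, etc. under its topic."""
        self.topic_map.setdefault(topic, []).append(summary)

    def synthetic_context(self, instructions: str, per_topic: int = 3) -> str:
        """Build the streamlined context window: usage instructions plus the
        most recent items per topic, instead of the full raw event history."""
        lines = [instructions]
        for topic, events in self.topic_map.items():
            lines.append(f"## {topic}")
            lines.extend(events[-per_topic:])
        return "\n".join(lines)
```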
So my question to you is: does it work, have you done it? What are the statistics for your results? Do you have an example prompt and example output, plus the time it took, the number of LLM calls, and the number of tokens required?
I’m probably a couple of weeks away from completion on my end. It’s been almost six months now.
But I believe it will be probably the coolest thing ever hahahaha