4o-mini got worse recently?

Two days ago it seemed to have a very good reasoning capabilities, but yesterday and today it has been saying pretty dumb things. I am using it to predict NBA scores, which it was very good at. But today it says things like over 170 points is likely as a single team’s score.
Did you notice any change in this model recently?

Edit: or maybe I was overestimating its capabilities before? because gpt-4o-mini-2024-07-18 shouldn’t change, right? and it’s equally bad

1 Like

I mean for me it looks like a simple one call GPT thing…

If you want to make it reliable you will have to use smaller agents but a lot and a knowledge graph…

The result can look like this:

import openai
import asyncio
from neo4j import GraphDatabase

class ShortTermGraphMemory:
    def __init__(self, api_key, model="gpt-4"):
        self.api_key = api_key
        self.model = model
        self.neo4j_driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    async def runEntityAgents(self, session_id):
        """
        Retrieve all entities for the session from Neo4j, and launch an agent for each entity.
        """
        entities = self._get_entities_from_neo4j(session_id)
        tasks = []
        
        # Start an async task (agent) for each entity
        for entity in entities:
            tasks.append(self._start_entity_agent(entity))
        
        await asyncio.gather(*tasks)

    async def _start_entity_agent(self, entity):
        """
        Agent responsible for performing a task related to a specific entity dynamically.
        """
        entity_name = entity['name']
        entity_type = entity['type']
        
        # Use a dynamic approach to let the model figure out how to handle the entity based on context
        prompt = f"""
        Analyze the entity "{entity_name}", which is a {entity_type}.
        Suggest useful information that can be added to the knowledge graph based on the context of this entity.
        You can suggest relationships, attributes, or actions.
        """
        
        response = await self._get_openai_response(prompt)
        print(f"Entity Agent for {entity_name} ({entity_type}) completed: {response['choices'][0]['text']}")

    def _get_entities_from_neo4j(self, session_id):
        """
        Retrieve all entities linked to the current session from Neo4j.
        """
        with self.neo4j_driver.session() as session:
            result = session.run('''
                MATCH (n:Thing {session_id: $session_id})-[:HAS_ENTITY]->(e:Entity)
                RETURN e.name AS name, e.type AS type
            ''', session_id=session_id)
            
            return [{"name": record["name"], "type": record["type"]} for record in result]

    async def _get_openai_response(self, prompt):
        """
        Get a response from OpenAI API based on the dynamic prompt for each entity.
        """
        return openai.Completion.create(
            engine=self.model,
            prompt=prompt,
            max_tokens=150,
            api_key=self.api_key
        )

# Example usage
if __name__ == "__main__":
    agent = AgentM(api_key="your_openai_api_key")
    
    loop = asyncio.get_event_loop()
    session_id = 1  # Example session ID
    loop.run_until_complete(agent.runEntityAgents(session_id))

… but obviously the knowledge graph would be really big over time - that’s why I call this the ShortTermGraphMemory

You also have to add something I call tiredness (which could be on a very low level of implementation just a “number of entries learned today”) - which, when invoked starts a so called dream algorithm and stores important stuff in a long term memory.

The long term memory is only called when there is no result in short term memory and the user insists on it (because it obviously is a lot more expensive).

It is like the human brain which strives to use the smallest amount of energy but energy is money (but in the end money is also just a representation of energy you can buy with it) - and money is requests to the API (which uses electricity).

This is why only rich people will have access to a reliable AI :wink: