Communicating with users beyond plain text. When I ask ChatGPT to explain how to do written division, how an LLM works, or how UML diagrams work, I do not want a purely textual reply but graphics and sketches, like on a whiteboard. From a UX perspective, it would be interesting to let users interact with the assistant's drawings.
Good point!
Especially if the model can structure output images, particularly diagrams, as reliably as it structures text output.
This was already somewhat possible using Code Interpreter, for example by asking for flowcharts. But there GPT-4 regularly ran into roadblocks when trying to correct its mistakes.
I haven't tried yet whether 4o is actually more capable in this regard.
My prediction for their next frontier model is that it will be fully multimodal in both inputs and outputs, able to analyze pictures/videos/sounds while modifying those multimodal inputs/outputs on the fly for users (while their new "Safety and Security" teams quietly throttle the outputs for "normal" people). Well, I don't really care, though, as long as my workflows using "this AI" are unaffected by this "next frontier model" and it keeps its "wokeness" to a minimum.
Imagine that you can literally create anything (within their 'safety guidelines') from anything, whether it's voices, sounds, pictures, images, graphics, tables, code, etc.
Reality and imaginary works will be blurred, and evil people will use this technology to accuse others (of the worst things) easily, supported by the 'new' concocted western laws and regulations. lol.
Let the chaos ensue (except for some untouchable ‘elites’ up there).
Maybe it's not reasonable to expect another major jump in capabilities if the strategy for the last 1.5 years has been to improve capabilities gradually.
In the sense of expecting a 10-20% increase in whatever performance measures matter to each user, rather than a tool that can draw up an architectural blueprint and execute it like a master craftsman.
I'm hoping for substantial improvements in reasoning capabilities and vision, including the ability to consider multiple image inputs as context.
My primary use case is damage assessment for accident-damaged vehicles, and I keep encountering issues where the model hallucinates that components clearly visible in some pictures (but not others) are damaged or missing. I also regularly get responses where the model has ignored a few key (and non-contradictory!) instructions. I would love to have this happen less often so I can at least use a SmartGPT-style prompting structure.
Quite simply, I want a smarter model that can see better!
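As a minimal sketch of the multi-image use case above: several photos of the same vehicle can be packed into one vision request so the model can cross-check components across views. The URLs are placeholders and the prompt is illustrative; the message shape follows the OpenAI chat-completions multi-image format.

```python
# Build a single vision request carrying several photos of the same
# vehicle, so the model can cross-check components across views.
# URLs are placeholders, not real images.
photo_urls = [
    "https://example.com/damage/front.jpg",
    "https://example.com/damage/rear.jpg",
    "https://example.com/damage/left.jpg",
]

content = [{
    "type": "text",
    "text": "Assess the damage. Only report a component as damaged "
            "or missing if that is consistent across ALL photos.",
}]
content += [{"type": "image_url", "image_url": {"url": u}} for u in photo_urls]

messages = [{"role": "user", "content": content}]
```

The payload could then be sent via the chat-completions endpoint; whether the model actually reconciles the views consistently is exactly the open problem described above.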
My hope is that it will be able to connect nodes representing physical equations with edges representing deductive or inductive processes. In a way, representing a priori knowledge in a directed graph. One could then ask for the path from one equation to another.
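The idea above can be sketched as a small directed graph and a breadth-first search for the derivation path between two equations. The equation names and edge labels here are illustrative stand-ins, not from any real system.

```python
from collections import deque

# Directed graph: equation -> [(derived equation, derivation step)].
# Nodes and edge labels are made up for illustration.
derivations = {
    "F = ma": [("F = m dv/dt", "substitute a = dv/dt")],
    "F = m dv/dt": [("dp/dt = F", "use p = mv with m constant")],
    "dp/dt = F": [("p1 + p2 = const", "assume no external force")],
}

def derivation_path(start, goal):
    """BFS over derivation edges; returns a list of (eq, step, next_eq)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        eq, steps = queue.popleft()
        if eq == goal:
            return steps
        for nxt, step in derivations.get(eq, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(eq, step, nxt)]))
    return None  # no derivation found

path = derivation_path("F = ma", "p1 + p2 = const")
```

Asking "how do I get from Newton's second law to momentum conservation?" then becomes a path query over the a-priori knowledge graph.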
Yes, I still sometimes dig up my old tests, trying to help the model find that little scratch on that red sports car you posted back then.
The thing is, the model could spot the blemish, but only when pointed directly at it. It would be great if one could ask for a visual chain-of-thought that checks an object piece by piece and analyzes what's in the picture.
That would be great!
I'm thinking of something like easily overriding the training data by providing new insights.
If a is now a', then the b from the training data is now wrong, so let's continue our conversation with b' from here on.
As of today, the models struggle to meaningfully deviate from their training knowledge and often revert back to it.
Being able to provide updated, structured knowledge and know that it is used for the whole conversation would be a little dream.
Interesting, yes, it's a graph. Also: calculate how the information difference between a and a' propagates from b to b'. That would be like building a hierarchical function within a domain.
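A toy sketch of that override idea, with all names and the relation between a and b invented for illustration: a conversation-scoped fact store where asserting a' shadows the trained default, and the dependent value b is recomputed from whatever is currently in force instead of silently reverting.

```python
# Hypothetical conversation-scoped knowledge store.
# User overrides shadow the "training" defaults for the rest of the
# session, and derived facts are recomputed from the values in force.

training_defaults = {"a": 10}   # what the model "learned"
overrides = {}                  # what the user asserts mid-conversation

def lookup(key):
    """Overrides always win over training defaults."""
    return overrides.get(key, training_defaults[key])

def derived_b():
    """b depends on a; b = 2a + 1 is a stand-in for the real relation."""
    return 2 * lookup("a") + 1

b = derived_b()        # from training data: a = 10, so b = 21
overrides["a"] = 12    # "a is now a'"
b_prime = derived_b()  # recomputed: a' = 12, so b' = 25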
When answering:
1. Load the main points initially.
2. Elaborate when the user clicks a point.
Such a mode could be more interactive, hierarchical to a certain extent, and might focus on the user's needs better/faster.
who cares. i am paying for Plus and using ChatGPT-4o and it's so painfully slow you can't even use it most of the time. it's basically unusable. i am going to give my money to Meta or Claude, i think. maybe a little dumber, but very fast.
yeah, this is all a constant evolutionary process. even if they start with something greenfield, i'm sure 99% of it is based on prior experiments.
there are no clean birthdates in software dev. it's a bit like genetic algorithms: we build all these things, take the best of them, mesh them together, build all these other things, and on and on.
an interesting question will be capability per $. right now i think you can get a pretty big jump with GPT-4o and CoT/self-reflection.
i wish some of the leaderboards would start scoring by cost…
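Scoring by cost is easy to sketch. The model names, scores, and prices below are made up, not real benchmark results; the point is just ranking by score per dollar instead of raw score.

```python
# Hypothetical (model, benchmark score, $ per 1M tokens) entries.
entries = [
    ("model-a", 88.0, 10.0),
    ("model-b", 82.0, 1.0),
    ("model-c", 90.0, 30.0),
]

# Rank by capability per dollar instead of raw capability.
ranked = sorted(entries, key=lambda e: e[1] / e[2], reverse=True)
```

Under this metric the cheap mid-tier model wins despite a lower raw score, which is exactly the trade-off the post above is pointing at.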
I think it would be amazing if the model had the ability to create its own fine-tuning dataset based on some task to be learned. An example of this could be: "here's my company's inventory system, please figure out how to interact with it."
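One rough sketch of what the output of such a process could look like: interactions the model discovered while probing the system, serialized in the chat-style JSONL format that fine-tuning APIs such as OpenAI's accept. The inventory prompts and responses here are invented for illustration.

```python
import json

# Hypothetical interactions "discovered" while probing the inventory
# system; in practice these would come from logged trial runs.
interactions = [
    ("How many widgets are in stock?",
     "GET /inventory/items/widget -> qty: 42"),
    ("Add 5 gadgets.",
     "POST /inventory/items/gadget {\"qty\": 5} -> ok"),
]

def to_jsonl(records):
    """Serialize (prompt, answer) pairs as chat-format JSONL lines."""
    lines = []
    for prompt, answer in records:
        lines.append(json.dumps({
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": answer},
            ]
        }))
    return "\n".join(lines)

dataset = to_jsonl(interactions)
```

The resulting file could then be uploaded as a fine-tuning dataset; the hard part the post is asking for is the exploration that produces reliable records in the first place.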