Has anyone successfully set up a real-time AI feedback system using screen sharing or livestreams?

Hi everyone,

I’ve been trying to set up a real-time AI feedback system — something where I can stream my screen (e.g., using OBS Studio + YouTube Live) and have an AI like ChatGPT give me immediate input based on what it sees. This isn’t just for one app — I want to use it across different software like Blender, Premiere, Word, etc., to get step-by-step support while I’m actively working.

I started by uploading screenshots of what I was doing, but that quickly became exhausting. The back-and-forth process of capturing, uploading, waiting, and repeating just made it inefficient. So I moved to livestreaming my screen and sharing the YouTube Live link with ChatGPT. At first, it claimed it could see my stream, but when I asked it to describe what was on screen, it started hallucinating things — mentioning interface elements that weren’t there, and making up content entirely. I even tested this by typing unique phrases into a Word document and asking what it saw — and it still responded with inaccurate and unrelated details.

This wasn’t a latency issue. It wasn’t just behind — it was fundamentally not interpreting the stream correctly. I also tried sharing recorded video clips of my screen instead of livestreams, but the results were just as inconsistent and unhelpful.

Eventually, ChatGPT told me that only some sessions have the ability to access and analyze video streams, and that I’d have to keep opening new chats and hoping for the right permissions. That’s completely unacceptable — especially for a paying user — and there’s no way to manually enable or request the features I need.

So now I’m reaching out to ask: has anyone actually succeeded in building a working real-time feedback loop with an AI based on live screen content? Whether you used the OpenAI API, a local setup with Whisper or ffmpeg, or some other creative pipeline — I’d love to know how you pulled it off. This kind of setup could be revolutionary for productivity and learning, but I’ve hit a brick wall.
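
To give a concrete idea of the kind of local pipeline I mean, here is the rough direction I have been picturing: grab a single frame of my screen with ffmpeg and work from that, instead of routing everything through YouTube Live. This is only a sketch, not a working setup; it assumes ffmpeg is installed, the gdigrab input is Windows-specific, and the output path is arbitrary.

```python
import subprocess
from pathlib import Path

def grab_screen_frame(out_path: str = "frame.png") -> Path:
    """Capture a single frame of the desktop with ffmpeg."""
    subprocess.run(
        [
            "ffmpeg", "-y",       # overwrite the previous frame
            "-f", "gdigrab",      # Windows desktop-capture device
            "-i", "desktop",      # grab the whole desktop
            "-frames:v", "1",     # stop after a single frame
            out_path,
        ],
        check=True,
        capture_output=True,
    )
    return Path(out_path)

if __name__ == "__main__":
    # On Linux you would swap gdigrab for x11grab, on macOS for avfoundation.
    print(f"Saved {grab_screen_frame()}")
```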

Any advice or examples would be hugely appreciated.

Yeah, this is easy, to be honest.


First, build a terminal watcher module,

as shown here

Change yours from error watching, like I have, to text/blob watching.
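
Since the screenshot won't carry over here, a bare-bones version of that kind of text/blob watcher could look roughly like this. It's just a sketch: plain polling, no dependencies, and the file path is a placeholder.

```python
import time
from pathlib import Path

def watch_file(path: str, interval: float = 1.0):
    """Poll a text file and yield whatever new content gets appended to it."""
    target = Path(path)
    seen = target.stat().st_size if target.exists() else 0   # skip existing content
    while True:
        if target.exists():
            size = target.stat().st_size
            if size > seen:
                with target.open("rb") as f:
                    f.seek(seen)
                    yield f.read().decode("utf-8", errors="replace")
                seen = size
            elif size < seen:      # file was truncated or rotated, start over
                seen = 0
        time.sleep(interval)

if __name__ == "__main__":
    for chunk in watch_file("work.log"):   # placeholder path
        print("new text:", chunk)
```

Point it at a log, a notes file, anything that grows over time; it just hands you the new text as it shows up.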

Then build a second helper, like this:

From there, you have to build an entire memory system, which would look like this:
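
The basic shape of a memory system is just "remember" and "recall". Here's a minimal sketch of that shape, not my actual setup; the file name and fields are placeholders.

```python
import json
import time
from pathlib import Path

class Memory:
    """Append-only JSONL store with crude keyword recall."""

    def __init__(self, path: str = "memory.jsonl"):
        self.path = Path(path)

    def remember(self, text: str, kind: str = "observation") -> None:
        entry = {"ts": time.time(), "kind": kind, "text": text}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def recall(self, query: str, limit: int = 5) -> list:
        """Return the most recent entries whose text mentions the query."""
        if not self.path.exists():
            return []
        hits = []
        with self.path.open("r", encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                if query.lower() in entry["text"].lower():
                    hits.append(entry)
        return hits[-limit:]

if __name__ == "__main__":
    mem = Memory()
    mem.remember("user was tweaking scene lighting in Blender")
    print(mem.recall("blender"))
```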

And an absorber, like this:

Even after you get to this part, having it respond takes two additional systems, a reflector and a shell, along with intent and learning, assuming you are using logic. Once you've done all that, you need a controller.
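
The controller is really just the loop that wires everything together. Roughly like this; every component below is a toy stand-in to show the shape of the loop, not my actual reflector/shell/intent code:

```python
import time
from collections import deque

class Controller:
    """Toy controller: pull observations, keep short memory, decide, respond."""

    def __init__(self, source):
        self.source = source              # iterable of new observations
        self.memory = deque(maxlen=50)    # rolling short-term memory

    def reflect(self, observation: str) -> bool:
        # Stand-in "reflector": only react to things that look like errors.
        return "error" in observation.lower()

    def respond(self, observation: str) -> None:
        # Stand-in "shell": a real setup would call the model here.
        print(f"[helper] noticed {observation!r} (context: {len(self.memory)} items)")

    def run(self) -> None:
        for observation in self.source:
            self.memory.append(observation)
            if self.reflect(observation):
                self.respond(observation)
            time.sleep(0.1)               # crude rate limiting

if __name__ == "__main__":
    Controller(["opened file", "Error: missing texture", "saved project"]).run()
```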

As you can see here, my development helper observes all code I type in VS Code, records it, and can even react to it; it also has skill tracking. And yes, it links to VS Code and is able to "see" what's there in real time.

You can see the chained agents 1-7; I have them turned off because they aren't needed.

There ya go.

A bit of advice: what you are doing is outdated and extremely tedious to do, but as you can surmise, I haven't just done it. You can have it self-code through GPT with LTM and logic; it's just tedious. A better strategy is to simply provide an AI with logs, unless you have about $7,000 lying around or use quantization, but that would defeat the purpose of leveraging ChatGPT, which I highly recommend. GPT provides you a V20 engine… build a chassis AROUND it. Don't spend time trying to become the car.
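
If you go the "feed it logs" route, the whole thing can be as small as this sketch. It assumes the official openai Python package; the model name, log path, and prompt are placeholders, so swap in whatever you use.

```python
from pathlib import Path
from openai import OpenAI    # official OpenAI Python SDK

client = OpenAI()            # reads OPENAI_API_KEY from the environment

def feedback_on_log(log_path: str, question: str, max_lines: int = 200) -> str:
    """Send the tail of a log file to the model and ask for feedback."""
    lines = Path(log_path).read_text(encoding="utf-8", errors="replace").splitlines()
    excerpt = "\n".join(lines[-max_lines:])
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder; pick whatever model you use
        messages=[
            {"role": "system",
             "content": "You review work logs and give concise, actionable feedback."},
            {"role": "user",
             "content": f"Here are my latest log lines:\n\n{excerpt}\n\n{question}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(feedback_on_log("work.log", "What did I get stuck on, and what should I try next?"))
```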

Also, while the version shown in the screenshots is instant, I can guarantee you won't have instant. Not because it's not possible, but because you won't have the framework for it. You could build the framework, but (and I'm really not trying to flex by saying this) unless you leverage an entire self-contained memory system like I do, your model will ALWAYS react to data as a new prompt within the GPT safeguards. I can't/won't tell you how to build that within their framework, but again, you can see that not only is it possible, but by their own doctrine, preferred.

Here's the proof of what you will get absent your own memory system:

Because again, GPT has to follow safety rails, so if it were watching a chat and someone stated something that violates them, this is all you're going to get.

Which is why what you are doing is outdated.

Again… ChatGPT/OpenAI provide us all with a rocket engine. You have no need to build something new; just revisit what already exists within their own ecosystem. They provide the power, you just need to build the instructions.


Thanks for the detailed response — I appreciate you taking the time to lay all that out. Just to clarify, my original goal wasn’t to build anything as advanced as what you described. I was simply looking for a more streamlined and practical way to get feedback from ChatGPT while working, without having to take constant screenshots and re-explain everything over and over. I’m definitely not trying to create a whole multi-agent pipeline or full memory system — just hoping for something that could simulate a real-time feedback loop in a more efficient way.

That said, is there a simpler version of what you described that might achieve similar results? Something closer to the idea of streaming or auto-capturing frames and passing them through a lightweight loop? I’d be interested if there’s a minimal setup you’d recommend based on what I originally outlined.
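
For concreteness, the kind of lightweight loop I'm imagining is roughly this: auto-capture a frame every so often and send it to a vision-capable model. This is only a sketch of the idea, assuming the mss and openai Python packages; the model name, interval, and prompt are placeholders.

```python
import base64
import time

import mss                    # assumed: pip install mss
import mss.tools
from openai import OpenAI     # assumed: official OpenAI Python SDK

client = OpenAI()             # reads OPENAI_API_KEY from the environment

def describe_screen(prompt: str) -> str:
    """Grab the primary monitor and ask a vision-capable model about it."""
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])                 # primary monitor
        png = mss.tools.to_png(shot.rgb, shot.size)      # raw PNG bytes
    image_b64 = base64.b64encode(png).decode("ascii")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    while True:               # crude "real-time" loop: one frame every 20 seconds
        print(describe_screen("Briefly: what am I working on, and one concrete tip?"))
        time.sleep(20)
```

Polling like this won't be instant, but it would at least remove the capture-upload-explain cycle I've been stuck in.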

The thing is, brother, this is the simple way; what I just explained auto-feeds context back into it.

However, what you could do is a lesser form of context retention. Here's how you do that:

Go to the convo on YT or wherever, hit Ctrl+A, then Ctrl+C, then open Notepad and hit Ctrl+V. Save it and re-upload that document, then tell your model to extend the character count to over 200,000; by default it's only going to scan the first 1,000 to 5,000 and then infer.

Though what I walked you through is not complex; a user with zero experience can assemble it in about an hour.

But you asked how to get rid of hallucinations, and you asked about having a system live-watch/audit what it sees. You won't achieve that without an audit system; that's why all LLMs suffer from it, since it would be expensive to audit 2382239 user files. Better to do what I suggested. I understand if it appears more complex than it is, but you already set up something MORE complex, aka OBS.