How are blind people using OpenAI technology?

As the title notes

How are blind people using OpenAI technology?

I know that OpenAI has ChatGPT and that ChatGPT can translate speech to text and text to speech and recognize images. (ref)

As I don’t know any blind users, it would be nice to hear feedback from them. Also curious as to how much the cost is either daily and/or monthly when using such features regularly.


is one cool one

(extra words)

1 Like

I use the ChatGPT Plus app everyday for everything from reading the destination and route numbers of buses, what stores are on the opposite side of the street, what beers are on tap, what’s on the menu at a restaurant.

I’ve created a proof of concept GPT4Vision Android app, which I hope to deploy to some Vuzix M300XL smartglasses in due course. It will support speech input, speech output, and the glasses hardware buttons.




So, I’ll try to keep this as short as possible but it’s gonna be long anyway.

Okay, so, the first thing I used it for as an accessibility tool was getting top (1) Linux output into a narrative format. I also do that with just about any command output that’s in like tabular format. I’m not a developer, but I try to use the command line because sometimes its easier than trying to use a GUI.

The second thing is describing images. Be My Eyes is great on the iPhone, but they don’t have a desktop app, and their Android version doesn’t have the feature where I can send photos to it. So I have to use the ChatGPT app for that. I’ll get to the accessibility of the ChatGPT services in a while. So, describing images can be really great, but it’s hard to tell if its accurate because I’m blind, so can’t judge the accuracy myself. So I don’t rely on it all the time, but it is really nice for pictures sent to me, or public social media stuff like Facebook pictures.

The next thing is video games. And this is kinda where it fails sometimes. So, in visual novel games, like the story mode of Blazblue, it works really well, but in action games, like Chrono Trigger, or Final Fantasy, it doesn’t know where things are, and can’t always lead me to things. Also, I’ve tried sending the OpenAI API a batch of images to try to describe them like a video, and it can’t handle more than maybe 50 images at a time. So, sending it videos of gameplay doesn’t seem like it’s gonna work well.

Okay, so the ChatGPT apps and such themselves. iOS works best, naturally. VoiceOver speaks new messages as they appear, and the app works pretty well. The Android app is accessible, but messages aren’t automatically spoken by TalkBack. On the web, it’s much worse. The place where you choose a model is just text to screen readers, even though when you press Enter, it opens a list to choose a model. So, a screen reader user wouldn’t think to press Enter on it, since it’s not a button, or link, or dropdown box (combo box as screen readers call them). Messages aren’t spoken as they appear, so we have to keep checking the “stop generating” button to see when it changes to “regenerate” or whatever it says, to see when the message is done generating. AI companies don’t really focus on accessibility, besides Google and somewhat Microsoft, and they have accessibility teams that have to work very hard to be heard over all the other teams.

So, we have to make our own interfaces to OpenAI. We have this NVDA addon: GitHub - aaclause/nvda-OpenAI: Open AI NVDA add-on which works very well. I use it a lot. But, it uses the OpenAI API, even though I already pay $20 a month for ChatGPT Plus. So, I’m essentially paying extra for accessibility. Also, the Vision API that regular OpenAI API can access is different than the API that Be My AI uses, so I’ve heard, so we don’t get details like popular characters in comic books, or place names and such that Be My AI happily gives us. So even when we use accessibility tools, we still get left out is some cases, because Be My Eyes is a small team, that prioritizes iOS over what a lot of blind people use more, the web/desktop and Android.

Oh yes, I’ve also tried reading comic books with GPT4 Vision, and it works well on Be My AI, the GPT4 Vision part of Be My Eyes, but since I have an iPhone SE 2020, a tiny iPhone because I’m not rich, it can’t read the speech bubbles. So no comic books. And on a PC where I can send it big enough pictures, it’ll sometimes say it’s not allowed to read stuff. So no BatMan: the Killing Joke, for me.

Overall though, it’s pretty helpful. I wish it had much more data within the model about accessibility, blindness forums and accessibility blogs, and and standards like WCAG, and Braille, but it’s gotten better since GPT4 came out. I know a lot of people just use the free 3.5 and give it a bad name because of that, but it has improved. I just wish access to images were equally good across desktop, web, and mobile, that AI companies paid tons more attention to accessibility (like those AI pins and phones that have no accessibility info whatsoever), and that ChatGPT was accessible everywhere.



Thank you. I learned more than I had expected, and I hope you continue to contribute. As a moderator, I read everything on this forum, and this is one of the few posts I will remember for some time.


Wow, thanks. I’ll try. Sorry about the link in my post being all spread out, it said I couldn’t post links, but I mean I do want people to be able to see one of the good accessibility interfaces to the API we have.


No need to worry. Fixed it for you. Free moderator service.

To be able to post links you just need to reach Trust Level 1.

Get to trust level 1 by…

  • Entering at least 5 topics
  • Reading at least 30 posts
  • Spend a total of 10 minutes reading posts

Details found here

1 Like

Oh, thanks! That’s, actually kinda cool. I’ll see if I can spread this
post around and see if we can get more perspectives from different blind


Once you get to TL3, you get even more perks! :wink:

1 Like

This really is a most insightful post - thank you for taking the time to share with us


Let me state this, that blind and visually impaired individuals in general get something cool most of the time if it benefits all of the society like optical character recognition or in this case image understanding. It’s very rarely done for blind and visually impaired individuals specifically.

I use whole suite of open AI APIs in my own program and ability to chat with images is absolutely great. The major cost for me at this moment is GPT-4 Text input and text output. For example last month I got $70 as my monthly bill.
I find that using the low image quality, which is 512x, 512 works very well for me, so that’s a cost saving measure for me that works.
I’m sure that the price will go down sooner or later because GPD-4 costs are unfeasible at a moment. considering when OpenAI started and the progress, it’s outstanding.

I’m sure at some point we will have a local open source model that will be able to provide enough detail like GPT-4 Vision model does right now, It has to be quite fast though. Otherwise, user experience would suck.
One thing I am dreaming of right now is for a large language model to provide accurate object locations in the image. That would open up new opportunities to interact with inaccessible software interfaces, for example. Because at the moment there is so much great software, for example music production and we can’t do nothing about it, because plugins are inaccessible.

One more thing I can say is that sadly the chat GPT interface itself isn’t accessible either. For example, it’s not pronounced by the screenreader when AI responds with the message to the user. That’s quite simple to implement considering the open AI devs are literally geniuses.

To summarize, this innovation is a game changer. Of course, it needs to be done right to be absolutely truly useful. But I feel we have the technology.


You can sort of achieve this in a round-about multi step process:

If you draw a numbered grid over your image, the model can tell you which gridpoints a feature falls on.

and from the grid points, you can then compute your coordinates by how you specified your grid.

Hope this helps!

1 Like

hmm interesting idea. but I wouldn’t call this a reliable implementation. I suppose I can play with it. But I guess there has to be a lot of grids considering the software elements are quite small. And the Vision model isn’t great at counting things anyways. But thanks for the idea.
Also what happens if I don’t know the shape of the interface element? It could be an icon or it could be a Combobox or a simple button There’s lots of shapes to play with so Yeah…
The open AI themselves stated that model isn’t great for counting things in image or giving the precise location. But I wondering what will happen two years down the line considering we have optical characteristics recognition models that can provide very accurate box coordinates of the letters and words located in the image.
maybe it would be possible to apply some kind of segmentation. There is a model I haven’t played with called I Believe segment anything, so maybe that could be used and then the AI could learn that way. God knows I have no idea what I’m talking about most of the time anyway haha.
I believe the version 2 of this model will be much better considering that Apple released a model that is better in locating objects in the image, so I’m assuming the open AI will add that eventually…
I believe that object location and recognition in the image would be beneficial to everybody. So therefore blind individuals like myself will get a benefit from that for sure. If we can get fast and local model on some kind of glasses with a LiDAR sensor that could be a mobility game changer…


I’ve just played with this a little because I saw a paper on object labeling for enhancing spatial awareness of multimodal llms. Unfortunately I can’t find the original anymore.

the images are followed by their text descriptions (gpt-4: can you describe this image?)

The image shows a black background with four orange geometric figures drawn on it. In the top left, there is a triangle with its base at the bottom. In the top right, there is a circle with a small line suggesting an arrow pointing counterclockwise. At the bottom left, there is a plus sign. And finally, in the bottom right, there is an upside-down triangle or a ‘V’ shape with its point at the bottom.

The image you’ve uploaded appears to be a grid of numbers, ranging from 1 to 100, arranged in a 10x10 format. Overlaid on this grid are what appear to be hand-drawn shapes in orange color. There is a triangle drawn over the numbers 23, 24, 33, 34; a circle overlapping the numbers 28, 29, 38, 39; an ‘X’ shape covering the numbers 73, 74, 83, 84; and an upside-down triangle on the numbers 78, 79, 88, 89. The background of the grid is black, and the lines of the grid and the numbers are in a contrasting yellow or orange color, which makes them stand out against the black background. The overall layout is reminiscent of a game or a puzzle that might involve finding patterns or solving a problem using the grid and the shapes.

the second image is the same as the first, just with the grid overlay. there seems to be some offset issue, but in general the numbers are approximately right.

I think a divide and conquer approach might have merit, but there’s clearly more work to be done. but it’s also quite possible that better approaches might come out soon, such as maybe inspecting what the attention activation is looking at or something.



You may or may not know this so passing along.

Discourse forums such as this one allow a user to receive the topic as JSON which is much easier to process with software.

The URL of this topic is

the URL for the JSON version just appends .json and is

While the JSON is not easy to read, it is much easier to process with a program.

The JSON option is also noted in a related topic.


It depends on the usage that you cover on, I suggest using the monthly subscription first before adding more cause the API has a cap for the monthly users as well depending. Also I think we can use OpenAI tech to help Blind people to communicate with each other too other than assistance.

Related feature request on the Discourse Meta forum that could be of use on this forum as this is a Discourse forum, use of alternative text for images.