Possibly combining ChatGPT and DALL-E to make a powerful art teaching program (maybe)

Ok. I’m not going to pretend I know the first thing about coding AI or the implications of making AI analyze photos or drawings. Something I do believe, though, is that just as AI (DALL-E, for example) can make great art for us, it could, if integrated with a language model such as ChatGPT, potentially teach us how to make great art ourselves!

It would be a stretch to make an AI interpret a hand-drawn picture and judge whether it’s good or not (the “meaning” of a picture is already nearly impossible for an AI to pin down, let alone advise anyone on how to improve on that front). However, when it comes to art techniques and theory, it could potentially analyze and track someone’s progress in mastering a particular art skill, helping them get better at drawing and creating art.

I’d like to reiterate that you should take this with a massive grain of salt. For reference, I consider making my computer print out “Hello, World!” my greatest achievement in coding.

Here’s ChatGPT’s answer when I proposed this concept (in this case, photo input is allowed for the program and it’s focusing on the non-subjective aspects of art skills):

If the focus is primarily on evaluating practice works in different art techniques, then the challenges related to subjectivity might indeed be less prominent. Here’s how the combined DALL·E and ChatGPT system could effectively evaluate practice works in various art techniques:

  1. Technical Assessment: The system can analyze the technical aspects of the artwork, such as the accuracy of perspective, proportions in human anatomy, and the use of shadows and lighting. It can provide specific feedback on how well the learner has executed these technical elements.
  2. Comparative Analysis: The system can compare the learner’s work to reference images or criteria, highlighting areas where the learner has successfully applied the technique and areas where improvement is needed.
  3. Progress Tracking: By maintaining a record of the learner’s practice works over time, the system can demonstrate the learner’s progress in mastering different techniques. This tracking can serve as motivation and a source of encouragement.
  4. Customized Challenges: Based on the learner’s skill level and the techniques they wish to improve, the system can generate customized challenges or exercises that target specific skills. These exercises can be designed to gradually increase in complexity as the learner improves.
  5. Visual Feedback: While ChatGPT can provide text-based feedback, DALL·E can augment this by generating visual representations to illustrate specific points or improvements that can be made. Visual feedback can be particularly helpful in art instruction.
  6. Guidance and Resources: ChatGPT can offer additional guidance and recommend resources, such as tutorials, videos, or articles, to help the learner further develop their skills in specific techniques.

With a focus on technical aspects, the combined AI system could be a valuable tool for learners looking to improve their art skills in areas like perspective, human anatomy, and shading. It could provide detailed, objective feedback and support to help artists refine their abilities in these specific areas. However, as with any AI-based evaluation system, it’s essential to maintain a balance with human feedback and guidance to ensure a comprehensive and well-rounded art education.
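The progress-tracking idea in point 3 above could be sketched as a simple record of per-technique scores over time. This is a hypothetical sketch: `PracticeLog` and its score scale are made up for illustration, and the scores themselves would come from whatever rubric the AI evaluator applies.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class PracticeLog:
    """Tracks a learner's scores per art technique over time."""
    # technique name -> list of (date, score out of 10) entries
    entries: dict = field(default_factory=dict)

    def record(self, technique: str, when: date, score: float) -> None:
        self.entries.setdefault(technique, []).append((when, score))

    def improvement(self, technique: str) -> float:
        """Difference between the latest and earliest recorded score."""
        scores = [s for _, s in sorted(self.entries.get(technique, []))]
        return scores[-1] - scores[0] if len(scores) >= 2 else 0.0

# Example: two evaluated perspective exercises, a month apart
log = PracticeLog()
log.record("perspective", date(2023, 9, 1), 4.0)
log.record("perspective", date(2023, 10, 1), 6.5)
print(log.improvement("perspective"))  # 2.5
```

Even a record this simple would let the system show a learner their trend per technique, which is the "motivation and encouragement" angle point 3 describes.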

It might be harder to identify objects in hand-made work, but I think it could definitely work when the art is drawn on a digital device (so it would at least be of some use to those who own drawing tablets or touch-screen devices).

In the end, I just think this could work someday, and we would all benefit from it (I personally suck at drawing). It’d be really cool if art students in universities (or even high schools) could have stronger support in their education like this. If the program could reliably work with hand-made drawings as input, anyone could pick up a substantial amount of art skills and knowledge! People from less fortunate backgrounds who are interested in learning art could gather around a single digital device (with an internet connection) and learn at least the basics of making good art (provided the program can support multiple profiles under one account, which I think should be possible). Imagine an art teacher effectively teaching 50-60 students at once, with AI helping both the teacher and the students. AI has already proven capable of boosting student knowledge and skills in other subjects, so why not expand its educational capabilities further?

Then again, people are already making AI do all the work for them (which is pretty smart, but not that much fun, in my opinion), and this idea is really just that: an idea. I thought I’d share it anyway; I seriously doubt the OpenAI team hasn’t heard something like this before. I’d love to learn more about whether this could become a reality in the near future.

Dall-e is an image generator that makes AI pictures based on a description. It is entirely possible to integrate it into a chatbot product, so that instead of you providing the description, the AI provides the description.
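That chaining could look something like the glue code below. This is a hypothetical sketch: `ask_chatbot` and the request fields stand in for real API calls and schemas, which vary by provider; the point is only that the chat model, not the user, writes the image description.

```python
def ask_chatbot(user_request: str) -> str:
    """Stand-in for a chat-model call that turns a vague user request
    into a detailed image description. Hypothetical stub, no real API."""
    return f"A detailed oil painting of {user_request}, soft lighting, blue palette"

def build_image_request(description: str) -> dict:
    """Package the chatbot's description as an image-generation request.
    Field names here are illustrative, not a specific API's schema."""
    return {"prompt": description, "n": 1, "size": "1024x1024"}

# The user never writes the image prompt themselves:
description = ask_chatbot("a portrait study for practicing shading")
request = build_image_request(description)
print(request["prompt"])
```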

Computer vision, however, is not available by API or released to end users. You can see it being used, however, in Bing AI Chat, where you can upload a picture, and have it described.

With one, a description goes in. With the other, a description comes out. The opportunities to combine them for creativity are low.


Hi! Thank you for replying. You mentioned Bing AI Chat can have a picture described. Do you know how accurate or, well, “good” Bing AI is at doing that? If it’s at least decently precise, then the possibility of this idea being developed wouldn’t be so tiny, right?

It tries hard!

Here is a result with a prompt to drive it into decisiveness:

The painting you’ve shared is quite intriguing. It’s a portrait of a person’s head and shoulders, with a blue rectangle covering the face. The painting is done in blue tones and is signed “Frida U”. The person has long, flowing hair and is wearing a collared shirt. The background is a lighter blue than the person’s hair and shirt.

Here are some points for evaluation:

  1. Color Palette: The student has effectively used different shades of blue to create contrast between the figure and the background. This shows a good understanding of color theory.
  2. Technique: The brush strokes seem confident and deliberate, suggesting that the student has a good grasp of painting techniques.
  3. Composition: The placement of the figure within the frame is well-balanced, drawing the viewer’s attention to the subject.
  4. Conceptual Understanding: The use of a blue rectangle to obscure the face is an interesting choice. It adds an element of mystery and abstraction to the piece.

Areas for improvement could include:

  1. Detailing: While the minimalist style works well for this piece, the student could experiment with adding more detail to their work in future assignments.
  2. Experimentation with Mediums: If they haven’t already, the student could try working with different mediums to broaden their skill set.
  3. Exploration of Themes: While the obscured face (ED: faces are blurred before submission) is an interesting concept, encouraging the student to explore different themes could lead to more varied and thought-provoking work.

Remember, art is subjective and these are just suggestions based on one interpretation of the work. The most important thing is that the student continues to express themselves through their art.


Alright. It’s… it’s not too bad. It’s a start; a bit far off from being able to help with precision, but it’s getting closer. I wonder if OpenAI has something like this in mind with DALL-E (maybe DALL-E 2?).

Thank you again for taking the time to reply.

Not really “what they had in mind”, I wouldn’t think. Dall-e can accept various styles, situations, and subjects described to it; generally, what has been seen in photography and art can be replicated, sometimes with unreal aspects. It takes prompt engineering and trial and error to get an approximation of what you envisioned.

So, for now, the inverse of this process can really only be as good as that: an approximation. Did I understand that correctly?

It’s not really an “inverse”, they are standalone products using different technology.

GPT-4 was trained to be multi-modal with computer vision, so it has an understanding of constructing images in the same way that it constructs language: by the likelihood of things appearing in a certain way given the inputs, with machine learning refining the internal weights. Parts of the imagery training may rely on tagged data, so when you have 1000 “dog noses”, it can synthesize how other images of noses would activate the same neuron and semantic meaning. Think of it as a very advanced version of OCR text recognition.

Dall-e is more input = output, and is more uncertain. It will produce abstractions, uncanny-valley faces, and impossible visuals, as well as the occasional impressive piece, from text. It is not the leading edge of image generation; others like Midjourney have comparable quality.

Let’s test the “inverse”: feed the AI’s initial description of the student art back in as a prompt. Then I pick maybe the worst of the four results (although another looks like a corpse).

I feared that using the word “inverse” would cause confusion; I meant the process I described earlier (taking photo input and evaluating some aspect of it). I should have been more specific; that was my bad.

But I’m interested in what you’re saying here. I don’t understand a single thing after the first sentence :sweat_smile:. Could you tell me what all of that’s supposed to mean? (pretty please)

The truth is we really don’t know how they operate, it’s like asking how the human mind knows the smell of cinnamon toast.

Here’s a deep dive into the neurons of a computer vision model, where you can see the different layers that match imagery to meaning. They mean something to an AI that has billions of interconnections, but only at the simplest early layers do they mean something to us.

To get an idea of what the neurons mean, a ton of synthesized data is fed into them to see what kind of imagery most activates each neuron. Truly art pieces on their own.

Cathedral ceilings or such?
tubey and wormy?
nightmare inspiring, with sharpness in the center
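The “which imagery most activates a neuron” idea can be illustrated with a toy dot-product score. This is a deliberate simplification: real feature visualization optimizes an input image by gradient ascent, and the numbers below are made up, but the activation itself is this kind of weighted sum.

```python
# Toy model: a "neuron" is just a weight vector, and its activation
# on an input is the dot product. The input that lines up best with
# the weights activates the neuron most strongly.
neuron = [0.9, -0.2, 0.4]  # learned weights (invented for the example)

inputs = {
    "candidate_a": [1.0, 0.0, 0.0],
    "candidate_b": [0.5, 0.5, 0.5],
    "candidate_c": [0.8, -0.3, 0.5],
}

def activation(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

best = max(inputs, key=lambda name: activation(inputs[name], neuron))
print(best)  # candidate_c
```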


Got to focus on doing what LLMs are good at: they are great at fiddling with prompts and adding details.

Our Discourse bot’s Artist persona integrates SDXL with GPT-4.

Analysis is going to be hard, but a tool that teaches people how to prompt is a practical thing you could do.
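A minimal version of such a prompt-teaching tool could just check a draft prompt against common advice. This is a toy, rule-based sketch with invented rules; a real tool would have an LLM rewrite the prompt and explain its changes.

```python
def critique_prompt(prompt: str) -> list:
    """Return simple, rule-based suggestions for an image prompt.
    The rules are illustrative, not from any official prompting guide."""
    tips = []
    if len(prompt.split()) < 6:
        tips.append("Add more detail: subject, setting, and mood.")
    if not any(w in prompt.lower() for w in ("style", "painting", "photo", "sketch")):
        tips.append("Name a medium or style (e.g. 'watercolor', 'photo').")
    if not any(w in prompt.lower() for w in ("light", "lighting", "sunset", "shadow")):
        tips.append("Describe the lighting.")
    return tips

print(critique_prompt("a dog"))  # all three tips fire
```

Each tip the learner addresses is a small lesson in what image models respond to, which is the teaching angle rather than the do-it-for-you angle.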


Hedgehogs don’t have to be joyful…