GPTs knowledge upload. Images for reference when creating new images

I have several images in png uploaded to GPTs knowledge that are exactly the type of image I want it to produce. In the instructions I have told it at least 20 different ways to reference the images I gave it for style, placement, background, and size. Regardless of how I write it, it still refuses to reference the images in knowledge even a little bit, even if that is its only instruction. Any ideas? Thanks.


I read something about DALL-E 3 rewriting all prompts to include more detail, and this thread has a few good responses on the topic. Perhaps that's what's influencing the GPT to act up?

I’m having the same issue. I uploaded several images of the same character model to Knowledge and asked it to produce an image of that character, and it generated something totally different. When I asked for a detailed description of the character model, I was surprised to see that it generally matched the images, so it was able to access the files and interact with them. I have an instance of vanilla DALL-E that can mostly reproduce the same character consistently (remarkably well, actually, for a string of “like the last one, but…” requests), but it sometimes deviates on details and needs to be reminded by uploading a seed image. I thought the GPTs generator would let me establish some control over that, but the GPT Builder, Preview, and test threads have all produced images that have little or nothing to do with the seed images and parameters. I’m going to keep playing with the prompts and see if I can figure out what part of the sequence is broken.

Confirmed that the model can’t directly reference images uploaded to the Knowledge files. Perhaps a surplus of image files is used for custom training, but when asked to describe images in the Knowledge files, the model consistently hallucinates. It looks like the best approach is to upload the files directly into the chat and use them as references. Perhaps there is a prompt that can direct the model to treat them as seed images and maintain the character’s features without repetitious prompting and frequent “reminder” uploads.

Just following up on this again since I’ve been playing with character creation. A few conclusions so far, in case it helps anyone:

  1. When you upload archetypical images, it seems to be able to refer to those images, but not systematically. It can recall any image in the chat session, but can’t differentiate between the ones it created and the ones you uploaded.
  2. The variations caused by the layers of interpretation can quickly dilute the identity of your archetype. Running a “redo” on a result until you get a match seems to help with consistency, so that bad batches don’t pollute the feedstock.
  3. The only way to maintain consistency is to have your archetypical images available to periodically upload and remind the chat that this is what the character should look like.

I’ve had decent luck with this prompt:

Create an image of [name], with all of the identifying features of the attached archetypical images that define [his/her] appearance, including hair and eye color and style, facial features, and body type (including height and build), as well as matching their [drawing style] drawing style.

Attire: [name] is wearing [attire].

Pose: [name] is [doing] at [place] looking at [direction].

Scene: The light is [source/quality]. The perspective is [framing details].

I’ve tried longer and shorter prompts. Shorter prompts produce cleaner drawings, but GPT-4/DALL-E may fill in undesired details. More extensive prompts may produce more specific results, but also more clutter and aberrations. I generally don’t specify any detail I don’t care about. I can’t get it to produce exactly what I want on every point, but by being judicious about how I fill in the variables above, I’ve been able to place the character in various scenes with a reasonable amount of consistency in their identity.
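To cut down on copy/paste errors when reusing the template, the variables can be filled in programmatically. Here's a minimal sketch; the field names and the `make_prompt` helper are my own invention, not part of any OpenAI API, and the example values are placeholders:

```python
# Sketch: fill the consistency-prompt template above from named fields.
# Field names and make_prompt are illustrative, not an official API.

TEMPLATE = (
    "Create an image of {name}, with all of the identifying features of the "
    "attached archetypical images that define {pronoun} appearance, including "
    "hair and eye color and style, facial features, and body type (including "
    "height and build), as well as matching their {style} drawing style.\n\n"
    "Attire: {name} is wearing {attire}.\n\n"
    "Pose: {name} is {action} at {place} looking at {direction}.\n\n"
    "Scene: The light is {light}. The perspective is {perspective}."
)

def make_prompt(**fields: str) -> str:
    """Fill the template; raises KeyError if any placeholder is missing."""
    return TEMPLATE.format(**fields)

# Hypothetical character and scene values:
prompt = make_prompt(
    name="Mira",
    pronoun="her",
    style="watercolor",
    attire="a gray travel cloak",
    action="reading a map",
    place="a train platform",
    direction="the viewer",
    light="soft morning sun",
    perspective="medium shot, eye level",
)
print(prompt)
```

This only automates the text assembly; you would still paste the result into the chat alongside the archetype images, since those can't be attached programmatically from a GPT's Knowledge files.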

I might try to make a GPT that uses a summary of this prompt as its core function and see if I can streamline the amount of copy/paste work I do between prompts, but I’m not sure how much it will improve the process, versus adding another layer of processing (and variation) between the user input and image output.
