OpenAI doesn't follow the instructions

I am trying to generate an image of the altimetric (elevation) profile of a mountain, using a photo of the real mountain as input.
Unfortunately, the OpenAI model creates artistic and surreal pictures that have nothing to do with what I am asking for.

I tried giving more and more precise instructions, but all in vain. It completely ignores my instructions or interprets them very freely. See, for example, this script:

# -*- coding: utf-8 -*-
from openai import OpenAI
import requests

# Initialize the OpenAI client
client = OpenAI(api_key='xxxxxxxxx')

# Description of the image to generate.
# The parentheses join the adjacent string literals into one prompt;
# without them only the first sentence would be assigned.
descrizione = (
    "The altimetric profile of Monte Rosa seen from the front and which stands out on the horizon, stylized and in black and white. "
    "Monte Rosa must be in the center of the image, viewed from the front and silhouetted against the horizon, but it must only be a 30% portion of the total width of the image. "
    "The remaining 70% of the width of the image must be made up of other nearby mountains. "
    "The elevation profile must be exactly the same as the real one, based on an existing photo. "
    "There must be no free fantastic or artistic interpretation."
)


parametri = {
    "prompt": descrizione,
    "size": "1792x1024",
    "model": "dall-e-3",
#    "model": "dall-e-3.5-turbo"  # Use the DALL-E 3.5 model (free)
#    "model": "curie"  # Use the Curie model (free)
#     "model": "dall-e-3"
}

response = client.images.generate(**parametri)

# Get the URL of the generated image
image_url = response.data[0].url

# Download and save the image
response = requests.get(image_url)
with open("monterosa_eng.jpg", "wb") as f:
    f.write(response.content)

What am I doing wrong?

That doesn’t make sense. You cannot supply DALL-E 3 with images.

Your script, of course, doesn’t include any images, yet you talk about real mountains as if the AI had some kind of map in a geographic information system. That is not how it works.

“The altimetric profile of Monte Rosa seen from the front and which stands out on the horizon, stylized and in black and white.”
“Monte Rosa must be in the center of the image, viewed from the front and silhouetted against the horizon, but it must only be a 30% portion of the total width of the image.”
“The remaining 70% of the width of the image must be made up of other nearby mountains.”
“The elevation profile must be exactly the same as the real one, based on an existing photo.”
“There must be no free fantastic or artistic interpretation.”

What you actually want to receive as an image is elusive. Even a human painter would have a hard time discerning your desires from the English. Do you just want a photo of mountains? Do you want some cutaway section of the mountain that shows the 2D height?

Which model and which API should I use to modify an existing image?

This text was just one of the many tests that I did. Step by step I added more and more details, because the AI was completely misunderstanding me and ignoring the specifications. At the beginning the description was just: “The altimetric profile of Monte Rosa seen from the front and which stands out on the horizon, stylized and in black and white.”

Now I have tried this: “The mountain elevation profile of Monte Rosa in black and white and 2D.” IMHO this definition is very precise, but the result is completely disappointing; it looks like the AI is kidding me. If you have a more precise definition of an altimetric profile or a 2D elevation profile, I would welcome it.

See an example of what I mean:

Here’s a trail map of one hike on Monte Rosa, a trek that winds all over the mountain.


This kind of “hard data” is not what DALL-E is for. It is an artist, not a cartographer.

If you had a GIS, you could get an elevation map seen from overhead and choose the angle and placement of your own line along which to cut through the elevation data to make a 2D side profile. Even then it would likely have little meaning: simply change the orientation of the line from north-south to east-west and you get a different look at the mountain.

Language AI can assist in this task, but you would use computer programming and real data to make the plot, not a picture imagination machine that is driven by keywords and a library of neural imagery.
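As a rough illustration of the "programming and real data" route, here is a minimal sketch that samples heights along a straight line and returns a 2D profile. Everything named here is an assumption for the example: `synthetic_elevation` is made-up terrain (two Gaussian bumps), not Monte Rosa, and would need to be replaced by a lookup into a real digital elevation model. Only the standard library is used.

```python
import math


def synthetic_elevation(x_km, y_km):
    """Stand-in terrain: two Gaussian peaks, heights in metres (NOT real data)."""
    main = 4600.0 * math.exp(-((x_km - 10.0) ** 2 + (y_km - 10.0) ** 2) / 18.0)
    side = 3800.0 * math.exp(-((x_km - 4.0) ** 2 + (y_km - 9.0) ** 2) / 8.0)
    return max(main, side)


def elevation_profile(elevation, start, end, samples=200):
    """Sample elevation(x, y) at evenly spaced points on the line start -> end.

    Returns a list of (distance_km, height_m) pairs.
    """
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    profile = []
    for i in range(samples):
        t = i / (samples - 1)  # 0.0 at the start point, 1.0 at the end point
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        profile.append((t * length, elevation(x, y)))
    return profile


# The same terrain cut along two different lines gives two different profiles.
west_east = elevation_profile(synthetic_elevation, (0.0, 10.0), (20.0, 10.0))
south_north = elevation_profile(synthetic_elevation, (10.0, 0.0), (10.0, 20.0))
```

The resulting pairs can be drawn with any plotting library, for example as a filled black shape with matplotlib's `fill_between`. The west-east cut passes near the secondary peak and the south-north cut does not, which is the orientation effect described above.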

There’s DALL-E 2, which does accept an input image for edits and variations, but I think for your use case you might be best off using Photoshop tbh.
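For reference, a minimal sketch of what a DALL-E 2 edit call could look like with the same Python client as in the question. The file names and the prompt are hypothetical, and the helper names `build_edit_request` and `edit_image` are made up for this example. The edit endpoint repaints the transparent region of a mask, so it should not be expected to produce a measured elevation profile.

```python
def build_edit_request(prompt, size="1024x1024"):
    """Collect the non-file arguments for an image edit call."""
    return {"model": "dall-e-2", "prompt": prompt, "n": 1, "size": size}


def edit_image(image_path, mask_path, prompt, out_path):
    """Ask DALL-E 2 to repaint the transparent area of the mask and save the result."""
    # Imported here so the helper above can be used without these packages.
    import requests
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(image_path, "rb") as image, open(mask_path, "rb") as mask:
        response = client.images.edit(
            image=image, mask=mask, **build_edit_request(prompt)
        )
    download = requests.get(response.data[0].url, timeout=60)
    download.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(download.content)


# Hypothetical usage: a square PNG photo and a mask PNG of the same size.
# edit_image("monterosa.png", "monterosa_mask.png",
#            "A snowy alpine ridge, black and white", "monterosa_edit.png")
```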
