Setting a Tone and Style in GPT4-Vision responses

Hello, I’m new to GPT Vision, so I just want to make sure I am understanding correctly. I want the responses from GPT-4 Vision to come back in a certain tone and style, based on some example text that I can provide.

According to the docs, there is no ‘Fine tuning’ available for this model.

Is my only option to simply add examples to my user prompt, i.e. ‘here are some examples, copy this tone and style’? Or is there a better way to do this?

Thanks in advance


Yes, for the time being the best way is to either describe the desired style in the prompt and/or to provide an example.
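To make that concrete, here is a minimal sketch of folding a style description and/or an example text into a single prompt string. The function name `build_style_prompt` and the wording of the instructions are my own assumptions, not something from the docs, so adjust them to taste.

```python
def build_style_prompt(task, style_description=None, example_text=None):
    """Assemble one prompt string from a task plus optional style guidance.

    Hypothetical helper: the phrasing of the style instructions is an
    assumption, not an official recommendation.
    """
    parts = [task.strip()]
    if style_description:
        # Option 1: describe the desired style in words.
        parts.append("Write in this style: " + style_description.strip())
    if example_text:
        # Option 2: supply an example and ask the model to mirror it.
        parts.append(
            "Match the tone and style of the following example, "
            "but do not reuse its content:\n---\n"
            + example_text.strip()
            + "\n---"
        )
    return "\n\n".join(parts)


prompt = build_style_prompt(
    "Describe the attached image.",
    style_description="warm, concise, second person",
    example_text="You'll love how the morning light fills this room.",
)
print(prompt)
```

The resulting string can be used as the text part of the user message that accompanies the image.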


I use GPT 4 vision with my Chatbot. The System prompt influences the way the Chatbot speaks and its style.

Thanks @merefield. Out of interest, do you put examples in your system prompt rather than the user prompt? I’ve been trying to work out which would be best for this.

I’ve found that when I use GPT to analyse the tone and style of some example text, the result is perhaps a little over-engineered and doesn’t quite give me what I’m looking for when I insert that tone and style into a prompt. Do you have a recommended approach for getting the best out of GPT when mirroring a tone/style of writing? Many thanks


No examples, just a description of how I want the Chatbot to behave, eg a description of its character… “you are a … you often do this … you answer in the style of …” etc.
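As a rough illustration of that approach, the sketch below puts a character description in the system message and keeps the user message for the image and the actual request. The helper name `build_messages` and the persona text are made up for the example; only the message shape follows the Chat Completions format.

```python
def build_messages(persona, user_text, image_url):
    """Return a messages list with the style set in the system message.

    Hypothetical helper for illustration; the persona wording is an assumption.
    """
    return [
        # The system message carries the character/style description.
        {"role": "system", "content": persona},
        # The user message carries the request and the image.
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_text},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]


messages = build_messages(
    persona="You are a friendly museum guide. You answer in short, vivid sentences.",
    user_text="What is in this picture?",
    image_url="https://example.com/picture.png",  # placeholder URL
)
print(messages[0]["content"])
```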


I should make it clear here that the vision retrieval is a separate call and uses a very simple system prompt. The answer is then fed privately to the Chatbot, which can rephrase it in its own style (governed by its system prompt).
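A two-step flow like that could be sketched roughly as follows. This is not the poster's actual code: `describe_then_restyle`, the prompt wording, and the `complete(model, messages)` callable are all assumptions. `complete` stands in for whatever function you use to call the API and get the reply text back, which also makes the flow easy to try with a stub.

```python
def describe_then_restyle(image_url, persona, complete):
    """Step 1: plain image description. Step 2: rephrase it in the persona's style.

    `complete` is a hypothetical callable (model, messages) -> str.
    """
    # Step 1: the vision call uses a deliberately simple system prompt.
    vision_messages = [
        {"role": "system", "content": "Describe the image factually and briefly."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]
    raw_description = complete("gpt-4-vision-preview", vision_messages)

    # Step 2: the description is passed privately to the styled chatbot.
    restyle_messages = [
        {"role": "system", "content": persona},
        {
            "role": "user",
            "content": "Private note (not written by the end user). "
            "Image description: " + raw_description + "\n"
            "Tell the user what the image shows, in your own voice.",
        },
    ]
    return complete("gpt-4", restyle_messages)


def fake_complete(model, messages):
    # Stub so the sketch runs without network access or an API key.
    return f"[{model}] received {len(messages)} messages"


print(describe_then_restyle("https://example.com/cat.png",
                            "You are a cheerful pirate.", fake_complete))
```

Keeping the vision step neutral means the style lives in one place only, the chatbot's system prompt, at the cost of a second request.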


Just to add to the example from @merefield. I use GPT-4-Vision exclusively for picture description at the moment and I also need it to respond in a certain style. In my case, I rely solely on the user message. My prompt is as follows (partially redacted):

“text”: “You are provided with a picture from […]. Your task is to describe the content of the picture. In doing so, you apply the following principles: (1) Use objective, neutral and expert-level language; (2) Be specific yet succinct in your description; (3) Identify and describe relationships between graphical items, if present; (4) Return any text or labels verbatim; (5) Do not describe colors; (6) […]”
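If you want to keep a rule list like this maintainable, one option is to build the prompt from a Python list. The sketch below is only an illustration: `build_description_prompt` is a made-up helper, the redacted source stays a placeholder, and only the five principles visible above are included.

```python
def build_description_prompt(source, principles):
    """Number the principles and join them into one prompt string.

    Hypothetical helper; `source` fills the redacted "picture from [...]" slot.
    """
    numbered = "; ".join(f"({i}) {p}" for i, p in enumerate(principles, start=1))
    return (
        f"You are provided with a picture from {source}. "
        "Your task is to describe the content of the picture. "
        f"In doing so, you apply the following principles: {numbered}"
    )


principles = [
    "Use objective, neutral and expert-level language",
    "Be specific yet succinct in your description",
    "Identify and describe relationships between graphical items, if present",
    "Return any text or labels verbatim",
    "Do not describe colors",
]

# "[...]" is kept as a placeholder because the original source was redacted.
vision_text = build_description_prompt("[...]", principles)
print(vision_text)
```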


Pretty much like this, where vision_text is your prompt for what to do with the image.

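import base64
import logging
import os

import requests

# Assumption: the key is read from the environment; the original post
# does not show where openai_api_key comes from.
openai_api_key = os.environ["OPENAI_API_KEY"]


def upload_to_gpt_vision(file_path, vision_text):
    """Send a local image plus a text prompt to GPT-4 Vision and return the parsed JSON."""
    # Base64-encode the image so it can be embedded as a data URL.
    with open(file_path, "rb") as img_file:
        image_b64 = base64.b64encode(img_file.read()).decode("utf-8")

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {openai_api_key}",
    }

    payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": vision_text},
                    # The documented shape is an object with a "url" key.
                    # The data URL assumes a PNG; change the MIME type for other formats.
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }
        ],
        "max_tokens": 300,
    }

    logging.info("Uploading image to GPT-4 Vision API...")
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    return response.json()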

Thank you for all your feedback, it’s really helped me. Much appreciated.

Max


Are you using the API or a custom GPT?

The forum category is API, and the poster’s comments about using system prompts are consistent with that.

I have experimented with examples for writing in a system prompt (though not specifically with Vision). I would be careful if this is a commercial application, as examples can sometimes lead to unintended consequences, unless your goal is very strict like “I always want this exact syntax/tone/word choice/etc”.

Using gpt-4 and gpt-4-1106-preview, I do set an example of writing. Then I run the returned text through multiple refining functions with gpt-4. It works fantastically well but is more expensive and takes quite a long time, so depending on your needs, this may or may not work. But imo it leads to the highest-quality writing and is nearly impossible to achieve with a single response.
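For anyone wondering what a refining chain like that might look like, here is a rough sketch. It is not my actual pipeline: the function name `refine`, the pass instructions, and the `complete(model, messages)` callable (a stand-in for your own API wrapper that returns the reply text) are all hypothetical.

```python
def refine(draft, passes, complete):
    """Run a draft through several refinement passes, one request per pass.

    `passes` is a list of instruction strings; `complete` is a hypothetical
    callable (model, messages) -> str.
    """
    text = draft
    for instruction in passes:
        messages = [
            # Each pass gets its own narrow instruction as the system prompt.
            {"role": "system", "content": instruction},
            # The output of the previous pass is the input to this one.
            {"role": "user", "content": text},
        ]
        text = complete("gpt-4", messages)
    return text


def fake_complete(model, messages):
    # Stub so the sketch runs offline; it only records which pass was applied.
    return messages[1]["content"] + " <" + messages[0]["content"] + ">"


result = refine(
    "First draft.",
    ["Tighten the wording.", "Match the house tone."],
    fake_complete,
)
print(result)
```

Every pass is a separate request, which is where the extra cost and latency come from.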

It was intended as an example to get them started :grin: