Hi,
I'm trying to use the Chat Completions API in several scenarios, and all of them rely on the function-calling feature. One of them is a chatbot for creating and editing images. For clarity, I'll reduce the function declarations to two:
```python
functions = [
    {
        "name": "prompt_to_image",
        "description": """The tool uses generative AI models to create images from text prompts""",
        "parameters": {
            "type": "object",
            "properties": {
                "n": {
                    "type": "number",
                    "description": "The number of images to generate",
                },
                "size": {
                    "type": "string",
                    "description": "The size of the generated images",
                    "enum": ["1024x1024", "1792x1024", "1024x1792"],
                },
                "prompt": {
                    "type": "string",
                    "description": "Required text prompt for image generation",
                },
                "quality": {
                    "type": "string",
                    "enum": ["standard", "hd"],
                    "description": "The quality of the generated image, where 'hd' indicates higher detail and consistency.",
                },
                "style": {
                    "type": "string",
                    "enum": ["vivid", "natural"],
                    "description": "The style of the generated images, influencing the realism and dramatic effect.",
                },
            },
            "required": ["prompt"],
            "additionalProperties": False,
        },
    },
    {
        "name": "resize_image",
        "description": """Resize image tool. It is useful when it's necessary to resize the image to the given size.""",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "Required non-empty source image URL",
                },
                "width": {
                    "type": "number",
                    "description": "The width of the target image",
                },
                "height": {
                    "type": "number",
                    "description": "The height of the target image",
                },
            },
            "required": ["url", "width", "height"],
        },
    },
]
```
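To show how these declarations are consumed on my side, here is a minimal dispatcher sketch. It assumes the model's reply contains a `function_call` with a `name` and a JSON-encoded `arguments` string (which is how the Chat Completions API returns it). The handler bodies, `HANDLERS`, `REQUIRED`, and `dispatch` are hypothetical stand-ins for a real image backend, not part of any library.

```python
import json
from typing import Any, Callable


# Hypothetical local handlers; real ones would call an image generation
# service and an image-processing library. They return stub metadata.
def prompt_to_image(prompt: str, n: int = 1, size: str = "1024x1024",
                    quality: str = "standard", style: str = "vivid") -> dict:
    width, height = (int(v) for v in size.split("x"))
    return {"id": "IMG1", "url": "https://example.com/1.png",
            "width": width, "height": height}


def resize_image(url: str, width: int, height: int) -> dict:
    return {"id": "IMG2", "url": url, "width": width, "height": height}


HANDLERS: dict[str, Callable[..., dict]] = {
    "prompt_to_image": prompt_to_image,
    "resize_image": resize_image,
}

# Mirrors the "required" lists in the function declarations.
REQUIRED: dict[str, list[str]] = {
    "prompt_to_image": ["prompt"],
    "resize_image": ["url", "width", "height"],
}


def dispatch(function_call: dict[str, str]) -> dict[str, Any]:
    """Run the local function named in a model's function_call."""
    name = function_call["name"]
    if name not in HANDLERS:
        raise ValueError(f"Unknown function: {name}")
    # The model sends arguments as a JSON-encoded string, not a dict.
    args = json.loads(function_call["arguments"])
    missing = [key for key in REQUIRED[name] if key not in args]
    if missing:
        raise ValueError(f"{name} is missing required arguments: {missing}")
    return HANDLERS[name](**args)
```

For example, `dispatch({"name": "prompt_to_image", "arguments": '{"prompt": "A picture of a cat"}'})` returns the stub metadata for `IMG1`. Checking `required` locally is a cheap safeguard, since the model can still omit or malform arguments.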
Now I want to show the LLM how to handle user prompts using few-shot examples.
I also want to pass the conversation history to maintain context.
The main question is how to do that. I've seen different, and sometimes conflicting, approaches even on this forum.
One important detail: I don't have a summarization phase, so I don't need the LLM to generate any text after the actual function call. The function results are just an image URL and some metadata about the image (ID, width, height, size, etc.).
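Because there is no summarization step, one option is to write each executed call straight into the history without sending a second request to the model. This is a minimal sketch, assuming the history should mirror the shape the API itself produces (an assistant `function_call` followed by a `function` message carrying the raw result as JSON). `record_function_turn` is a hypothetical helper, and the result dict stands in for whatever the image backend actually returns.

```python
import json
from typing import Any


def record_function_turn(history: list[dict[str, Any]], user_text: str,
                         name: str, arguments: dict[str, Any],
                         result: dict[str, Any]) -> list[dict[str, Any]]:
    """Append one user request, the model's function call, and its result."""
    history.append({"role": "user", "content": user_text})
    # The assistant turn that chose the function; arguments must be a JSON string.
    history.append({
        "role": "assistant",
        "content": None,
        "function_call": {"name": name, "arguments": json.dumps(arguments)},
    })
    # The function turn carries the raw result (URL plus metadata) as JSON,
    # so later turns can refer to the image by ID or URL.
    history.append({"role": "function", "name": name,
                    "content": json.dumps(result)})
    return history
```

With this layout, the next user message follows the `function` message directly; no assistant text is needed in between.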
APPROACH 1
```python
messages = []
messages.append({
    "role": "system",
    "content": """You are a helpful assistant who helps to create and edit images.
Select the most suitable function based on the user's request.
Don't make assumptions about what values to plug into functions"""
})
messages.append({
    "role": "user",
    "content": "Generate a picture of a cat"
})
messages.append({
    "role": "function",
    "name": "prompt_to_image",
    "content": "{\"prompt\": \"A picture of a cat\"}"
})
messages.append({
    "role": "assistant",
    "content": """Here is the image I created based on your request
ID: IMG1
URL: https://example.com/1.png
Width: 1024
Height: 1024
"""
})
```
APPROACH 2
```python
messages = []
messages.append({
    "role": "system",
    "content": """You are a helpful assistant who helps to create and edit images.
Select the most suitable function based on the user's request.
Don't make assumptions about what values to plug into functions"""
})
messages.append({
    "role": "user",
    "content": "Generate a picture of a cat"
})
messages.append({
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "prompt_to_image",
        "arguments": "{\"prompt\": \"A picture of a cat\"}",
    },
})
messages.append({
    "role": "function",
    "name": "prompt_to_image",
    "content": """Here is the image I created based on your request
ID: IMG1
URL: https://example.com/1.png
Width: 1024
Height: 1024
"""
})
```
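If the Approach 2 shape is used, the few-shot block can be generated from plain data instead of hand-written, which avoids hand-escaping the JSON argument strings and keeps the ordering explicit: system prompt, then examples, then real history, then the new request. This is a minimal sketch under that assumption; `EXAMPLES` and `build_messages` are hypothetical, the second example is invented for illustration, and storing results as JSON rather than prose is a choice, not something the API requires.

```python
import json
from typing import Any

SYSTEM_PROMPT = (
    "You are a helpful assistant who helps to create and edit images.\n"
    "Select the most suitable function based on the user's request.\n"
    "Don't make assumptions about what values to plug into functions."
)

# Hypothetical few-shot examples: user request, chosen function,
# arguments, and the stored result of running it.
EXAMPLES: list[dict[str, Any]] = [
    {"user": "Generate a picture of a cat", "name": "prompt_to_image",
     "arguments": {"prompt": "A picture of a cat"},
     "result": {"id": "IMG1", "url": "https://example.com/1.png",
                "width": 1024, "height": 1024}},
    {"user": "Make it 512 by 512", "name": "resize_image",
     "arguments": {"url": "https://example.com/1.png",
                   "width": 512, "height": 512},
     "result": {"id": "IMG2", "url": "https://example.com/2.png",
                "width": 512, "height": 512}},
]


def build_messages(examples: list[dict[str, Any]],
                   history: list[dict[str, Any]],
                   user_text: str) -> list[dict[str, Any]]:
    """System prompt, then few-shot examples, then real history, then the new request."""
    messages: list[dict[str, Any]] = [{"role": "system", "content": SYSTEM_PROMPT}]
    for ex in examples:
        messages.append({"role": "user", "content": ex["user"]})
        # json.dumps replaces the hand-escaped argument strings.
        messages.append({"role": "assistant", "content": None,
                         "function_call": {"name": ex["name"],
                                           "arguments": json.dumps(ex["arguments"])}})
        messages.append({"role": "function", "name": ex["name"],
                         "content": json.dumps(ex["result"])})
    messages.extend(history)
    messages.append({"role": "user", "content": user_text})
    return messages
```

Because the examples look like real turns, the model may treat their images (`IMG1`, `IMG2`) as part of the current conversation; whether that matters depends on the bot. Note also that `functions` and `function_call` are the older form of this feature; current versions of the Chat Completions API express the same structure with `tools`, `tool_calls`, and a `tool` role message that carries a `tool_call_id`.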
So, the main question: which of the two approaches is correct? Or is neither of them correct, and a third approach is needed? Thank you!