GPT-4-Turbo API Efficient Usage

Hello folks,

I’m utilizing the GPT-4 Turbo API for a study involving image-text verification. My objective is to evaluate whether a given sentence accurately describes an image. The process involves the model providing a detailed rationale for its decision before concluding with a final answer in the format of “Final Answer: Yes/No.” I have a dataset comprising over 16,000 image/sentence pairs.

The prompt I intend to use is as follows: “Analyze the image step by step using available information and determine if the sentence accurately describes it. Your final answer will be Final Answer: Yes/No. Sentence: There are no dogs in the room.”

I will use the code shared in the OpenAI’s website. What’s the most efficient approach to achieve this using the API? Additionally, do you have any suggestions or cautions regarding this usage? My aim is to ensure that my usage of credits is optimized for efficiency.

I’m not entirely sure on the technical side, but openAI has documentation on prompt efficiency for achieving results, and the compromise is possibly swapping accuracy for credits, since consecutive prompts seem to provide more accurate end results

You will never get 100% accuracy. So if that is critical then this is not the vehicle to get your desired results.

The type of prompt you show could be instructed better, by focusing on the output.

Your purpose is to use your AI computer vision to see if an image caption is correct or is in error.

An image is attached. Your response will contain:

  • describe the image in a two-sentence paragraph.
  • provide reasoning why the image caption is truthful, accurate, or inaccurate in a two-sentence paragraph.
  • finally, answer: is the caption accurate? Print only a choice from “[Yes]” or “[No]” (printing the brackets but not quotes).

Caption: Dogs are walking on the moon.

If you want to process those outputs, a JSON specification, simply with the values in order, can enhance the efficiency of database storage.

Images have two costs: 85 tokens at “detail:low”, or 400 - 1400 at “detail:high”. Choose wisely.