I’m using a GPT-4 vision-preview deployment on the Azure OpenAI Platform in the West US 2 region.
I’m having difficulty identifying where a discrepancy between the Azure OpenAI API and the Azure OpenAI Platform is coming from.
Consider the following scenarios:
- From a PDF, using pdf2image, I convert each page into a PIL image and then encode it as base64, as follows:
  import base64
  import io

  from pdf2image import convert_from_bytes


  def image_to_b64(image):
      """
      Converts a PIL image to a base64-encoded string.

      Args:
      - image (PIL.Image): The image to convert.

      Returns:
      - str: The base64-encoded string of the image.
      """
      buffered = io.BytesIO()
      if image.mode != 'RGB':
          image = image.convert('RGB')
      image.save(buffered, format="JPEG")
      img_str = base64.b64encode(buffered.getvalue()).decode()
      return img_str


  with open(pdf_file_path, "rb") as pdf_file:
      pdf_bytes = pdf_file.read()

  images = convert_from_bytes(pdf_bytes)  # default DPI=200

  payload_image_content = [
      {
          "type": "image_url",
          "image_url": {"url": f"data:image/jpeg;base64,{image_to_b64(image)}"},
      }
      for image in images
  ]
  When I send these images to the gpt-4v model, I see a lot of OCR inaccuracies in the response.
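One thing that may be worth ruling out on the API side: the snippet above renders pages at pdf2image’s default 200 DPI and re-compresses them as JPEG, and JPEG artifacts can smear small glyphs before the model ever sees them. A minimal sketch of a lossless alternative (PNG encoding, paired with a higher `dpi` passed to `convert_from_bytes`); this is a guess at the cause, not a confirmed difference:

```python
import base64
import io

from PIL import Image


def image_to_png_b64(image):
    """Encode a PIL image as lossless PNG base64 (avoids JPEG compression artifacts)."""
    buffered = io.BytesIO()
    if image.mode != "RGB":
        image = image.convert("RGB")
    image.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode()


# Usage sketch with the code above — render sharper pages than the 200 DPI default:
#   images = convert_from_bytes(pdf_bytes, dpi=300)
# and build the data URL with the matching MIME type:
#   {"type": "image_url",
#    "image_url": {"url": f"data:image/png;base64,{image_to_png_b64(image)}"}}
```

A screenshot tool typically produces a lossless PNG at screen resolution, which could by itself explain part of the quality gap against the playground test.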
- When I manually take a screenshot on my Linux system and pass that image as input in the Azure OpenAI Platform’s Chat Playground, with the same deployment, parameters, system prompt, etc., there are no inaccuracies in the output.
So I am wondering whether the Azure OpenAI Platform does any image pre-processing that we are unaware of when an image is passed in on the platform.
If you have any suggestions, please let me know.
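Another variable worth controlling for: the GPT-4 vision chat payload accepts an optional `"detail"` field on each `image_url` entry (`"low"`, `"high"`, or `"auto"`, with `"auto"` as the default), and the resolution the model analyzes depends on it. A small sketch for making it explicit in the payload; whether the playground sets a different value than the API default is an assumption to verify, not something I can confirm:

```python
def build_image_content(image_b64, detail="high"):
    """Build one image_url payload entry with an explicit detail level.

    detail may be "low", "high", or "auto" (the API default);
    "high" asks the model to process the image at higher resolution.
    """
    return {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/jpeg;base64,{image_b64}",
            "detail": detail,
        },
    }


# Usage with the images produced earlier:
#   payload_image_content = [build_image_content(image_to_b64(image))
#                            for image in images]
```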