How to parse all the content of an image using gpt-4 model, since the maxm context length is only 4095 tokens

I have a menu image, and i need to extract the menu items information in structured format. But the problem is, the image, i am using is large. But maximum output token size which is 4096 is not enough to extract all the menu information.
What could be the possible way to deal with it ??

You will likely want to use some image slicing with overlap, and send requests that don’t require such a large output.

You can try either sending multiple image inputs attached to a user message, or simply separate requests. The input context length of vision AI models is large - just the output is restricted by OpenAI.

The vision quality also cannot keep up with that requirement of large information extraction per image. If you are being charged 1200 prompt tokens for an image, how can you expect the image to be transformed into 4000 tokens?

Are you actually approaching that token limit anyway? I had to look hard for a single page that pushed the limits (and we even get identification):


This is just 1k tokens of that:

Here’s a detailed replication of the menu items listed on the vintage “The Cheesecake Factory of Beverly Hills” menu:

SPECIALTIES

  • QUICHE OF THE DAY $3.95
    Served with Tossed Green Salad and Fresh Fruit
  • ROASTED CHICKEN $3.25
    Served Hot or Cold with Your Choice of Any Two Salads
  • THE FACTORY BURGER $2.10
    Charbroiled, with Jack or Cheddar Cheese, Lettuce and Grilled Onions
    Served on Sourdough French Loaf, White or Wheat Roll
  • SUPER FACTORY BURGER $2.40
    Our Factory Burger with Freshly Sauteed Mushrooms
    KING SIZE on SPECIALLY BAKED SOURDOUGH FRENCH LOAF $3.10
  • HAMBURGER STEAK $3.50
    Served with Grilled Onions, Cottage Cheese and Tomatoes with Choice of Salad
  • FACTORY BURGER OLE $3.10
    Factory Burger with Freshly Sauteed Mushrooms, Lettuce, Chilis and Sour Cream on the Side, Served on Sourdough French Loaf, White or Wheat Roll
  • ENGLISH BROILER $3.75
    Toasted English Muffin, Broiled Tomatoes, Broiled Link Sausage
  • CRAB SALAD $3.95
  • TUNA SALAD $2.50
  • SHARON’S FAVORITE $4.95
    Broiled Muffin with Jack or Cheddar Cheese, Avocado Sliced Tomatoes and Choice of Salad

THE BEVERLY HILLS $2.50
Charbroiled Hamburger, Fresh Avocado, Tomato and Melted Cheddar Cheese, Served on Sourdough French Loaf, White or Wheat Roll
KING SIZE on SPECIALLY BAKED SOURDOUGH FRENCH LOAF $3.75

SOUPS — VEGETABLE SOUP OF THE DAY

  • Small $0.95
  • Large $1.25
    Served with Sourdough French Loaf

SALADS and COLD PLATES

  • THE FABULOUS FACTORY SALAD BAR
    Make Your Own Salad
    MEDIUM $1.95 LARGE $2.95
  • CRAB LOUIE $5.95
  • SHRIMP LOUIE $4.75
  • SHRIMP AND AVOCADO SALAD $3.95
  • THE FACTORY CHEF’S SALAD $3.40
    Assortment of Meat and Cheese with Fresh Vegetables on a Bed of Tossed Greens
  • FRESH FRUIT SALAD
    With Cottage Cheese, Yogurt or Whipped Cream
    SMALL $1.90 LARGE $2.95
  • STUFFED TOMATO or AVOCADO
    WITH TUNA SALAD $3.25
    WITH CHICKEN SALAD $3.25
    WITH EGG SALAD $2.95
    WITH CRAB SALAD $4.95
  • FRESH FRUIT AND CHEESE PLATTER $2.50
  • TOSSED GREEN SALAD $0.95
    WITH CHEESE $1.45
  • CARROT-RAISIN SALAD $0.65
  • POTATO SALAD $0.65
  • COLE SLAW $0.65
  • FRESH WHOLE ARTICHOKE $1.25
    With Melted Butter

SANDWICH CREATIONS

  • AVOCADO DELIGHTS $1.75
    Fresh Avocado, Sliced Tomatoes and Alfalfa Sprouts Lettuce on Any of Our Breads or on Our Specially Baked Sourdough French Loaf with Your Choice of Bacon, Tuna and Bacon $2.95 BEET or TURKEY $2.75
  • THE FACTORY CHEESE MELTS $1.75
    Sprouts, Melted Swiss, Cheddar or Jack Cheese Sliced Tomatoes, Alfalfa
    Served Open-Faced on a Whole Wheat Bun with Your Choice of Bacon or Ham $2.60
  • TUNA SALAD $2.60
  • CRAB SALAD $3.60
  • AVOCADO AND MUSHROOMS $2.60
  • GRILLED SANDWICHES
    Served on Specially Baked Sourdough French Loaf or Your Choice of Bread
    HAM, CHEESE and TOM

This could go on for triple, or quadruple the length. Then if you get a finish_reason: “length” meaning you hit the model max, you could give that assistant message back to the AI in another model call, asking again using the image “continue exactly where you left off”.

The problem is determining when the AI has gone into hallucination and fabrication.