So I was experimenting the excellent post by @dlaytonj2 here on batch api here Fun with the Batch API - An example
And decide to give your first page a go with text recognition from gpt-4o-mini in batch. I can successfully get the multi columnar thing to read properly.
**Appendix MM-A: Miscellaneous Creatures**
This appendix contains statistics for various animals, vermin, and other critters. The stat blocks are organized alphabetically by creature name.
### Ape
- **Medium beast, unaligned**
- **Armor Class:** 12
- **Hit Points:** 19 (3d8 + 6)
- **Speed:** 30 ft., climb 30 ft.
**STR** | **DEX** | **CON** | **INT** | **WIS** | **CHA**
16 (+3) | 14 (+2) | 14 (+2) | 6 (−2) | 12 (+1) | 7 (−2)
**Skills:** Athletics +5, Perception +3
**Languages:** —
**Challenge:** 1/2 (100 XP)
**Actions:**
- **Multiattack:** The ape makes two fist attacks.
- **Fist. Melee Weapon Attack:** +5 to hit, reach 5 ft., one target. Hit: 1d6 + 3 bludgeoning damage.
- **Rock. Ranged Weapon Attack:** +5 to hit, range 25/50 ft., one target. Hit: 6 (1d6 + 3) bludgeoning damage.
---
### Awakened Shrub
- **Small plant, unaligned**
- **Armor Class:** 9
- **Hit Points:** 10 (3d6)
- **Speed:** 20 ft.
**STR** | **DEX** | **CON** | **INT** | **WIS** | **CHA**
3 (−4) | 8 (−1) | 11 (+0) | 10 (+0) | 10 (+0) | 6 (−2)
**Damage Vulnerabilities:** Fire
**Damage Resistances:** Piercing
**Senses:** Passive Perception 10
**Languages:** One language known by its creator
**Challenge:** 0 (10 XP)
**Actions:**
- **Rake. Melee Weapon Attack:** +1 to hit, reach 5 ft., one target. Hit: 1 (1d4 − 1) slashing damage.
An awakened shrub is an ordinary shrub given sentience and mobility by the **awaken** spell or similar magic.
---
### Awakened Tree
- **Huge plant, unaligned**
- **Armor Class:** 13 (natural armor)
- **Hit Points:** 59 (7d12 + 14)
- **Speed:** 20 ft.
**STR** | **DEX** | **CON** | **INT** | **WIS** | **CHA**
19 (+4) | 6 (−2) | 15 (+2) | 10 (+0) | 10 (+0) | 7 (−2)
**Damage Vulnerabilities:** Fire
**Damage Resistances:** Bludgeoning, piercing
**Senses:** Passive Perception 10
**Languages:** One language known by its creator
**Challenge:** 2 (450 XP)
**Actions:**
- **Slam. Melee Weapon Attack:** +6 to hit, reach 10 ft., one target. Hit: 14 (3d6 + 4) bludgeoning damage.
An awakened tree is an ordinary tree given sentience and mobility by the **awaken** spell or similar magic.
---
### Axe Beak
- **Large beast, unaligned**
- **Armor Class:** 11
- **Hit Points:** 19 (3d10 + 3)
- **Speed:** 50 ft.
The key insight was to provide the format in the system_context and the user_context
SYSTEM_IMAGE_READER_CONTEXT = “You are an expert at reading text in the image.”
USER_IMAGE_READER_CONTEXT = “The format is structured in multiple columns. Obviously the text must follow as a human would read it.”
Can you please take a quick look at let me know if the raw text extraction looks ok?