How Does One Extract Sequentially From a PDF

So I was experimenting the excellent post by @dlaytonj2 here on batch api here Fun with the Batch API - An example

And decide to give your first page a go with text recognition from gpt-4o-mini in batch. I can successfully get the multi columnar thing to read properly.

**Appendix MM-A: Miscellaneous Creatures** 

This appendix contains statistics for various animals, vermin, and other critters. The stat blocks are organized alphabetically by creature name.

### Ape

- **Medium beast, unaligned**  
- **Armor Class:** 12  
- **Hit Points:** 19 (3d8 + 6)  
- **Speed:** 30 ft., climb 30 ft.  

**STR** | **DEX** | **CON** | **INT** | **WIS** | **CHA**  
16 (+3) | 14 (+2) | 14 (+2) | 6 (−2) | 12 (+1) | 7 (−2)  

**Skills:** Athletics +5, Perception +3  
**Languages:** —  
**Challenge:** 1/2 (100 XP)  

**Actions:**  
- **Multiattack:** The ape makes two fist attacks.  
- **Fist. Melee Weapon Attack:** +5 to hit, reach 5 ft., one target. Hit: 1d6 + 3 bludgeoning damage.  
- **Rock. Ranged Weapon Attack:** +5 to hit, range 25/50 ft., one target. Hit: 6 (1d6 + 3) bludgeoning damage.  

---

### Awakened Shrub

- **Small plant, unaligned**  
- **Armor Class:** 9  
- **Hit Points:** 10 (3d6)  
- **Speed:** 20 ft.  

**STR** | **DEX** | **CON** | **INT** | **WIS** | **CHA**  
3 (−4) | 8 (−1) | 11 (+0) | 10 (+0) | 10 (+0) | 6 (−2)  

**Damage Vulnerabilities:** Fire  
**Damage Resistances:** Piercing  
**Senses:** Passive Perception 10  
**Languages:** One language known by its creator  
**Challenge:** 0 (10 XP)  

**Actions:**  
- **Rake. Melee Weapon Attack:** +1 to hit, reach 5 ft., one target. Hit: 1 (1d4 − 1) slashing damage.  

An awakened shrub is an ordinary shrub given sentience and mobility by the **awaken** spell or similar magic.

---

### Awakened Tree

- **Huge plant, unaligned**  
- **Armor Class:** 13 (natural armor)  
- **Hit Points:** 59 (7d12 + 14)  
- **Speed:** 20 ft.  

**STR** | **DEX** | **CON** | **INT** | **WIS** | **CHA**  
19 (+4) | 6 (−2) | 15 (+2) | 10 (+0) | 10 (+0) | 7 (−2)  

**Damage Vulnerabilities:** Fire  
**Damage Resistances:** Bludgeoning, piercing  
**Senses:** Passive Perception 10  
**Languages:** One language known by its creator  
**Challenge:** 2 (450 XP)  

**Actions:**  
- **Slam. Melee Weapon Attack:** +6 to hit, reach 10 ft., one target. Hit: 14 (3d6 + 4) bludgeoning damage.  

An awakened tree is an ordinary tree given sentience and mobility by the **awaken** spell or similar magic.

---

### Axe Beak

- **Large beast, unaligned**  
- **Armor Class:** 11  
- **Hit Points:** 19 (3d10 + 3)  
- **Speed:** 50 ft.

The key insight was to provide the format in the system_context and the user_context

SYSTEM_IMAGE_READER_CONTEXT = “You are an expert at reading text in the image.”
USER_IMAGE_READER_CONTEXT = “The format is structured in multiple columns. Obviously the text must follow as a human would read it.”

Can you please take a quick look at let me know if the raw text extraction looks ok?

2 Likes