What's the best way to help OpenAI extract text from a document?

I’m trying to extract text from documents. Most of the time OpenAI gives back great results, but sometimes, if the format changes, it breaks.

For example, I’m extracting data from farm auction results, so the document has headers like Cows, Steers, Sheep, etc., and I need each header and the paragraph that follows it.
But in some documents the headers are split, so they would be Price per head Cows, Price per KG Sheep, etc.

So would the best approach be to use fine-tuning, something along the lines of:

```json
{"messages": [
  {"role": "system", "content": "You are a helpful bot and will extract data as requested."},
  {"role": "user", "content": "Find me all the auction types"},
  {"role": "assistant", "content": "Named Price per head Cows"},
  {"role": "user", "content": "Find me all the auction types"},
  {"role": "assistant", "content": "Named Price per kg Cows"},
  {"role": "user", "content": "Find me all the auction types"},
  {"role": "assistant", "content": "Named Cows"}
]}
```

Or is there a better option, like formatting the text before I send it to the API (e.g. adding bold tags) and asking OpenAI to find those bold tags?

Or is there a totally different option I haven’t thought of?

(BTW, I know my fine-tuning example is terrible, but it's just a sample!)


Hi @Steve_in_York (Steve) and welcome to the community!

I would just use Structured Outputs to define a schema for exactly what you want the API to return. In the system prompt, it would help to give some examples of how the information might be presented. Beyond that, it should work fine without fine-tuning.
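As a rough sketch of what that could look like, the snippet below only builds a Chat Completions request body: a system prompt that lists the header variations, plus a `response_format` of type `json_schema` in strict mode. The schema fields (`header`, `animal`, `pricing_basis`, `text`), the schema name `auction_sections`, and the model name are hypothetical placeholders; adjust them to your own data and to a model that supports Structured Outputs.

```python
import json

# Hypothetical system prompt: it lists the header variations seen so far so the
# model treats them all as the same kind of section.
SYSTEM_PROMPT = (
    "You extract livestock auction results from documents. "
    "Section headers may appear in different forms, for example "
    "'Cows', 'Price per head Cows' or 'Price per KG Sheep'. "
    "Return every section with its header, the animal type, the pricing "
    "basis if one is given, and the paragraph that follows the header."
)

# Hypothetical JSON Schema for the reply. In strict mode every property must be
# listed in "required" and additionalProperties must be false; an optional value
# is expressed as a union with null.
AUCTION_SCHEMA = {
    "type": "object",
    "properties": {
        "sections": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "header": {"type": "string"},
                    "animal": {"type": "string"},
                    "pricing_basis": {"type": ["string", "null"]},
                    "text": {"type": "string"},
                },
                "required": ["header", "animal", "pricing_basis", "text"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["sections"],
    "additionalProperties": False,
}


def build_request(document_text: str, model: str = "gpt-4o-mini") -> dict:
    """Build a Chat Completions request body that asks for schema-conforming JSON."""
    return {
        "model": model,  # placeholder; use a model that supports Structured Outputs
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": document_text},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "auction_sections",
                "strict": True,
                "schema": AUCTION_SCHEMA,
            },
        },
    }


if __name__ == "__main__":
    # Made-up input, just to show the shape of the request.
    body = build_request("Price per head Cows\n(auction text here)")
    print(json.dumps(body, indent=2))
```

With the official Python SDK, the same fields can be passed as keyword arguments, e.g. `client.chat.completions.create(**body)`, and the reply's message content is then a JSON string you can load with `json.loads`.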

Some pre-processing beforehand would help as well. I am not sure what your documents look like, but presenting them in a standardized format like Markdown may help quite a bit.
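A minimal pre-processing sketch, assuming header lines sit on their own line and take one of the two forms from the question ("Cows" or "Price per head Cows" / "Price per KG Sheep"). The animal list and the `to_markdown` helper are hypothetical; the idea is simply to rewrite every recognised header into the same `## ` Markdown heading before the text goes to the API, so the model always sees one consistent layout.

```python
import re

# Hypothetical list of section names; extend it to match your documents.
ANIMALS = ["Cows", "Steers", "Sheep"]

# A header line is an optional "price per head|kg" prefix followed by an animal
# name, with nothing else on the line (case-insensitive).
HEADER_RE = re.compile(
    r"^\s*(?:price per (?P<basis>head|kg)\s+)?(?P<animal>"
    + "|".join(map(re.escape, ANIMALS))
    + r")\s*$",
    re.IGNORECASE,
)


def to_markdown(raw_text: str) -> str:
    """Turn recognised header lines into '## ' headings; leave other lines as-is."""
    out = []
    for line in raw_text.splitlines():
        m = HEADER_RE.match(line)
        if m:
            heading = f"## {m.group('animal').capitalize()}"
            basis = m.group("basis")
            if basis:
                # Keep the pricing basis, but move it after the animal name.
                heading += f" (price per {basis.lower()})"
            out.append(heading)
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up sample text.
    print(to_markdown("Price per KG Sheep\nSold 40 head"))
```

On that sample this prints `## Sheep (price per kg)` followed by the unchanged body line. Any header format the regex does not recognise passes through untouched, so it can still be left to the model and the system-prompt examples.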


Are you using Structured Outputs? If the problem is that a change in the input breaks your response, Structured Outputs should fix that.

If Structured Outputs doesn’t fully address the issue, we can give you some prompting tips to help ensure the model knows what you’re looking to extract.


No, I'm not using Structured Outputs. I’m still pretty new to OpenAI, but from some code samples and tutorials it seems like it could be helpful!

Thanks for the tip! :+1:
