Custom entity extraction from natural language

I’m working on a project that requires me to extract custom entities from a sentence. The entities are triplets.

For example, if the sentence is: “How many more than 20 years old male users viewed a page or logged in in the last 30 days?” The entities are:

<gender, equals, male>,
<age, greater than, 20>,
<event_name, equals, view page>,
<event_name, equals, login>,
<event_timestamp, greater than, 30 days>

The first element of each entity (triplet) comes from the list of columns. The second element is the operator, which is inferred from context (for example, whether the comparison is against a single value or an array). The third element is also inferred from context and must be a value that belongs to the chosen column (first element).

The user will give a list of columns and all the values inside each column.

Is it possible to first look at all the available columns, choose one, and view its unique values? After that, either commit to that column (first element) and a value (third element), or look again and repeat these steps. An agent could be designed around this, but I’m unable to find suitable prompts and iteration techniques for it.
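To make the loop I have in mind concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `SCHEMA` contents, the operator names, and the `propose` callable, which stands in for whatever model or agent call produces candidate triplets (it is not a LangChain API). The idea is that each invalid triplet becomes a feedback message for the next round.

```python
from typing import Callable, List, NamedTuple, Optional

# Hypothetical schema: column name -> set of allowed values.
# None means the column is free-form (e.g. numeric), so any value passes.
SCHEMA = {
    "gender": {"male", "female"},
    "age": None,
    "event_name": {"view page", "login"},
    "event_timestamp": None,
}
# Hypothetical operator vocabulary.
OPERATORS = {"equals", "greater than", "less than", "in"}


class Triplet(NamedTuple):
    column: str
    operator: str
    value: str


def list_columns() -> List[str]:
    """Tool 1: show the available columns."""
    return sorted(SCHEMA)


def unique_values(column: str):
    """Tool 2: show the unique values of one column (None if free-form)."""
    return SCHEMA[column]


def check_triplet(t: Triplet) -> Optional[str]:
    """Return None if the triplet is valid, else a message to feed back."""
    if t.column not in SCHEMA:
        return f"unknown column {t.column!r}; choose from {list_columns()}"
    if t.operator not in OPERATORS:
        return f"unknown operator {t.operator!r}; choose from {sorted(OPERATORS)}"
    allowed = unique_values(t.column)
    if allowed is not None and t.value not in allowed:
        return f"{t.value!r} is not a value of {t.column!r}; choose from {sorted(allowed)}"
    return None


def extract(propose: Callable[[List[str]], List[Triplet]], max_rounds: int = 3):
    """Ask for triplets, validate them, and retry with the error messages."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        triplets = propose(feedback)
        feedback = [msg for t in triplets if (msg := check_triplet(t))]
        if not feedback:
            return triplets
    raise ValueError(f"no valid triplets after {max_rounds} rounds: {feedback}")
```

In a real setup `propose` would wrap the model call and put the feedback messages into the next prompt; the validation part stays deterministic.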

Any help on this would be great! I’m using langchain for this but using any other approach is fine too.

To be honest, I would not rely on a model for that.
You can use it to restructure your data: if the data is not available in a structured form, you prompt the model to return the data you want in a structured one.

For example: find all persons in the following short story and give me all their information

/—
[shortstory]
/—

create a JSON in the following format: {"persons": [{"name":…

Then validate the result. First check that it is valid JSON, then check the data with regular expressions, and then load it into a relational database.
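Roughly, those three steps could look like the sketch below, using only the Python standard library. The `persons`/`name` keys come from the format above; the name pattern, the function names, and the single-column `persons` table are assumptions made up for the example, so adjust them to your data.

```python
import json
import re
import sqlite3

# Assumed validation rule: a name starts with a letter and contains only
# letters, spaces, dots, apostrophes and hyphens.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z .'-]*")


def parse_persons(raw: str) -> list:
    """Step 1 and 2: check that the reply is JSON, then regex-check each name."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # not JSON at all, reject the whole reply
    if not isinstance(data, dict) or not isinstance(data.get("persons"), list):
        return []  # JSON, but not in the requested format
    names = []
    for person in data["persons"]:
        name = person.get("name") if isinstance(person, dict) else None
        if isinstance(name, str) and NAME_RE.fullmatch(name):
            names.append(name)
    return names


def store_persons(conn: sqlite3.Connection, names: list) -> None:
    """Step 3: load the validated rows into a relational database."""
    conn.execute("CREATE TABLE IF NOT EXISTS persons (name TEXT NOT NULL)")
    conn.executemany("INSERT INTO persons (name) VALUES (?)", [(n,) for n in names])
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    reply = '{"persons": [{"name": "Ada Lovelace"}, {"name": "123"}]}'
    store_persons(conn, parse_persons(reply))
    print(conn.execute("SELECT name FROM persons").fetchall())
```

Entries that fail a check are simply dropped here; you could also collect them and ask the model again.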

And from there you get everything you need with database queries.

You can use the model to create a SQL query for you to achieve that.
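For the question in the original post, the generated query might look something like the one below. This is only a sketch: the `users` and `events` tables, the `user_id` join key, and the demo rows are assumptions, since the actual schema was not given. It runs against an in-memory SQLite database so you can try it directly.

```python
import sqlite3

# The kind of query a model could be asked to write for:
# "How many more than 20 years old male users viewed a page
#  or logged in in the last 30 days?"
QUERY = """
SELECT COUNT(DISTINCT u.user_id)
FROM users AS u
JOIN events AS e ON e.user_id = u.user_id
WHERE u.gender = 'male'
  AND u.age > 20
  AND e.event_name IN ('view page', 'login')
  AND e.event_timestamp >= datetime('now', '-30 days')
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny, made-up dataset matching the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, gender TEXT, age INTEGER);
        CREATE TABLE events (user_id INTEGER, event_name TEXT, event_timestamp TEXT);
        INSERT INTO users VALUES
            (1, 'male', 25), (2, 'male', 18), (3, 'female', 30), (4, 'male', 40);
        INSERT INTO events VALUES
            (1, 'view page', datetime('now', '-2 days')),
            (1, 'login',     datetime('now', '-3 days')),
            (2, 'login',     datetime('now', '-1 days')),
            (3, 'view page', datetime('now', '-1 days')),
            (4, 'login',     datetime('now', '-90 days'));
    """)
    return conn


def count_matching_users(conn: sqlite3.Connection) -> int:
    """Run the query and return the single count it produces."""
    return conn.execute(QUERY).fetchone()[0]


if __name__ == "__main__":
    print(count_matching_users(build_demo_db()))
```

Note that each triplet from the question maps onto one condition in the WHERE clause, and the two `event_name` triplets collapse into a single `IN (...)`.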

This way you can analyse far more data than a model could read in one prompt, because the database does the filtering and counting.