Named Entities Extraction Problem

I have an app where I query private data that I fed into Pinecone, and I get reasonable results. However, I am trying to use Pinecone metadata filtering as well. For that, I send my prompt through the GPT-4 Completion API along with the following Context:
Named Entity is defined as one or more of the following:
Name of a person, an organization, such as: a company, a corporate entity, a government institution, a military unit.
Name of a geographic location such as country, city, region, specific address, mountain, river, incident locations.
Name of a document, legal concepts or terms such as Plaintiff, Defendant, latin terms like Pro Bono, specific laws and legal acts or procedures, legal doctrines or theories, case identifiers, names of operations or incidents related to cases, document titles and types, clause names and identifiers, legal entities such as estate or trust or another entities like events, dates, times, percentages, monetary values that can be essential in specific use-cases.
For the following Context identify and list Named Entities:

Obviously, the user prompt follows the above. It usually contains names of people, legal terms, and other items that should be identified as Named Entities. However, the API returns nothing, zero!

I am aware that GPT-4 is not designed for this purpose but getting NOTHING?!

Does anybody have any advice on that? The purpose of extracting Named Entities is to be able to filter the data when the Pinecone vector DB has more than one document with similar content.

Can you use function calling?

{
    name: "extract_named_entities",
    description: "Extract all named entities from the user's given text.",
    parameters: {
        type: "object",
        properties: {
            entities: {
                type: "array",
                items: {
                    type: "string",
                    description: "Name of a person, an organization, such as: a company, a corporate entity, a government institution, a military unit, geographic location such as country, city, region, specific address, mountain, river, incident locations, name of a document, legal concepts or terms such as Plaintiff, Defendant, latin terms like Pro Bono, specific laws and legal acts or procedures, legal doctrines or theories, case identifiers, names of operations or incidents related to cases, document titles and types, clause names and identifiers, legal entities such as estate or trust or another entities like events, dates, times, percentages, monetary values that can be essential in specific use-cases"
                }
            }
        },
        required: ["entities"]
    }
}

I don’t think so. First of all, I am writing in C# and am not sure how to use functions yet.
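
For what it is worth, function calling from C# should not need a special SDK: it comes down to two extra fields in the JSON request body. Here is a minimal sketch, assuming .NET 6 or later (for `System.Text.Json.Nodes` and top-level statements) and the `functions` / `function_call` request fields of the Chat Completions API. The sample user text is made up, and the item description is shortened for brevity.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// JSON Schema for the function arguments, mirroring extract_named_entities above.
var parameters = new JsonObject
{
    ["type"] = "object",
    ["properties"] = new JsonObject
    {
        ["entities"] = new JsonObject
        {
            ["type"] = "array",
            ["items"] = new JsonObject
            {
                ["type"] = "string",
                // Shortened here; the full definition from the post would go in its place.
                ["description"] = "Name of a person, organization, location, document or legal term"
            }
        }
    },
    ["required"] = new JsonArray("entities")
};

var body = new JsonObject
{
    ["model"] = "gpt-4",
    ["messages"] = new JsonArray(
        // Made-up sample prompt, for illustration only.
        new JsonObject { ["role"] = "user", ["content"] = "John Doe, the Plaintiff, filed suit in Boston." }),
    ["functions"] = new JsonArray(
        new JsonObject
        {
            ["name"] = "extract_named_entities",
            ["description"] = "Extract all named entities from the user's given text.",
            ["parameters"] = parameters
        }),
    // Ask the model to call this function instead of replying with free text.
    ["function_call"] = new JsonObject { ["name"] = "extract_named_entities" }
};

// Print the request body that would be sent to the API.
Console.WriteLine(body.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));
```

The body would be POSTed to the Chat Completions endpoint with whatever HTTP client the app already uses. When the model calls the function, the entity list should come back as a JSON string in `choices[0].message.function_call.arguments`, which can be parsed with `JsonNode.Parse`.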

Hi @Securigy

Here’s a working example with gpt-3.5-turbo - OpenAI Platform

Oh, so you are saying that I need to send the definition as the system message content, instead of including it as context inside the user message content.
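
That arrangement could be sketched as follows, assuming .NET 6 or later and the standard `messages` array of the Chat Completions API. The variable names and sample strings are hypothetical; the full Named Entity definition from the first post would go into the system message.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical placeholders: the real instructions are the Named Entity
// definition from the first post, and userPrompt is the text to analyze.
string instructions = "Identify and list the Named Entities in the user's text.";
string userPrompt = "John Doe, the Plaintiff, filed suit in Boston.";

var body = new JsonObject
{
    ["model"] = "gpt-4",
    ["messages"] = new JsonArray(
        // Instructions go in the system message...
        new JsonObject { ["role"] = "system", ["content"] = instructions },
        // ...and only the text to analyze goes in the user message.
        new JsonObject { ["role"] = "user", ["content"] = userPrompt })
};

// Print the request body that would be sent to the API.
Console.WriteLine(body.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));
```

Keeping the instructions and the text to analyze in separate messages makes it less likely that the model treats the whole input as one block of instructions with nothing to extract from.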

My other problem is using the Named Entities as a filter in the Pinecone query. The query throws an exception. Here is how I construct the filter in C#:

        List<string> contextList = new List<string>();
        contextList.Add(mConfig.Pinecone.NamedEntitiesContext);
        List<String> namedEntityList = await ExtractNamedEntities(query, contextList);
        if (namedEntityList != null && namedEntityList.Count != 0)
        {
            MetadataValue[] mvArr = new MetadataValue[namedEntityList.Count];
            for (int i = 0; i < mvArr.Length; i++)
            {
                mvArr[i] = namedEntityList[i];
            }
            metaFilter = new MetadataMap();
            metaFilter.Add("$in", mvArr);

            //MetadataMap metaFilter = new MetadataMap
            //{
            //    ["$in"] = mvArr
            //};
        }

        if (embList != null && embList.Any() && embList[0] != null)
        {
            float[] floatArray = embList[0].Embedding.ToArray();

            //Segment
            IndexClient<RestTransport> indexClient = await mPineconeDb.GetIndex(mIndexName);

            // Query scored vectors by a new, previously unseen vector created from prompt
            ScoredVector[] scoredVectors = await indexClient.Query(floatArray, topK: zoom, filter: metaFilter, includeMetadata: true); //indexNamespace: "myTestNamespace", 
            List<ScoredVector> scoredList = scoredVectors.ToList();

            ScoredVector maxScoredVector = scoredList.Aggregate((i1, i2) => i1.Score > i2.Score ? i1 : i2);
            idMaxScored = Int32.Parse(maxScoredVector.Id);
            
            return (scoredList, idMaxScored);
        }
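
One likely cause of the exception is that the filter above puts `$in` at the top level of the map. Pinecone's filter syntax expects operators to be nested under a metadata field name, as in `{"genre": {"$in": ["comedy", "documentary"]}}`. Below is a minimal sketch of that shape using the same `MetadataMap` and `MetadataValue` types. The field name `entities` and the helper `BuildEntityFilter` are hypothetical, and the sketch assumes the client library converts a nested `MetadataMap` to a `MetadataValue` implicitly, the same way it does for strings and arrays in the code above.

```csharp
using System.Collections.Generic;
using System.Linq;
using Pinecone;

// Example call with made-up entity names.
MetadataMap metaFilter = BuildEntityFilter(new[] { "John Doe", "Plaintiff" });

// Hypothetical helper. "entities" is an assumed metadata field name; the filter
// can only match vectors that were upserted with that field in their metadata.
static MetadataMap BuildEntityFilter(IReadOnlyList<string> namedEntities, string field = "entities")
{
    // Strings convert to MetadataValue, as in the loop in the original code.
    MetadataValue[] values = namedEntities.Select(e => (MetadataValue)e).ToArray();

    // Resulting shape: { "entities": { "$in": [ ... ] } }
    return new MetadataMap
    {
        [field] = new MetadataMap { ["$in"] = values }
    };
}
```

Separately, `Aggregate` throws `InvalidOperationException` on an empty sequence, so a filter that matches no vectors would need a guard before the max-score lookup.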