Optimize natural data creation from JSON

Vaughn · April 19, 2023, 8:04am

Hi, I am using embeddings (text-embedding-ada-002) to inject football player data into ChatGPT so it can answer questions. The results are okay, but I am not completely happy with them. For example, it often retrieves players who don't play in the requested position: when I ask for a left-back, it gives me left-footed players who are not necessarily left-backs. For other information, such as stronger foot or nationality, it works almost perfectly. I think the way to improve performance here is to improve the natural-language text that I use for the embeddings.

The original data I have access to is in JSON format, and I wrote a script to transform it into natural language. In the JSON, a player is represented like this:

```json
[
  {
    "id": "2000020333",
    "name": "Olivier Aertssen",
    "age": "17",
    "currentAbility": 95,
    "potentialAbility": 160,
    "club": "Ajax",
    "nationalities": [
      "NED"
    ],
    "positions": [
      "D (C)",
      "DM"
    ],
    "askingPrice": "€9.75M",
    "contractLength": "30/6/2025",
    "personality": "Balanced",
    "searchString": "Olivier Aertssen",
    "attributes": {
      "technicals": {
        "crossing": 6,
        "corners": 5,
        "firstTouch": 10,
        "finishing": 5,
        "dribbling": 11,
        "heading": 11,
        "freekicks": 7,
        "marking": 9,
        "longThrow": 6,
        "longshots": 6,
        "passing": 13,
        "penalties": 7,
        "tackling": 10,
        "technique": 10
      },
      "mentals": {
        "workrate": 13,
        "vision": 12,
        "teamwork": 12,
        "positioning": 12,
        "offTheBall": 11,
        "leadership": 8,
        "flair": 12,
        "determination": 12,
        "decisions": 11,
        "concentration": 11,
        "composure": 11,
        "bravery": 12,
        "anticipation": 10,
        "aggression": 10
      },
      "physicals": {
        "acceleration": 11,
        "agility": 11,
        "balance": 9,
        "jumpingReach": 12,
        "naturalFitness": 11,
        "pace": 10,
        "strength": 12,
        "stamina": 13
      }
    }
  }
]
```

which I then turn into the following natural-language text for embedding:

Olivier Aertssen is an Either-footed player currently playing for Ajax. His nationality is NED and his preferred positions are Central Defender. The player who just appeared on people’s radar fetches a decent price of €9.75M. Still being very young and even a teenager at 17 years, his current ability is not up to par and he is considered to be a real talent and a wonderkid. He has a lot of potential left to grow. He shapes the game with his pinpoint accurate passes. He can play great opening passes, get lots of assists and play key passes. He has lots of bravery which is why he generally doesn’t back down from challenges. He has good levels of determination and works hard to achieve his goals on and off the pitch. He shows his flair on the ball regularly, doing unpredictable moves and passes. He has good defensive positioning and leaves little room for attackers. His workrate is really admirable. He runs up and down the field the whole game and is willing to go the extra mile. He has really impressive stamina and is a proper workhorse. He will still look fit at the end of the game and won’t have to be subbed off. Finishing really isn’t one of his strong suits. His shots often miss the goal. He shouldn’t take corner kicks under any circumstance.

Basically, I look at his strengths and weaknesses and generate one or two sentences about each of them. Does anyone have suggestions on how I could improve this text to get better results from my embeddings? More specifically, how can I make retrieval put more emphasis on the player's position instead of ignoring it?
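A minimal Python sketch of the strengths-and-weaknesses step described above, assuming the 1-20 attribute scale from the JSON sample. The thresholds (STRONG, WEAK), the template sentences, and the function names are hypothetical illustrations, not the script from this post.

```python
# Hypothetical thresholds on the 1-20 attribute scale used in the JSON sample.
STRONG = 13
WEAK = 6

# Hypothetical templates; a real script would cover every attribute.
STRENGTH_SENTENCES = {
    "passing": "He shapes the game with accurate passing.",
    "workrate": "His work rate is really admirable.",
    "stamina": "He has impressive stamina and is a proper workhorse.",
}
WEAKNESS_SENTENCES = {
    "finishing": "Finishing really isn't one of his strong suits.",
    "corners": "He shouldn't take corner kicks.",
}


def flatten_attributes(player: dict) -> dict:
    """Merge the technicals/mentals/physicals groups into one name -> value dict."""
    flat = {}
    for group in player.get("attributes", {}).values():
        flat.update(group)
    return flat


def describe_attributes(player: dict) -> str:
    """Return strength sentences first, then weakness sentences, as one paragraph.

    Only attributes that clear a threshold AND have a template produce text.
    """
    strengths, weaknesses = [], []
    for name, value in flatten_attributes(player).items():
        if value >= STRONG and name in STRENGTH_SENTENCES:
            strengths.append(STRENGTH_SENTENCES[name])
        elif value <= WEAK and name in WEAKNESS_SENTENCES:
            weaknesses.append(WEAKNESS_SENTENCES[name])
    return " ".join(strengths + weaknesses)
```

Keeping the thresholds and templates as plain data makes it easy to tune which attributes get mentioned without touching the generation logic.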

udm17 · April 19, 2023, 10:31am

You can try either writing out the full position name instead of the abbreviation, or adding a prompt to the natural-language generator that lists the full form of every position and then having it generate the description.

The first approach is likely to work better in most cases.
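A minimal Python sketch of the first approach. It assumes Football Manager-style codes, where the letters in parentheses are the sides a player covers (R, L, C); the ROLE_NAMES table and the function names are hypothetical. One detail worth checking: the JSON sample lists both "D (C)" and "DM", while the generated text only mentions "Central Defender". This sketch keeps every listed position and names the side explicitly (for example "Left-Back"), which may bring the embedded text closer to the wording of a "leftback" query.

```python
import re

# Hypothetical lookup table, assuming Football Manager-style codes where the
# letters in parentheses are the sides covered (R = right, L = left, C = centre).
ROLE_NAMES = {
    "GK": {"": "Goalkeeper"},
    "D": {"C": "Centre-Back", "L": "Left-Back", "R": "Right-Back"},
    "WB": {"L": "Left Wing-Back", "R": "Right Wing-Back"},
    "DM": {"": "Defensive Midfielder"},
    "M": {"C": "Central Midfielder", "L": "Left Midfielder", "R": "Right Midfielder"},
    "AM": {"C": "Attacking Midfielder", "L": "Left Winger", "R": "Right Winger"},
    "ST": {"C": "Striker", "": "Striker"},
}

# Matches codes such as "DM", "D (C)" and "AM (RLC)".
POSITION_RE = re.compile(r"^([A-Z]+)\s*(?:\(([RLC]+)\))?$")


def expand_position(code: str) -> list:
    """Expand one position code into one full name per side it covers."""
    match = POSITION_RE.match(code.strip())
    if match is None or match.group(1) not in ROLE_NAMES:
        return [code]  # unknown format: keep the raw code rather than drop it
    role, sides = match.group(1), match.group(2) or ""
    names = ROLE_NAMES[role]
    if not sides:
        return [names.get("", code)]
    return [names.get(side, f"{role} ({side})") for side in sides]


def position_sentence(positions: list) -> str:
    """Name every listed position once, in the order given."""
    full_names = []
    for code in positions:
        for name in expand_position(code):
            if name not in full_names:
                full_names.append(name)
    return "His positions are: " + ", ".join(full_names) + "."
```

For the player above, position_sentence(["D (C)", "DM"]) yields "His positions are: Centre-Back, Defensive Midfielder."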

Vaughn · April 19, 2023, 10:46am

I am already doing that: see how “D (C)” becomes “Central Defender” in the natural-language text.
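When the full position names are already in the text, another option is to stop relying on the embedding alone for the position and treat it as a structured filter: store the raw position codes next to each player's embedding, keep only players with the requested position, then rank the remaining ones by cosine similarity. This is a swapped-in technique (metadata filtering before vector search), not something suggested in this thread. The sketch is pure Python; the record fields (name, positions, embedding) and the toy two-dimensional vectors are hypothetical stand-ins for real ada-002 vectors, and mapping a query such as "leftback" to the code "D (L)" is assumed to happen elsewhere.

```python
import math
from typing import Optional


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list, players: list,
           required_position: Optional[str] = None, top_k: int = 5) -> list:
    """Keep players whose raw position codes include required_position (if given),
    then return the names of the top_k candidates by cosine similarity."""
    candidates = [
        p for p in players
        if required_position is None or required_position in p["positions"]
    ]
    candidates.sort(key=lambda p: cosine(query_vec, p["embedding"]), reverse=True)
    return [p["name"] for p in candidates[:top_k]]


# Toy records: B is a left midfielder whose vector happens to sit closest to the query.
PLAYERS = [
    {"name": "A", "positions": ["D (L)"], "embedding": [1.0, 0.0]},
    {"name": "B", "positions": ["M (L)"], "embedding": [1.0, 0.1]},
    {"name": "C", "positions": ["D (L)", "WB (L)"], "embedding": [0.0, 1.0]},
]

if __name__ == "__main__":
    query = [1.0, 0.1]  # stand-in for the embedding of a "good leftback" query
    print(search(query, PLAYERS))           # ['B', 'A', 'C']
    print(search(query, PLAYERS, "D (L)"))  # ['A', 'C']
```

With the toy data, the unfiltered search ranks the left midfielder B first, which mirrors the "left-footed but not a left-back" problem; filtering on "D (L)" removes B before similarity is considered.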
