I recently published “MLB Stats”, a plugin that retrieves up-to-date baseball statistics, to the Plugin Store. For some of my queries, the model outputs numbers that differ from what the plugin returns. I’m curious whether anyone else has faced a similar problem and what you did to fix it. All data returned by the plugin is JSON, with clear descriptions in openapi.yaml.
Thanks. Just updated the title. I just need it to output and interpret the data verbatim from what the plugin gives.
Player’s batting average: .248
Sometimes ChatGPT outputs: .256
Not sure why.
Can you provide an example of a full response from the plugin and your prompt? Without them, it’s hard to determine where ChatGPT might be pulling or generating that number from, or why. The most common cause is that the information also exists elsewhere and, for whatever reason, happens to carry more weight in the context.
You bring up an interesting point about “weights” based on other information out there. I see a lot of output errors when trying to get top prospect information, because the model falls back on the top prospects from 2 years ago, before its cutoff.
Example prompt: Get the combined team batting performance for the Atlanta Braves
Plugin output:
[
{
"teamIDfg": 16,
"Season": 2023,
"Team": "ATL",
"Age": 29,
"G": 1250,
"AB": 3054,
"PA": 3401,
"H": 827,
"1B": 496,
"2B": 154,
"3B": 8,
"HR": 169,
"R": 499,
"RBI": 480,
"BB": 294,
"IBB": 10,
"SO": 728,
"HBP": 30,
"SF": 21,
"SH": 0,
"GDP": 76,
"SB": 70,
"CS": 12,
"AVG": 0.271,
"GB": 1033,
"FB": 871,
"LD": 439,
"IFFB": 63,
"Pitches": 13384,
"Balls": 4831,
"Strikes": 8553,
"IFH": 66,
"BU": 4,
"BUH": 3,
"BB%": 0.086,
"K%": 0.214,
"BB/K": 0.4,
"OBP": 0.339,
"SLG": 0.492,
"OPS": 0.831,
"ISO": 0.222,
"BABIP": 0.302,
"GB/FB": 1.19,
"LD%": 0.187,
"GB%": 0.441,
"FB%": 0.372,
"IFFB%": 0.072,
"HR/FB": 0.194,
( etc.)
Sometimes the model outputs the batting average as .275
I have noticed improvements from using the plugin more.
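One way to confirm which number is right is to recompute the rate stats from the counting stats in the same response. The sketch below does that in Python using the standard definitions of AVG, OBP, and SLG; the function names are made up for illustration and are not part of the plugin.

```python
# Counting stats copied from the plugin response above (ATL, 2023).
stats = {
    "AB": 3054, "H": 827, "1B": 496, "2B": 154, "3B": 8, "HR": 169,
    "BB": 294, "HBP": 30, "SF": 21,
}


def batting_average(s):
    """Hits divided by at-bats."""
    return s["H"] / s["AB"]


def on_base_percentage(s):
    """(H + BB + HBP) / (AB + BB + HBP + SF)."""
    reached = s["H"] + s["BB"] + s["HBP"]
    chances = s["AB"] + s["BB"] + s["HBP"] + s["SF"]
    return reached / chances


def slugging_percentage(s):
    """Total bases divided by at-bats."""
    total_bases = s["1B"] + 2 * s["2B"] + 3 * s["3B"] + 4 * s["HR"]
    return total_bases / s["AB"]


print(round(batting_average(stats), 3))      # 0.271
print(round(on_base_percentage(stats), 3))   # 0.339
print(round(slugging_percentage(stats), 3))  # 0.492
```

All three values match the AVG, OBP, and SLG fields in the plugin response, so the .275 figure is coming from the model rather than from the data.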
Am I correct in assuming the batting average is 0.271? It’s likely that the acronyms aren’t helping you much either, unless you’re providing context for what they mean.
Yes, you’re correct. I’m trying to provide good descriptions to support the data too.
getTeamBattingCombinedResponse:
  type: object
  properties:
    Team:
      type: string
      description: >
        The team's 3 letter abbreviation. All subsequent properties within the object
        are the team's combined batting statistics.
    teamIDfg:
      type: integer
    Season:
      type: integer
    Age:
      type: integer
      description: The average age of the team
    G:
      type: integer
    AB:
      type: integer
      description: The combined total number of at-bats for the team
    PA:
      type: integer
      description: The combined total number of plate appearances for the team
    H:
      type: integer
      description: The combined total number of hits for the team
    1B:
      type: integer
      description: The combined number of singles for the team
    2B:
      type: integer
      description: The combined number of doubles for the team
    3B:
      type: integer
      description: The combined number of triples for the team
    HR:
      type: integer
      description: The combined number of home runs for the team
    R:
      type: integer
      description: The combined number of runs for the team
    RBI:
      type: integer
      description: The combined number of runs batted in for the team
    BB:
      type: integer
      description: The combined number of base on balls - also known as walks - for the team
    IBB:
      type: integer
      description: The combined number of intentional walks for the team
    SO:
      type: integer
      description: The combined number of strikeouts for the team
    HBP:
      type: integer
      description: The combined number of times players have been hit by a pitch
    SLG:
      type: number
      description: The team's slugging percentage
    OPS:
      type: number
      description: The team's On-base plus slugging percentage
    AVG:
      type: number
      description: The team's combined batting average
    GB:
      type: number
      description: The combined total number of ground balls
    FB:
      type: number
      description: The combined total number of fly balls
    LD:
      type: number
      description: The combined total number of line drives
I would, perhaps, consider re-structuring the JSON you send back.
The first thing I would do would be to replace the abbreviations and acronyms with meaningful descriptors to make it easier for ChatGPT to parse out what it needs to get.
The second thing I would do would be to nest things where appropriate.
So, your returned JSON might look something like:
[
  {
    "teamIDfg": 16,
    "season": 2023,
    "team": "ATL",
    "age": 29,
    "games": 1250,
    "runs": 499,
    "runs_batted_in": 480,
    "plate_appearances": {
      "total": 3401,
      "at_bats": {
        "total": 3054,
        "hits": {
          "total": 827,
          "singles": 496,
          "doubles": 154,
          "triples": 8,
          "home_runs": 169
        },
        "bases_on_balls": {
          "total": 294,
          "intentional_bases_on_balls": 10
        },
        "strike_outs": 728,
        "hit_by_pitch": 30
      }
    }
  }
]
I think that is something ChatGPT will be much more able to work with.
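If the upstream data source keeps returning the flat, acronym-keyed records, the restructuring could be done in a small mapping step before the response is sent. The Python sketch below is one way to do it, assuming the flat record has the keys shown in the original plugin output; `restructure` is a hypothetical helper, and it mirrors the nesting of the suggested JSON exactly as written.

```python
def restructure(flat):
    """Map one flat, acronym-keyed record onto nested, descriptive keys."""
    return {
        "teamIDfg": flat["teamIDfg"],
        "season": flat["Season"],
        "team": flat["Team"],
        "age": flat["Age"],
        "games": flat["G"],
        "runs": flat["R"],
        "runs_batted_in": flat["RBI"],
        "plate_appearances": {
            "total": flat["PA"],
            "at_bats": {
                "total": flat["AB"],
                "hits": {
                    "total": flat["H"],
                    "singles": flat["1B"],
                    "doubles": flat["2B"],
                    "triples": flat["3B"],
                    "home_runs": flat["HR"],
                },
                "bases_on_balls": {
                    "total": flat["BB"],
                    "intentional_bases_on_balls": flat["IBB"],
                },
                "strike_outs": flat["SO"],
                "hit_by_pitch": flat["HBP"],
            },
        },
    }


# Values copied from the plugin output earlier in the thread.
flat_record = {
    "teamIDfg": 16, "Season": 2023, "Team": "ATL", "Age": 29, "G": 1250,
    "AB": 3054, "PA": 3401, "H": 827, "1B": 496, "2B": 154, "3B": 8,
    "HR": 169, "R": 499, "RBI": 480, "BB": 294, "IBB": 10, "SO": 728,
    "HBP": 30,
}

nested = [restructure(flat_record)]
```

Keeping the mapping in one function means the acronyms can stay in the rest of the codebase while only the response seen by ChatGPT changes.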
Thank you! This seems like a good approach. I will try this.
Please update with results when you have them.
I ended up trying something I had used on another one of my functions that was getting inaccurate output from GPT. I liked your method, @elmstedt, but I would’ve had to restructure basically everything, since all my functions use the common baseball acronyms. What worked for me was an additional layer of filtering, and even sorting.
My openapi.yaml changes:
/team_batting_combined:
  get:
    operationId: getTeamBattingCombined
    summary: >
      Retrieves the combined batting statistics for all teams across the MLB from Fangraphs for the specified season.
      This function should be used whenever a prompt is asking for combined statistics.
    parameters:
      - in: query
        name: year
        required: true
        schema:
          type: integer
        description: The year for which the batting statistics should be retrieved
      - in: query
        name: team_abbreviation
        required: false
        schema:
          type: string
        description: >
          The 3 letter abbreviated name of the baseball team. This parameter should be passed
          to get the combined statistics for a specific team.
        example: NYY
      - in: query
        name: batting_stat
        required: false
        schema:
          type: string
          enum: ['H', '2B', '3B', 'HR', 'RBI', 'BB', 'IBB', 'SO', 'HBP', 'SH', 'SF', 'GDP', 'SB', 'CS', 'AVG',
                 'OBP', 'SLG', 'OPS']
        description: Can be used to filter based on a certain batting statistic.
I’ve added ‘batting_stat’ as an additional optional query parameter that acts as a filter, so the plugin returns only that specific stat when needed. Through testing on my other functions, I’ve also found that ChatGPT, as you mentioned, can get confused by the many acronyms and the amount of data thrown at it.
(I made some modifications in my code which I can send as well if interested)
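For readers who want a rough idea of what such a filter could look like on the server side, here is a minimal Python sketch. It is not the plugin's actual code: `filter_and_sort` and `ALLOWED_STATS` are hypothetical names, the allowed values are taken from the `batting_stat` enum above, and the descending sort order is an assumption.

```python
from typing import Optional

# Allowed values, copied from the batting_stat enum in openapi.yaml.
ALLOWED_STATS = {
    "H", "2B", "3B", "HR", "RBI", "BB", "IBB", "SO", "HBP",
    "SH", "SF", "GDP", "SB", "CS", "AVG", "OBP", "SLG", "OPS",
}


def filter_and_sort(rows, batting_stat: Optional[str] = None,
                    team_abbreviation: Optional[str] = None):
    """Optionally narrow rows to one team and one stat, sorted high to low."""
    if team_abbreviation is not None:
        rows = [r for r in rows if r["Team"] == team_abbreviation]
    if batting_stat is None:
        return rows  # no filter requested: return the full records
    if batting_stat not in ALLOWED_STATS:
        raise ValueError(f"unsupported batting_stat: {batting_stat}")
    # Keep only the team name and the requested stat in each record.
    slim = [{"Team": r["Team"], batting_stat: r[batting_stat]} for r in rows]
    # Assumption: highest value first; sorted() is stable, so ties keep their order.
    return sorted(slim, key=lambda r: r[batting_stat], reverse=True)


# A few team averages from this thread, deliberately out of order.
rows = [
    {"Team": "MIA", "Season": 2023, "AVG": 0.265},
    {"Team": "ATL", "Season": 2023, "AVG": 0.271},
    {"Team": "TEX", "Season": 2023, "AVG": 0.274},
]

print(filter_and_sort(rows, batting_stat="AVG"))
```

Returning only the team and the requested stat leaves the model with far fewer numbers to confuse with one another.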
Inaccurate example output before the changes (data isn’t current with 2023 season or is just made up…not sure):
After the changes (all data is accurate):
This is what the JSON data now being retrieved from the plugin looks like:
[
{
"Team": "TEX",
"AVG": 0.274
},
{
"Team": "ATL",
"AVG": 0.271
},
{
"Team": "MIA",
"AVG": 0.265
},
{
"Team": "BOS",
"AVG": 0.264
},
{
"Team": "WSN",
"AVG": 0.261
},
{
"Team": "TOR",
"AVG": 0.259
},
{
"Team": "TBR",
"AVG": 0.259
},
{
"Team": "ARI",
"AVG": 0.258
},
{
"Team": "PHI",
"AVG": 0.258
},
{
"Team": "CIN",
"AVG": 0.257
},
{
"Team": "LAA",
"AVG": 0.256
},
{
"Team": "COL",
"AVG": 0.255
}
Sorting the results also seems to help the model interpret the data correctly.