How to use `function_call` to match fund and index names in user text to the closest entries in a full list of fund and index names

I want to extract fund and index names from text entered by users, and make sure the extracted names match entries in the full list of fund and index names as closely as possible. I defined enum values for each field in the properties of the function_call, but the extracted entities still do not match the enum values. What else can I try, or did I miss something?

I would greatly appreciate any guidance or suggestions on how to approach this problem effectively. :smiling_face:

Here is the tool definition I used:



{
  "type": "function",
  "function": {
    "name": "information_extraction",
    "description": "Extracts the relevant information from the passage.\n\n                    PROPERTY DESCRIPTION:\n                    - description: the description information of the entity.\n                    - enum: The enumeration values of the entity.\n                    ",
    "parameters": {
      "type": "object",
      "properties": {
        "info": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "etf_fund_name": {
                "description": "ETF基金名称",
                "enum": [
                  "中创400交易型开放式指数证券投资基金",
                  "战略新兴成指交易型开放式指数证券投资基金",
                  "广发中证全指可选消费交易型开放式指数证券",
                  "建信创业板交易型开放式指数证券投资基金",
                  "广发创业板交易型开放式指数证券投资基金",
                  "广发添利交易型货币市场基金",
                  "华泰柏瑞沪深300交易型开放式指数证券投",
                  "华安中证500低波ETF",
                  "深证红利交易型开放式指数证券投资基金",
                  "平安沪深300交易型开放式指数证券投资基"
                ],
                "type": "string"
              },
              "index_name": {
                "description": "指数名称",
                "enum": [
                  "中小创业企业400指数",
                  "中国战略新兴产业成份指数",
                  "中证全指可选消费指数",
                  "创业板指数",
                  "创业板指数",
                  "活期存款利率(税后)",
                  "沪深300指数",
                  "上证180等权重指数",
                  "深证红利指数",
                  "沪深300指数"
                ],
                "type": "string"
              }
            },
            "required": []
          }
        }
      },
      "required": [
        "info"
      ]
    }
  }
}

Here are some examples of the sentences I’m dealing with:

华安电子50ETF was listed on January 8, 2021. The fund's short name is 华安中证电子50ETF; it tracks the 中证电子50指数 (931461.CSI, index short name: 电子50), and the fund manager is Dr. Xu Zhiyan (许之彦).

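The 电子50指数 has excellent growth prospects for the next two years and is currently reasonably valued. Since 2009 the 电子50指数 has risen a cumulative 432%, an annualized 15% over 12 years. Since 2019 the electronics sector has been recovering, and the 电子50指数 has risen 111.5%, an annualized 45%. The coming years remain a period of rapid growth for the electronics industry: relative to 2018, the constituents' expected 2022 net profit will have grown a cumulative 207%. Net profit is expected to compound at 31.6% over the next two years, which is excellent growth; the P/E based on expected 2020 net profit is 50x, and based on expected 2022 net profit it is 29x.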

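Constituent pricing shows that the 电子50指数's closing level on January 7 was still below next year's fundamental value. Based on constituent pricing, the expected fundamental values of the 电子50指数 for 2021 and 2022 correspond to 4748 and 6081 points. On January 7, 2021 it closed at 5326 points, pricing in only this year's fundamentals and part of next year's expectations. In 2020 the 电子50指数 fluctuated within the range between its 2020 and 2022 fundamental values, without excessively pricing in future fundamentals.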

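Allocating to 华安电子50ETF offers considerable returns with relative safety. In terms of expected return, even after fully pricing in next year's fundamentals there is still a 16% return; extrapolating linearly, fundamental value grows 28% in 2023, and fully pricing in the fundamentals of the year after next yields a 48% return. The allocation return is fairly attractive, and because earnings are priced in reasonably, the allocation is relatively safe.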
I expected the result to be:

{"etf_fund_name":"华安中证500低波ETF","index_name":"中证电子50指数"}

but the actual result is:

    {
      "etf_fund_name": "华安中证电子50ETF",
      "index_name": "中证电子50指数"
    }
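
Whatever the cause, it may help to check the returned arguments against the enum in your own code instead of trusting the model to stay inside it. A minimal Python sketch, assuming you have the `arguments` JSON string from the tool call and that it uses the `info` array from the schema above (the function name `out_of_enum` is just for illustration):

```python
import json

# Allowed values copied from the tool definition posted above.
FUND_NAMES = [
    "中创400交易型开放式指数证券投资基金",
    "战略新兴成指交易型开放式指数证券投资基金",
    "广发中证全指可选消费交易型开放式指数证券",
    "建信创业板交易型开放式指数证券投资基金",
    "广发创业板交易型开放式指数证券投资基金",
    "广发添利交易型货币市场基金",
    "华泰柏瑞沪深300交易型开放式指数证券投",
    "华安中证500低波ETF",
    "深证红利交易型开放式指数证券投资基金",
    "平安沪深300交易型开放式指数证券投资基",
]
INDEX_NAMES = [
    "中小创业企业400指数",
    "中国战略新兴产业成份指数",
    "中证全指可选消费指数",
    "创业板指数",
    "创业板指数",
    "活期存款利率(税后)",
    "沪深300指数",
    "上证180等权重指数",
    "深证红利指数",
    "沪深300指数",
]
ALLOWED = {"etf_fund_name": set(FUND_NAMES), "index_name": set(INDEX_NAMES)}


def out_of_enum(arguments_json: str) -> list[tuple[str, str]]:
    """Return every (field, value) pair in the tool-call arguments that is not an allowed enum value."""
    args = json.loads(arguments_json)
    problems = []
    for item in args.get("info", []):
        for field, value in item.items():
            if field in ALLOWED and value not in ALLOWED[field]:
                problems.append((field, value))
    return problems


# ASSUMPTION: the result shown above, wrapped in the "info" array the schema defines.
returned = '{"info": [{"etf_fund_name": "华安中证电子50ETF", "index_name": "中证电子50指数"}]}'
print(out_of_enum(returned))
```

Run on that result, it flags both fields: 华安中证电子50ETF is not in the fund list, and 中证电子50指数 (which also appears in the expected result) is not in the index list either.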

I think limiting the inputs to an enum is the correct approach. The problem is that GPT-4 is not respecting the limitation :thinking: Maybe translate to English first, since the model has more experience with English?

Or separate the calls: one call for the index, then a separate call for the fund. After all, not every fund tracks every index, right?
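
A rough Python sketch of the two-call idea. The tool names `select_index` and `select_fund` and the helpers are hypothetical, and it assumes the two lists in the original schema are parallel (the fund at position i tracks the index at position i), which is a guess about how the lists were built. The second tool only offers the funds that pair with whichever index the first call returned; the actual API requests are left out.

```python
import json

# The two lists from the question. ASSUMPTION: they are parallel, i.e. the fund
# at position i tracks the index at position i (so duplicate indices are expected).
FUND_NAMES = [
    "中创400交易型开放式指数证券投资基金",
    "战略新兴成指交易型开放式指数证券投资基金",
    "广发中证全指可选消费交易型开放式指数证券",
    "建信创业板交易型开放式指数证券投资基金",
    "广发创业板交易型开放式指数证券投资基金",
    "广发添利交易型货币市场基金",
    "华泰柏瑞沪深300交易型开放式指数证券投",
    "华安中证500低波ETF",
    "深证红利交易型开放式指数证券投资基金",
    "平安沪深300交易型开放式指数证券投资基",
]
INDEX_NAMES = [
    "中小创业企业400指数",
    "中国战略新兴产业成份指数",
    "中证全指可选消费指数",
    "创业板指数",
    "创业板指数",
    "活期存款利率(税后)",
    "沪深300指数",
    "上证180等权重指数",
    "深证红利指数",
    "沪深300指数",
]


def single_field_tool(name: str, field: str, description: str, allowed: list[str]) -> dict:
    """Build a tool definition whose only argument is one required string enum."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    # dict.fromkeys drops duplicates while keeping the original order
                    field: {"type": "string", "enum": list(dict.fromkeys(allowed))},
                },
                "required": [field],
            },
        },
    }


def funds_for_index(index_name: str) -> list[str]:
    """Funds paired with index_name under the parallel-list assumption."""
    return [fund for fund, index in zip(FUND_NAMES, INDEX_NAMES) if index == index_name]


# Call 1 (hypothetical tool name): choose only the index.
index_tool = single_field_tool(
    "select_index", "index_name",
    "Choose the index from the list that the passage is about.", INDEX_NAMES)

# Call 2: after call 1 returns an index, offer only the funds that track it.
chosen_index = "创业板指数"  # placeholder standing in for the first call's answer
fund_tool = single_field_tool(
    "select_fund", "etf_fund_name",
    "Choose the ETF fund from the list that the passage is about.",
    funds_for_index(chosen_index))

print(json.dumps(fund_tool, ensure_ascii=False, indent=2))
```

Narrowing the second enum this way means the fund call can only pick from funds that are consistent with the index already chosen.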

Same here.

But I just recorded this video on how to extract product names from text using simple prompt engineering: https://www.youtube.com/watch?v=VdpneLeBkaE

Thank you very much. As you can see, the correct answer for etf_fund_name is '华安中证500低波ETF', but your answer is '华安中证电子50ETF'.
I will recommend your platform to some friends.
Have a nice day! :blush:

Thank you very much, but my problem is a little different from yours: I want to find the most similar fund name in the list of fund names.
Have a nice day! :wink:

Good suggestions! I tried them out but got the same result.
Have a nice day! :wink:

Maybe the function description is fighting with the parameter name. The description leads the AI to think it should extract an entity from the prompt, which it does, and the function name information_extraction reinforces this. Maybe the heavy emphasis on extraction is overriding the fact that the fields are enums. So rename the function and change the description to something more like classification. Also, check whether the parameters are marked as required.
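
A sketch of what a classification-style version of the tool could look like, written as a Python dict for the `tools` list. The name `classify_fund_and_index` and the description wording are hypothetical, the `info` array is flattened to a single fund/index pair (an assumption), and both fields are marked required. None of this is guaranteed to keep the model inside the enum; it only removes the "extract from the passage" framing.

```python
import json

FUND_NAMES = [
    "中创400交易型开放式指数证券投资基金",
    "战略新兴成指交易型开放式指数证券投资基金",
    "广发中证全指可选消费交易型开放式指数证券",
    "建信创业板交易型开放式指数证券投资基金",
    "广发创业板交易型开放式指数证券投资基金",
    "广发添利交易型货币市场基金",
    "华泰柏瑞沪深300交易型开放式指数证券投",
    "华安中证500低波ETF",
    "深证红利交易型开放式指数证券投资基金",
    "平安沪深300交易型开放式指数证券投资基",
]
INDEX_NAMES = [
    "中小创业企业400指数",
    "中国战略新兴产业成份指数",
    "中证全指可选消费指数",
    "创业板指数",
    "创业板指数",
    "活期存款利率(税后)",
    "沪深300指数",
    "上证180等权重指数",
    "深证红利指数",
    "沪深300指数",
]

# Hypothetical classification-style rewrite of the original tool: one fund/index
# pair instead of an "info" array, both fields required, and wording that asks
# the model to choose from a closed list rather than extract text.
classification_tool = {
    "type": "function",
    "function": {
        "name": "classify_fund_and_index",
        "description": (
            "Classify the passage: choose the ETF fund and the index from the allowed "
            "values that best match what the passage is about. Always answer with "
            "listed values, even if the passage uses a different or shortened name."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "etf_fund_name": {
                    "type": "string",
                    "description": "ETF基金名称",
                    "enum": FUND_NAMES,
                },
                "index_name": {
                    "type": "string",
                    "description": "指数名称",
                    "enum": list(dict.fromkeys(INDEX_NAMES)),  # duplicates removed
                },
            },
            "required": ["etf_fund_name", "index_name"],
        },
    },
}

print(json.dumps(classification_tool, ensure_ascii=False, indent=2))
```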

Sorry, I missed that :sweat_smile:
However, as mentioned, I only provided an example of what the prompt should look like: a general description of the task to be done, along with the expected JSON structure.

I thought your case was a simple extraction case, but now I understand it isn't:
The incorrect output '华安中证电子50ETF' appears in the input, while the expected one '华安中证500低波ETF' does not. So what exactly needs to be done here? Why do you expect '华安中证500低波ETF' in the output?
You can change the instruction in my example prompt to the case you care about (possibly in Chinese) and see what works best.
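
For the prompt-only route, a Python sketch of a prompt built from a general task description, the allowed names, and the expected JSON structure. The wording and the `build_prompt` helper are hypothetical and not the prompt from the video; the lists come from the original tool definition, with duplicate index names removed.

```python
import json

FUND_NAMES = [
    "中创400交易型开放式指数证券投资基金",
    "战略新兴成指交易型开放式指数证券投资基金",
    "广发中证全指可选消费交易型开放式指数证券",
    "建信创业板交易型开放式指数证券投资基金",
    "广发创业板交易型开放式指数证券投资基金",
    "广发添利交易型货币市场基金",
    "华泰柏瑞沪深300交易型开放式指数证券投",
    "华安中证500低波ETF",
    "深证红利交易型开放式指数证券投资基金",
    "平安沪深300交易型开放式指数证券投资基",
]
INDEX_NAMES = [
    "中小创业企业400指数",
    "中国战略新兴产业成份指数",
    "中证全指可选消费指数",
    "创业板指数",
    "创业板指数",
    "活期存款利率(税后)",
    "沪深300指数",
    "上证180等权重指数",
    "深证红利指数",
    "沪深300指数",
]


def build_prompt(passage: str) -> str:
    """Assemble a plain-text prompt: task, allowed values, expected JSON shape, passage."""
    expected = {
        "etf_fund_name": "<one value from FUND LIST>",
        "index_name": "<one value from INDEX LIST>",
    }
    lines = [
        "Task: read the passage and pick the fund and the index it refers to.",
        "Answer only with names copied exactly from the lists below.",
        "",
        "FUND LIST:",
        *[f"- {name}" for name in FUND_NAMES],
        "",
        "INDEX LIST:",
        *[f"- {name}" for name in dict.fromkeys(INDEX_NAMES)],  # deduplicated
        "",
        "Return JSON in exactly this shape:",
        json.dumps(expected, ensure_ascii=False),
        "",
        "Passage:",
        passage,
    ]
    return "\n".join(lines)


print(build_prompt("华安电子50ETF于2021年1月8日上市。"))
```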

I made a GPT for it, with the prompt:
华安电子50ETF was listed on January 8, 2021. The fund's short name is 华安中证电子50ETF; it tracks the 中证电子50指数 (931461.CSI, index short name: 电子50), and the fund manager is Dr. Xu Zhiyan (许之彦).

I got a value of:

"华安中证500低波ETF"

It's correct, right? Maybe you were using the API with GPT-3 rather than GPT-4?