First of all, I would like to thank the OpenAI team. I have been waiting for GPT-4, and it has finally given me the opportunity to learn. I have a few questions and observations to share and discuss.
In my trials and tests over the past two months, GPT-4's ability has been very good, but in practice some records and variable information are still lost between earlier and later messages in the conversation context.
Context and Quality Assurance
How can duplicate data be removed while still preserving the user's context? This may be the most direct optimization, because tokens are very limited: whether content is submitted by the AI server or by the user, resources should not be wasted on redundant computation. De-duplication technology in the industry is already quite mature. Perhaps repeated topics could be de-duplicated, or tagged and transcoded into short references. The token savings from this would compound significantly.
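As a minimal sketch of the idea, repeated messages in a conversation history could be collapsed before they are sent to the API. The message format and the normalization rule (whitespace and case folding) below are my own illustrative assumptions, not an existing OpenAI feature:

```python
import hashlib

def dedupe_messages(messages):
    """Drop messages whose normalized content was already seen,
    keeping the first occurrence so conversation order is preserved."""
    seen = set()
    result = []
    for msg in messages:
        # Normalize whitespace and case so trivial variants collapse together.
        normalized = " ".join(msg["content"].split()).lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            result.append(msg)
    return result

history = [
    {"role": "user", "content": "Please summarize chapter 3."},
    {"role": "assistant", "content": "Chapter 3 covers tokenization."},
    {"role": "user", "content": "please summarize   chapter 3."},  # duplicate
]
print(len(dedupe_messages(history)))  # 2 messages remain
```

A real implementation would probably need near-duplicate detection rather than exact hashing, but even this cheap pass saves tokens on repeated topics.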
Gist identification and quantity
Gist recognition and generation often fail. In the worst case, you have to repeat the content to the AI several times before it is identified correctly.
I suggest that when the AI receives a gist link, it should respond with basic line-count statistics, so that problems can be detected in time.
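To illustrate what such an echo-back could look like, here is a small sketch of a report the assistant might return after ingesting a pasted file or a fetched gist. The report fields are my own suggestion, not anything the API provides today:

```python
def line_count_report(text):
    """Summarize ingested text so the user can spot truncation
    or partial reads at a glance."""
    lines = text.splitlines()
    non_empty = [line for line in lines if line.strip()]
    return {
        "total_lines": len(lines),
        "non_empty_lines": len(non_empty),
        "chars": len(text),
    }

report = line_count_report("a\n\nb\n")
print(report)  # {'total_lines': 3, 'non_empty_lines': 2, 'chars': 5}
```

If the model always echoed these numbers back, a user would immediately see when only part of a gist was actually read.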
When the AI replies with engineering code, it introduces too many function names and creates naming conflicts. It does learn the documentation of the relevant software package, but it sometimes renames its own or the user's original functions. This is genuinely maddening, because if you do not check the output carefully, you end up fighting the AI's silent changes.
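One cheap defense is to diff the function names defined in the original code against those in the generated code before accepting a change. This sketch uses Python's standard `ast` module; the helper names are hypothetical:

```python
import ast

def defined_functions(source):
    """Collect the names of all functions defined in Python source."""
    return {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

def renamed_functions(original, generated):
    """Names defined in the original but missing from the generated
    version -- a cheap signal that the model silently renamed something."""
    return defined_functions(original) - defined_functions(generated)

original = "def load_data():\n    pass\n"
generated = "def loadData():\n    pass\n"  # model renamed the function
print(renamed_functions(original, generated))  # {'load_data'}
```

Running a check like this before merging AI-suggested code would catch exactly the kind of unrequested renames described above.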
Training and fine-tuning
Preparing training data via the API could perhaps be semi-automated or fully automated. Manually comparing and confirming the expected data against the returned data takes far too much time. (I may simply have missed the relevant documentation.)
As for the underlying logic, I still do not understand how the model decides what to display in its response. Is it possible to specify weight or ordering parameters in the options, or a probability-ratio setting? That might be more effective for spam and hyperlink review, and reduce misidentification decisions.
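To make the "probability ratio setting" idea concrete, here is a minimal sketch of the kind of threshold knobs I mean for a spam-review decision. The function, the thresholds, and the action names are entirely my own illustration, not an existing API parameter:

```python
def review_decision(spam_probability, block_threshold=0.9, flag_threshold=0.5):
    """Map a model-reported spam probability to a moderation action.
    The thresholds are illustrative, user-tunable knobs."""
    if spam_probability >= block_threshold:
        return "block"
    if spam_probability >= flag_threshold:
        return "flag_for_review"
    return "allow"

print(review_decision(0.95))  # block
print(review_decision(0.60))  # flag_for_review
print(review_decision(0.10))  # allow
```

Exposing thresholds like these would let each application trade false positives against false negatives instead of relying on a single fixed decision.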
If any of this information is wrong, I hope to receive your guidance. Thank you.