Assume we are building an application for permission management in a company.
We want to provide a chat bot that allows our end users to collect information and perform tasks.
A conversation with an administrator could look like this:
Admin: Hi, in how many groups is John Doe an owner?
Bot: Hi, which ‘John Doe’ do you mean: John M. Doe (ID: 2343) or John Doe Jr. (ID: 6543)?
Admin: I mean John M. Doe
Bot: John M. Doe (ID: 2343) is owner in 10 groups
Admin: Show me these groups
Bot: [Shows a list of groups]
Admin: Thx, now remove John from group group_123 please
Bot: So I should remove John M. Doe (ID: 2343) from group_123 (ID: 8765)?
Admin: Yes please
Bot: Done, removed
So we want to understand the user's intention, provide the information from our backend, and finally execute the task.
My questions: Is such a conversation even possible, and is it a valid use case for AI?
And if so: what might the training data for fine-tuning look like?
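To make the disambiguation step in the transcript concrete, here is a minimal sketch of the backend lookup the bot would need behind the scenes. All names, IDs, and fields are invented for illustration; a real system would query its user directory instead.

```python
# Hypothetical user directory; the records are invented for illustration.
USERS = [
    {"id": 2343, "name": "John M. Doe"},
    {"id": 6543, "name": "John Doe Jr."},
    {"id": 1111, "name": "Jane Roe"},
]

def find_users(query: str) -> list[dict]:
    """Return all users whose name contains every word of the query."""
    words = query.lower().split()
    return [u for u in USERS if all(w in u["name"].lower() for w in words)]

matches = find_users("John Doe")
if len(matches) > 1:
    # Ambiguous: the bot should ask which user is meant, as in the transcript.
    options = " or ".join(f'{u["name"]} (ID: {u["id"]})' for u in matches)
    print(f"Which 'John Doe' do you mean: {options}?")
```

The model's job is then only to decide *when* to call such a lookup and how to phrase the follow-up question; the actual data stays in the backend.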
Considering the use of OpenAI models:

1. Embeddings: it is possible using embeddings.
2. System role: it is possible with a system role containing precise and concise instructions; context maintenance is the main goal in such a case. The instructions may include confirmation procedures for changes, but… (see below).
3. Fine-tuning: in this way it is not possible. The dataset is static and the models don't perform "external tasks". Instead, the user may ask for a script, in the format of a third-party permission-management system, containing all the changes to be performed. After the changes have been applied, the updated dataset has to be uploaded once more.
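The system-role approach can be sketched as follows. The prompt wording and the model name are assumptions, not a tested recipe; the point is that the instructions (including the confirmation procedure) ride along with every request, while the conversation history provides the context.

```python
# A sketch of a system role with precise instructions, including a
# confirmation procedure for destructive changes. Prompt wording is invented.
SYSTEM_PROMPT = (
    "You are a permission-management assistant. "
    "Resolve ambiguous user names by asking which user (with ID) is meant. "
    "Before any change (e.g. removing a user from a group), repeat the exact "
    "user ID and group ID back and wait for an explicit 'yes' from the admin."
)

def build_messages(history: list[dict]) -> list[dict]:
    """Prepend the system instructions to the running conversation history,
    so context (which John, which group) is maintained across turns."""
    return [{"role": "system", "content": SYSTEM_PROMPT}] + history

history = [
    {"role": "user", "content": "Remove John from group_123 please"},
]
messages = build_messages(history)
# These messages would then be sent to the chat completions endpoint, e.g.:
# client.chat.completions.create(model="gpt-4", messages=messages)
```

Note that the confirmation step only changes what the model *says*; actually executing the removal still has to happen in your backend.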
Admin: Please provide the script in PerMgr_XYZ format with all changes.
Bot: Sure, here's your script:
REMOVE User: John M. Doe, UID: 2343, FROM Group: group_123, GID: 8765;
For such an example, the dataset is something "linear", and it will not require extensive training. If the "script" is an XML-like format, it will require low to medium training time; however, if it is a third-party proprietary language, it will require a considerable training time. None of these cases would require a great amount of data, since the dataset is static: the user may train the model as much as necessary to get correct responses from it. But these conditions hold only for the specific use case of this example.
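To answer the training-data part of the question, here is a sketch of what fine-tuning examples for the "script" task might look like, written in the prompt/completion JSONL format. The first pair reuses the PerMgr_XYZ line from the example above; the second pair (ADD, Jane Roe, group_456) is invented to show the pattern, and its syntax is an assumption.

```python
import json

# Invented training pairs mapping an admin instruction to a script line.
examples = [
    {
        "prompt": "Remove John M. Doe (ID: 2343) from group_123 (ID: 8765) ->",
        "completion": " REMOVE User: John M. Doe, UID: 2343, "
                      "FROM Group: group_123, GID: 8765;\n",
    },
    {
        "prompt": "Add Jane Roe (ID: 1111) to group_456 (ID: 4321) ->",
        "completion": " ADD User: Jane Roe, UID: 1111, "
                      "TO Group: group_456, GID: 4321;\n",
    },
]

# Write one JSON object per line, as the fine-tuning endpoint expects.
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

You would need many such pairs, covering every operation of the script language and varied phrasings of the admin's instructions.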
If the use case requires a more complex dataset, with multiple relationships and not-so-simple tasks demanding more analysis from the model, the training time may increase exponentially according to some complexity measurement of your choice. But this is just my opinion.