I have connected OpenAI APIs to PostgreSQL database using Python-Langchain.
I am able to generate DB query based on question asked by application user.
Now I am trying to figure out how I can apply Role based access control check while queries database. I have role permissions setup already in place but unable to figure out how to apply these restrictions.
Welcome to the OpenAI dev community @zilurrane
If I understand correctly, this would limit the parts/functionality of the DB the model should be able to write queries for based on userâs access.
One way to implement this would be to describe the DB (tables and functionalities) based on which model forms the SQL query, based on the userâs permissions.
e.g If you user has admin access, you describe the full DB in the system message, otherwise describe only the part of the DB the user should be able to access.
You can write a dictionary with the role as key and part/functionality user has access to as the value. This dictionary can then be used to form the system message passed to the model while describing the DB.
Yes, this is how I do it. I have a MySQL database using PHP with Text to SQL prompt. I describe the tables in the System Message along with what admin vs. non-admin user can access. I also append these restrictions to the user input prompt.
I also run a 2nd verification check against the returned SQL to make sure it does not violate the rules before executing.
Just as a follow-up, this is what I have in my System Message:
Given the following SQL tables, your job is to write an SQL query given the userâs request. You may not write queries that contain CREATE, INSERT, UPDATE, DROP, DELETE, ALTER, RENAME, TRUNCATE or any command that alters the database structure or data in any way. Only return one SQL statement. No comments. There are only two tables: solrai_log uses the logUser field and solrai_user uses the username field. Only these two tables ae to be used. The term âqueryâ is a type used in the solrai_log table logType field. So, the term âqueriesâ always refers to solrai_log logType = âqueryâ. Use the field titles for headers instead of the field names, but use the field names for the commands.
Then I include the table and field descriptions and end with:
Only return an SQL statement. No comments. Generate response ONLY in SQL. Return no text other than the executable SQL. Only include the SQL statement, do not include ending â;â. Please do not include escape characters like ''. Underscore characters do not need to be escaped. Administrator users have access to all content. Non-administrator users can only access records that contain their username. Non-administrator user is allowed to access all content so long as it is restricted to records with his username. Non-administrator SQL statements may contain any conditions, such as
ORDER BY, =, !=, <>, >, <, >=, <=, BETWEEN, LIKE, IN, NOT, IS NULL, IS NOT NULL, EXISTS, AND, OR, ANY, ALL
, so long as results are restricted to records with his username. Please note that altering retrieved data is allowed.
Then, I append the user question with this bit of fancy footwork:
$append = ''; // Determine additional user message based on user roles if ($is_admin) { // For an admin user, set the system message directly $append = ' This is an administrator. He has access to all data.'; } else { // For a non-admin user, append the user restriction message and username $append = ' This user is a not an administrator. Any SQL created can only access table records with username = ' . $username; } $question = $question . $append;
This is working surprisingly well. The only problem Iâve run into so far is with non-admin users. The AI (gpt-3.5-turbo-0125) refuses to execute with these reasons:
- Error: Invalid command. This SQL statement contains a command that alters the database structure or data.
- Error: Invalid command. This SQL statement contains a command that alters the internal database structure by using the ORDER BY clause.
- Error: Invalid command. This SQL statement contains a command that alters the data retrieved from the database.
- Error: Invalid command. This SQL statement violates the rule that only allows the retrieval of records associated with the username âreggieâ. The query selects data from the âsolrai_logâ table without filtering by the username âreggieâ.
- Error: Invalid command. This SQL statement violates the rule that only allows SELECT statements to be returned for non-admin users. The statement contains the ORDER BY clause, which retrieves data in a specific order, and this is not allowed.
So, itâs a little tricky finding that fine line between what I donât want it to do and what I do want it to do.
Oh yes, if the above wasnât enough, I also discovered that different models come with their own little quirks you have to also look out for:
gpt-3.5-turbo-0125 kept insisting that using LIKE was in violation of the rules. Had to keep re-writing the user message until it was relaxed enough to accept LIKE statements.
Also, mistral-medium was putting an escape character before underscores, â_â. Had to add code to take those out.
Also, gpt-4.5-turbo-preview kept putting in â;â at the end of the statements. Had to add that exclusion to prompt.
What you can do is: create separate agents for the LLM, one per role. And each agent runs SQL queries against the database service as a separate user which only has role-specific permissions.
Then your LLM prompt would guide the LLM model on which agent to invoke, based on logged-in userâs credentials. You can even add validations in your individual agents - they will fail the request if the user id doesnât have IAM authorization scopes for that agentâs roles.