How to apply user based access control check while generating SQL queries using OpenAI API using langchain?

zilurrane · February 10, 2024, 4:43pm

I have connected OpenAI APIs to PostgreSQL database using Python-Langchain.
I am able to generate DB query based on question asked by application user.
Now I am trying to figure out how I can apply Role based access control check while queries database. I have role permissions setup already in place but unable to figure out how to apply these restrictions.

sps · February 10, 2024, 5:33pm

Welcome to the OpenAI dev community @zilurrane

If I understand correctly, this would limit the parts/functionality of the DB the model should be able to write queries for based on user’s access.

One way to implement this would be to describe the DB (tables and functionalities) based on which model forms the SQL query, based on the user’s permissions.

e.g If you user has admin access, you describe the full DB in the system message, otherwise describe only the part of the DB the user should be able to access.

You can write a dictionary with the role as key and part/functionality user has access to as the value. This dictionary can then be used to form the system message passed to the model while describing the DB.

SomebodySysop · February 10, 2024, 6:28pm

Yes, this is how I do it. I have a MySQL database using PHP with Text to SQL prompt. I describe the tables in the System Message along with what admin vs. non-admin user can access. I also append these restrictions to the user input prompt.

I also run a 2nd verification check against the returned SQL to make sure it does not violate the rules before executing.

SomebodySysop · February 11, 2024, 12:31am

Just as a follow-up, this is what I have in my System Message:

Given the following SQL tables, your job is to write an SQL query given the user’s request. You may not write queries that contain CREATE, INSERT, UPDATE, DROP, DELETE, ALTER, RENAME, TRUNCATE or any command that alters the database structure or data in any way. Only return one SQL statement. No comments. There are only two tables: solrai_log uses the logUser field and solrai_user uses the username field. Only these two tables ae to be used. The term ‘query’ is a type used in the solrai_log table logType field. So, the term ‘queries’ always refers to solrai_log logType = ‘query’. Use the field titles for headers instead of the field names, but use the field names for the commands.

Then I include the table and field descriptions and end with:

Only return an SQL statement. No comments. Generate response ONLY in SQL. Return no text other than the executable SQL. Only include the SQL statement, do not include ending “;”. Please do not include escape characters like ''. Underscore characters do not need to be escaped. Administrator users have access to all content. Non-administrator users can only access records that contain their username. Non-administrator user is allowed to access all content so long as it is restricted to records with his username. Non-administrator SQL statements may contain any conditions, such as ORDER BY, =, !=, <>, >, <, >=, <=, BETWEEN, LIKE, IN, NOT, IS NULL, IS NOT NULL, EXISTS, AND, OR, ANY, ALL , so long as results are restricted to records with his username. Please note that altering retrieved data is allowed.

Then, I append the user question with this bit of fancy footwork:

  $append = '';
  // Determine additional user message based on user roles
  if ($is_admin) {
  	// For an admin user, set the system message directly
  	$append = ' This is an administrator.  He has access to all data.';
  } else {
  	// For a non-admin user, append the user restriction message and username
  	$append = ' This user is a not an administrator.  Any SQL created can only access table records with username = ' . $username;
  }
  $question = $question . $append;

This is working surprisingly well. The only problem I’ve run into so far is with non-admin users. The AI (gpt-3.5-turbo-0125) refuses to execute with these reasons:

Error: Invalid command. This SQL statement contains a command that alters the database structure or data.
Error: Invalid command. This SQL statement contains a command that alters the internal database structure by using the ORDER BY clause.
Error: Invalid command. This SQL statement contains a command that alters the data retrieved from the database.
Error: Invalid command. This SQL statement violates the rule that only allows the retrieval of records associated with the username ‘reggie’. The query selects data from the ‘solrai_log’ table without filtering by the username ‘reggie’.
Error: Invalid command. This SQL statement violates the rule that only allows SELECT statements to be returned for non-admin users. The statement contains the ORDER BY clause, which retrieves data in a specific order, and this is not allowed.

So, it’s a little tricky finding that fine line between what I don’t want it to do and what I do want it to do.

Oh yes, if the above wasn’t enough, I also discovered that different models come with their own little quirks you have to also look out for:

gpt-3.5-turbo-0125 kept insisting that using LIKE was in violation of the rules. Had to keep re-writing the user message until it was relaxed enough to accept LIKE statements.

Also, mistral-medium was putting an escape character before underscores, ‘_’. Had to add code to take those out.

Also, gpt-4.5-turbo-preview kept putting in ‘;’ at the end of the statements. Had to add that exclusion to prompt.

abhishek5 · June 14, 2024, 12:31am

What you can do is: create separate agents for the LLM, one per role. And each agent runs SQL queries against the database service as a separate user which only has role-specific permissions.

Then your LLM prompt would guide the LLM model on which agent to invoke, based on logged-in user’s credentials. You can even add validations in your individual agents - they will fail the request if the user id doesn’t have IAM authorization scopes for that agent’s roles.

Topic		Replies	Views
How to handle role-based access while generating text-to-SQL queries across multiple databases? Community embeddings , gpt-4 , chatgpt	0	50	January 22, 2025
Role based access in LLM's API gpt-4 , chatgpt	4	2588	June 14, 2024
How to make Multi User Role Based Access Chatbot which generate different answers for each user depending on their knowledge base? API chatgpt , api	7	1219	July 16, 2024
Natural Language to SQL with huge table schema API	12	7760	December 19, 2023
Seeking Guidance on Building a ChatGPT-Style Data Analyst Tool with Database Integration Plugins / Actions builders gpt-4 , chatgpt , api , openai	11	3801	September 23, 2024

How to apply user based access control check while generating SQL queries using OpenAI API using langchain?

Related topics