What I’m trying to understand:
Users on my WordPress project can publish their own posts. I want the chatbot I’m building to have access to their content, but including all of their posts in the initial prompt quickly hits the token limit.
How can I approach something like this in general?
I’ve read about embeddings, custom assistants (with Retrieval of post exports as files?), and fine-tuned models. But I’m struggling to understand what the most sensible approach would be in this case.
Ultimately I want the chat to be able to respond to prompts like:
How many posts do I have?
When did I write my longest post?
List all the titles of my posts in November.
No editing capabilities, just queries on the posts.
Any help is greatly appreciated.
Happy to figure things out on my own, but feel like I need a pointer in the right direction here.
The effective but potentially risky way to implement this is to give the AI functions that run SQL queries against the database behind WordPress. The functions should be strictly limited in scope to what you allow, so the AI can’t ask “what’s the administrator’s login” or “get all system PMs from user X”.
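A minimal sketch of that "limited scope" idea, written in Python for brevity (WordPress itself is PHP). Every name here is hypothetical: the model may only pick a function name from a fixed allowlist, and the user ID comes from the logged-in session rather than from anything the model writes.

```python
from typing import Callable, Dict

# Hypothetical read-only handlers. Real ones would run fixed,
# parameterized queries against wp_posts instead of returning text.
def count_posts(user_id: int) -> str:
    return f"count of published posts for user {user_id}"

def longest_post_date(user_id: int) -> str:
    return f"date of longest post for user {user_id}"

# The only operations the model is ever allowed to trigger.
ALLOWED_FUNCTIONS: Dict[str, Callable[[int], str]] = {
    "count_posts": count_posts,
    "longest_post_date": longest_post_date,
}

def dispatch(function_name: str, session_user_id: int) -> str:
    """Run a model-requested function only if it is on the allowlist."""
    handler = ALLOWED_FUNCTIONS.get(function_name)
    if handler is None:
        raise PermissionError(f"function not allowed: {function_name}")
    # The user ID is taken from the session, never from model output.
    return handler(session_user_id)
```

With this shape, a request for something like "get_admin_login" fails before any database code runs, because no such handler exists.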
You can also ask an AI how to do so. I used to look smarter because I was the only one who would read a user manual, but now I look smarter because I know how to write some text into the input box of an AI:
Sure, I can provide you with some SQL queries that would obtain such data from the WordPress database. Please note that these queries assume that the standard WordPress database structure is being used.
How many posts do I have?
SELECT COUNT(*)
FROM wp_posts
WHERE post_author = {user_id}
AND post_status = 'publish'
AND post_type = 'post';
When did I write my longest post?
SELECT post_date
FROM wp_posts
WHERE post_author = {user_id}
AND post_status = 'publish'
AND post_type = 'post'
ORDER BY LENGTH(post_content) DESC
LIMIT 1;
List all the titles of my posts in November.
SELECT post_title
FROM wp_posts
WHERE post_author = {user_id}
AND post_status = 'publish'
AND post_type = 'post'
AND MONTH(post_date) = 11;
In these queries, {user_id} should be replaced with the ID of the user for whom you want to fetch the data.
Please note that these queries should be executed in a secure manner to prevent SQL injection attacks. For example, in PHP, you could use prepared statements with the PDO or MySQLi extensions.
Also, remember to replace wp_ with your table prefix if you’ve changed it from the default during WordPress installation.
For an AI language model to interact with these queries, you would need to create a function that accepts a user ID as a parameter and inserts it into these queries in a secure manner. The function would then execute the queries and return the results.
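As a rough illustration of such a function, here is a sketch in Python. It uses the standard-library sqlite3 module with an in-memory table as a stand-in for the real MySQL database, so the table contents and the function names (make_demo_db, count_published_posts, longest_post_date) are assumptions for the demo only. The point is that user_id is passed as a bound parameter and never pasted into the SQL string.

```python
import sqlite3
from typing import Optional

def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for wp_posts (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_author INTEGER, "
        "post_date TEXT, post_title TEXT, post_content TEXT, "
        "post_status TEXT, post_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO wp_posts (post_author, post_date, post_title, "
        "post_content, post_status, post_type) VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "2023-11-02 10:00:00", "Hello", "short", "publish", "post"),
            (1, "2023-12-05 09:30:00", "Long one", "a much longer body of text", "publish", "post"),
            (1, "2023-11-20 08:00:00", "Draft", "unfinished", "draft", "post"),
            (2, "2023-11-03 12:00:00", "Other user", "text", "publish", "post"),
        ],
    )
    return conn

def count_published_posts(conn: sqlite3.Connection, user_id: int) -> int:
    """Count a user's published posts; user_id is bound, not interpolated."""
    row = conn.execute(
        "SELECT COUNT(*) FROM wp_posts "
        "WHERE post_author = ? AND post_status = 'publish' AND post_type = 'post'",
        (user_id,),
    ).fetchone()
    return row[0]

def longest_post_date(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Return the date of the user's longest published post, or None."""
    row = conn.execute(
        "SELECT post_date FROM wp_posts "
        "WHERE post_author = ? AND post_status = 'publish' AND post_type = 'post' "
        "ORDER BY LENGTH(post_content) DESC LIMIT 1",
        (user_id,),
    ).fetchone()
    return row[0] if row else None
```

Inside WordPress itself, the analogous tool would be $wpdb->prepare(), which fills placeholders in a similar way.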
@_j you don’t look smarter than anyone? This one is funny.
@ndannemann just describe your tables in the context, then ask the model to convert the question into an SQL query. Make sure you don’t execute any DELETE or other keywords you don’t want.
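One way to check generated SQL before running it, sketched in Python. The function name is_safe_select and the keyword list are assumptions, not part of any library. The check is deliberately conservative: it also rejects a harmless SELECT whose string literal happens to contain a forbidden word. It does not limit which tables can be read, and a keyword filter alone is not full protection, so a database account with read-only privileges is the stronger safeguard.

```python
import re

# Hypothetical deny-list of statement keywords that modify data or schema.
FORBIDDEN = {
    "insert", "update", "delete", "drop", "alter",
    "create", "truncate", "replace", "grant",
}

def is_safe_select(sql: str) -> bool:
    """Accept only a single SELECT statement with no forbidden keywords."""
    # Allow one optional trailing semicolon, nothing after it.
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # more than one statement
    # Split into lowercase word tokens; identifiers like wp_posts stay whole.
    tokens = re.findall(r"[a-z_]+", statement.lower())
    if not tokens or tokens[0] != "select":
        return False
    return not any(token in FORBIDDEN for token in tokens)
```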
@_j @pinardalec
Just to double-check: I would allow the AI to perform database operations? Whew, that does feel a bit scary.
I’ll read into it, thank you.
But to explore alternatives: suppose I don’t want the AI to interact with the DB directly but (for example) with a JSON export of all posts. For a start, it would be okay if I just did weekly exports and uploaded them somewhere, or trained the model on them. How would you approach this? (Or is that dumb in general?)