How to use AI to make a WordPress AI bot

Hello all,

I have a question about how to effectively create an AI bot. By the way, I have minimal knowledge of coding but have used ChatGPT 3.5 to effectively code scripts for automating Excel functions. I have been tasked with finding a way to get an AI bot to answer questions and comments on certain WordPress websites. I’m not sure if there are already bot plugins that can be utilized, but I thought of using the OpenAI API to create my own. While I have asked ChatGPT many questions on this subject, I wanted to seek some external assistance.

Basically, I need to develop a bot that can monitor comments coming in on multiple websites and respond to them based on trained data derived from articles on the various websites that will use it. This will alter how ChatGPT typically answers questions, as it will use articles to formulate responses or quote from them. To make things more complex, it will need to operate in multiple languages. I am aware that ChatGPT has varying levels of proficiency in languages, but I will utilize it for the languages it excels in. I hope this clarifies my goal. To be clear, I will not be coding from scratch; I will be using GPT to do most of the work, and I’ll ask it to change the code if it makes mistakes.

ChatGPT provided me with a few steps to follow: get an API key, choose the language you want to code in, train it on comments, and provide it with data to learn from. Use a virtual environment for training until it becomes accurate enough, then launch and monitor it. As a side note, ChatGPT suggested creating a centralized AI bot to handle all the websites in different languages. Is that a good idea?

The part I am confused about is how I will input the numerous articles I want the bot to base its answers on. GPT suggested using an external data storage, but I am unsure how the bot would access that.

I know this is quite a lot, but any help would be greatly appreciated.


Hi and welcome to the Developer Forum!

If you wish the responses to be based on existing information, then you could embed the data with an embeddings model like ada-002 and store the result in a vector database. Then, when you come to generate a reply from the GPT model, you use the source question as a "search" term and look for similar embeddings in your database; you then pass those matches as context in the API call to one of the GPT models. That's the basics of it, also called "Retrieval Augmented Generation" (RAG).
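To make the flow concrete, here is a minimal stdlib-only sketch of that pipeline. The `embed()` function is a stand-in so the example runs without an API key — in a real bot it would call the OpenAI embeddings endpoint, and the final prompt would be sent to a GPT chat model; the article texts and function names are my own illustrative choices:

```python
# Minimal retrieval-augmented generation (RAG) flow. embed() is a
# stand-in that fakes a vector from character counts so the retrieval
# logic is demonstrable end to end; a real bot would return the vector
# from the embeddings API (e.g. text-embedding-ada-002) instead.
import math

def embed(text: str) -> list[float]:
    vowels = sum(text.lower().count(v) for v in "aeiou")
    return [float(len(text)), float(vowels), float(text.count(" "))]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(question, store, top_k=2):
    # Rank stored article chunks by similarity to the question vector.
    q_vec = embed(question)
    ranked = sorted(store, key=lambda c: cosine_similarity(q_vec, c["vector"]),
                    reverse=True)
    return [c["text"] for c in ranked[:top_k]]

def build_prompt(question, context_chunks):
    # The retrieved chunks become the context for the GPT call.
    context = "\n---\n".join(context_chunks)
    return (f"Answer the comment using only the context below.\n\n"
            f"Context:\n{context}\n\nComment: {question}")

# Index some article chunks (in production these live in a vector database).
articles = ["WordPress comments can be moderated from the dashboard.",
            "Our pricing article explains the subscription tiers."]
store = [{"text": t, "vector": embed(t)} for t in articles]

question = "How do I moderate comments?"
prompt = build_prompt(question, retrieve(question, store))
```

The resulting `prompt` string is what you would send to the chat model, so the reply is grounded in your articles rather than the model's general knowledge.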

I am kinda a novice at this sort of stuff, but does the embedded data become numerical, as I understand it? Also, would this data then live in the code, or where would it be stored?

The concept of embeddings is deceptively simple on the surface and goes into higher-dimensional mathematics underneath. Suffice to say that your text gets turned into a set of numbers which represent the location of the sentiment and semantic meaning of the text being encoded. Each "chunk" can be up to 8k tokens (~6,000 words) long. Each chunk is then stored in a "vector database", and a distance can be calculated between any given input text and all of the stored chunks in the database. The closest-matching chunks can then be retrieved, and each typically gives you a link back to the plain text originally used to create it. This is often called "semantic search", as it does not look for matching words, but rather for the meaning of the words. I've been doing this sort of thing for about 6 years and it's still a single step away from pure wizardry and magic.

So, as you can imagine, there are more than a few vector database providers now. Some are open source and free; others are commercial and ready to go with minimal setup. Popular ones include ChromaDB, Weaviate, and Pinecone, among lots of others.

So does OpenAI charge for the number of tokens it spits out, or for how many I want to embed? Also, would I need to train the bot on the embedded data, or can it scan selected websites in order to answer effectively? E.g., if a question is about one article but the answer relates to a different article on the website, do I need to have all the articles embedded, or can it just scan and give the answer from the other article?
Thanks for your help, btw.

You are charged to embed the data initially and also to create a single embedding vector for each question; the cost is extremely low, currently $0.0002 per 1,000 tokens.

Typically you would embed all of your data; even gigabytes of information will only cost a few dollars, so it's usually worth doing it all.
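As a back-of-envelope check, at the quoted rate the arithmetic works out like this; the ~0.75 words-per-token ratio is a rough rule of thumb for English, not an exact figure:

```python
# Rough embedding-cost estimate at the quoted $0.0002 per 1,000 tokens.
# words_per_token (~0.75 for English) is a rule-of-thumb conversion.
PRICE_PER_1K_TOKENS = 0.0002

def embedding_cost(word_count: int, words_per_token: float = 0.75) -> float:
    tokens = word_count / words_per_token
    return tokens / 1000 * PRICE_PER_1K_TOKENS

# 500 articles of 1,000 words each -> 500,000 words, ~667k tokens,
# roughly 13 cents to embed the whole site.
cost = embedding_cost(500 * 1000)
```

So even a large multi-site article archive is cheap to embed up front; the ongoing cost is just one small embedding call per incoming comment.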