I want to develop a hands-free AI assistant/companion for my Mom

Hi everyone! I would really appreciate any advice you might have for a project I’m working on for my Mom. She recently had a health event that left her with only partial vision. Before this, she always used technology for everyday tasks in managing her small business (scheduling, writing documents, etc.). She’s scared she will no longer be able to do this, so I’d love to develop the perfect assistant/voice companion for her.

The requirements would be:

  1. I would like to use a GPT-4 voice model that I can train or fine-tune on her current data.
  2. I would like the assistant to be able to retrieve this data at any time, as well as inject new data for future retrieval. (Vector database?)
  3. I would like to train a custom persona for this bot, so it is an everyday companion AI capable of searching the web, generating DALL·E 3 images, etc.

I will be running this off a touch-screen Android-based tablet. So my question is: am I on the right track to start with a GPT-4 voice model, fine-tune it so it is suitable as her personal companion as well as her assistant, and finally integrate it with a vector database?

Or would I be better off to simply use an app like BotRush with my own API key?

7 Likes

Hey James

Sorry to hear about your mom’s situation.

But we are indeed living in times where much of the science fiction we used to read in books and watch in movies is becoming reality and helping people like your mom continue living a full life.

To your questions:

  1. You won’t need to fine-tune a voice model (at least, from your description I didn’t see a need for that).
  2. Fine-tuning a language model may make sense, but I’m not sure you’ll need that in the first version. It is a much more complex task and in many cases simply not needed.
  3. Storing and retrieving the knowledge may indeed be done with a mix of vector and relational databases.
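To make that mix concrete, here is a minimal, self-contained sketch (all names and data are invented): an in-memory dict of embeddings stands in for the vector database, and SQLite stands in for the relational side.

```python
import math
import sqlite3

# In-memory vectors stand in for the vector DB; SQLite for the relational DB.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, text TEXT)")
db.executemany("INSERT INTO notes (id, text) VALUES (?, ?)", [
    (1, "Dentist appointment every first Monday."),
    (2, "Invoice template lives in the Documents folder."),
])
# Hand-made 3-d embeddings keyed by row id (a real system would use a model).
vectors = {1: [0.9, 0.1, 0.0], 2: [0.1, 0.8, 0.3]}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_vec, k=1):
    # Rank ids by similarity in the "vector DB"...
    ranked = sorted(vectors, key=lambda i: cosine(query_vec, vectors[i]), reverse=True)
    ids = ranked[:k]
    # ...then fetch the full records from the "relational DB".
    marks = ",".join("?" * len(ids))
    rows = db.execute(f"SELECT text FROM notes WHERE id IN ({marks})", ids).fetchall()
    return [row[0] for row in rows]
```

Swapping the dict for a real vector database and SQLite for whatever holds her business records keeps the same two-step shape: similarity search first, exact lookup second.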

I don’t know about BotRush, but generally, if you build something, the more high-level the tools you use, the less flexible they are (unless they are extremely niche and made for a very narrow use case).

But such high-level tools may be a great and quick start.

2 Likes

Thanks Tony! I appreciate your reply. I am a relative newcomer to AI, so I might be misunderstanding the fine-tuning, or “training,” part. What I am looking to do is similar to the custom instructions feature on the ChatGPT chatbot, except I would want the persona to have a larger variety of set responses, and to know a lot more about my Mom than the 1,500-character limit allows. I would like to make this model not only very helpful for her, but as entertaining and “real”-feeling to her as possible.

1 Like

That sounds like a really cool project, and at the same time, it’s something that has surely been done already in many different ways, as you point out in your original post. Depending on how quickly you want to see results, it may be sufficient to search for and test existing solutions. From this perspective, you can also start to explore the custom GPT features and see how far this takes you. I recommend this approach as it makes sense before delving deeper into using assistants and the API.

Having said that, I believe it’s not an issue to be new to AI in general, but it could be problematic if you need to acquire several additional skills as well. For instance, your solution might be a web app which you then have to secure, create a UI for, and set up the AI assistant alongside. This web app can then be accessed using a tablet. If you decide to develop a native app, the considerations remain the same.

Regarding personalization, you can also input the personalization prompt directly via a user message instead of using restricted custom instructions. However, any further suggestions would now require more specifics to proceed.
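A minimal sketch of that idea (the persona text, name “Maple,” and the helper are all placeholders): the personalization prompt is simply prepended to the conversation on every turn, so it is not limited by the custom-instructions character cap.

```python
# Hypothetical persona text; "Maple" and the wording are placeholders.
PERSONA = (
    "You are Maple, a warm, patient companion for a small-business owner "
    "with partial vision. Keep answers short and easy to read aloud."
)

def build_messages(history, user_text, persona=PERSONA):
    """Prepend the persona on every turn instead of relying on the
    character-limited custom instructions field."""
    return (
        [{"role": "system", "content": persona}]
        + list(history)
        + [{"role": "user", "content": user_text}]
    )
```

Because the persona travels with every request, it can be as long as the context window allows.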

Fine-tuning is very likely not necessary at all. There’s a common misconception about what “tuning” actually does to the behaviour of Large Language Models (LLMs) like GPT-3.5.

3 Likes

I believe you can look into assistants:
https://platform.openai.com/docs/api-reference/assistants

It is possible to try it on playground too:
https://platform.openai.com/playground?mode=assistant

The advantage would be that you can include file attachments, letting you supply a larger amount of personal information.

Another, easier way is to create a custom GPT:
https://chat.openai.com/gpts/editor
You can add files the same way as with assistants, but you don’t need to be a programmer for that.
Once you customize your GPT, your mom could use voice chat in the mobile app to have a personal assistant that already has a voice, DALL·E, internet access and everything.

Based on your requirements, it sounds like you ultimately want a “hybrid RAG”.

I say RAG because you want to periodically inject new data easily, and hybrid because you want the output to be fine-tuned.

I would first drop the fine-tuning part, because you can get close via prompting, and focus on the RAG and the overall setup.
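As a sketch of the prompting-instead-of-fine-tuning idea (the function and wording are illustrative, not a fixed recipe), retrieved personal facts can simply be stuffed into the prompt:

```python
def rag_prompt(question, retrieved_snippets):
    """Stuff retrieved personal facts into the prompt instead of fine-tuning."""
    context = "\n".join(f"- {s}" for s in retrieved_snippets)
    return (
        "Use the facts below when they are relevant.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}"
    )
```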

To get the minimum viable product (MVP) going, the voice in/out could be accomplished under the accessibility settings on the OS. On Macs this is Voice Control and Dictation.

The presentation layer could be something simple like a free version of Slack, with your RAG backend communicating to it via Slack API.

The RAG part could be done under the free tier of most cloud services. For example, using AWS, your vector search is on a binary version of the vectors, loaded from S3 into a Lambda function. Once the chosen closest vectors are found, the corresponding hash is used to look up the text in a DynamoDB table. You could even cut out the DynamoDB table and have it all in the binary data, including text and vectors, but you’d use more memory in the Lambda with the accompanying text. Most of this should fall under the free tier, or extremely cheap.
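Here is a toy version of that search, with a plain dict standing in for the DynamoDB table and hand-made four-dimensional vectors standing in for the binary file in S3 (real embeddings would be far longer; all texts are invented):

```python
import hashlib

def quantize(vec):
    """Sign-quantize a float vector into an int whose bits are the signs."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

# Invented corpus; the "S3 binary file" becomes a list of (bits, key) pairs
# and the "DynamoDB table" a plain dict from hash key to text.
corpus = [
    ("Mom prefers morning appointments.", [0.8, -0.2, 0.5, -0.1]),
    ("The invoice template is in Documents.", [-0.3, 0.9, -0.4, 0.6]),
]
index, table = [], {}
for text, vec in corpus:
    key = hashlib.sha256(text.encode()).hexdigest()
    index.append((quantize(vec), key))
    table[key] = text

def search(query_vec):
    # Nearest neighbour by Hamming distance over the binary vectors,
    # then the hash key looks up the full text.
    qbits = quantize(query_vec)
    _, key = min(index, key=lambda pair: hamming(pair[0], qbits))
    return table[key]
```

Binary quantization loses precision but shrinks the index enough to fit comfortably in a Lambda function’s memory, which is the point of the setup above.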

So that’s it. Once that gets going, you can create the bells-and-whistles app version, but this gets you a free-ish MVP to start.

To add in the hybrid part, you fine-tune on prompt-completion pairs that have neutral text input and stylized text output. You then switch your prompted version over to this new fine-tuned version when the fine-tune becomes available.
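For illustration, such neutral-in/stylized-out pairs might be serialized to chat-style JSONL like this (the example pairs and helper are invented; check the current fine-tuning docs for the exact file format expected):

```python
import json

# Invented training pairs: neutral input in, stylized companion voice out.
pairs = [
    ("The meeting is at 3 pm.",
     "Good news! Your meeting is at 3 this afternoon, plenty of time for tea first."),
    ("Invoice 42 was paid.",
     "Lovely, invoice 42 has been paid. One less thing to worry about!"),
]

def to_jsonl(pairs):
    """Serialize neutral->stylized pairs as chat-style JSONL lines."""
    lines = []
    for neutral, stylized in pairs:
        lines.append(json.dumps({"messages": [
            {"role": "user", "content": neutral},
            {"role": "assistant", "content": stylized},
        ]}))
    return "\n".join(lines)
```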

The only thing left then, is to build the app with stylized voice output, and some presentation layer. I assume you would use the device OS to generate the voice to text input. Unless you want to do this in the cloud … so upload each audio file and have the cloud transcribe it. This would give you more control, but add latency and increase the uplink bandwidth (similar comments for voice output). So maybe the app should also start out with OS driven TTS and Voice control.

3 Likes

James, it is a common point of confusion. Fine-tuning is a term for a very specific technical task (updating the weights of the model), but in client lingo it has somehow transformed into “setting up the behaviour of a model.” From what I see, in 99% of cases when somebody new to the field talks about fine-tuning, they are actually talking about prompt engineering (or prompting). As @curt.kennedy mentioned above, prompting will do the job for you.

Awesome, thank you all for your knowledge! The thought of making a custom GPT never crossed my mind for this use case until aprendendo mentioned it. I’m going to start with that route and see if the actions/knowledge abilities of the custom GPT can accomplish what I’m looking for. Wish me luck! :metal:

1 Like

You don’t need luck man! Just some time and effort.

1 Like

Also consider a knowledge graph such as Neo4j; their versatility and accuracy are underrated with RAG and LLMs.
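To illustrate why a graph helps, here is a toy triple store in plain Python standing in for Neo4j (the facts are invented; a real deployment would use Cypher queries instead):

```python
# Invented facts stored as (subject, relation, object) triples.
triples = [
    ("Mom", "PREFERS", "morning appointments"),
    ("Mom", "RUNS", "the flower shop"),
    ("the flower shop", "OPENS_AT", "9 am"),
]

def query(subject=None, relation=None):
    """Return objects matching an optional subject/relation pattern."""
    return [o for s, r, o in triples
            if (subject is None or s == subject)
            and (relation is None or r == relation)]
```

Explicit relations make multi-hop questions (“when does Mom’s shop open?”) answerable by chaining two lookups, which is where graphs can beat flat vector retrieval.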

1 Like

As someone who cares for his elderly mother and profoundly intellectually disabled sister, this use case just grabs me. If you need any assistance, please reach out. Pro bono, of course. My experimentation with providing GPT-3 and GPT-4 with a continuous memory, and the Python module I’m working on (GPT-HLLAPI), sound like some of what you might need. If you’re looking to host the interface on a tablet, why not use an AWS EC2 instance or just a handful of Lambda functions? My thoughts on this:

  • continuous memory facilitated via neo4j graph database (to allow it to remember her name, learn her preferences, anticipate her needs and search its memory as needed)
  • self-evaluation and self-prompting functions
  • IoT integration if you want to enable her to do things around the house with conversational commands (open shades, adjust thermostat, etc)
  • secure automation software with an exposed Flask server on her computer and custom functions implemented locally via pyautogui, so the AI can issue a command (e.g., compose_email(email_contents_as_python_dict)) and the automation script carries out the actual tasks: opening the email service in a browser, clicking compose, and then providing screenshots as feedback to GPT to go from there.

As for GPT voice, you could get away with just doing ASR with Whisper or another transformer-based ASR solution. That way, the whole solution is in the cloud, and the Kotlin/Java/C++ (whichever you are going to use) Android app can be lightweight and basically just make API calls.
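The Flask-plus-pyautogui idea boils down to a command dispatcher. Here is a stubbed sketch (the handler name, registry, and return values are invented; a real handler would drive the UI via pyautogui):

```python
HANDLERS = {}

def command(name):
    """Register a handler the model is allowed to invoke by name."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

@command("compose_email")
def compose_email(to, subject, body):
    # A real handler would open the mail client with pyautogui and return
    # a screenshot; here it just reports what it would have done.
    return f"drafted email to {to}: {subject}"

def dispatch(cmd):
    """cmd is e.g. {'name': 'compose_email', 'args': {...}} parsed from
    the model's output (or received by the Flask endpoint)."""
    fn = HANDLERS.get(cmd["name"])
    if fn is None:
        return f"unknown command {cmd['name']!r}"
    return fn(**cmd["args"])
```

Keeping an explicit registry means the model can only trigger whitelisted actions, which matters when the automation runs on her actual computer.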
4 Likes