Hi Kaey!
You can always try running a local LLM (Large Language Model) and experiment by yourself.
Please keep in mind that a local installation is quite different from the online version of OpenAI GPT.
Before you begin, ensure that you have the following:
Python 3.7 or later.
The OpenAI API client installed (instructions below).
An OpenAI API key (sign up on the OpenAI website).
All the required Python libraries (instructions below).
First of all: create your environment - install Python and Visual Studio Code.
Create a folder (any location) and name it “Local_GPT” (Or whatever suits you).
Launch VS Code (Visual Studio Code), and then in the File menu select “Open Folder”.
Choose the folder you just created (“Local_GPT”) and click “Select Folder”.
If the terminal is not already visible at the bottom, open the View menu and select “Terminal”.
Type: pip install openai
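If you want to make sure the client and your API key work, here's a minimal sketch (assuming a recent version of the openai package - the older 0.x releases use a different syntax):

from openai import OpenAI

# The key below is a placeholder - use your own key from the OpenAI website
client = OpenAI(api_key="sk-your-key-here")
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)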
Install the required Python libraries:
pip install requests numpy tqdm
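To verify that everything installed correctly, run this quick sanity check in the same terminal:

python -c "import openai, requests, numpy, tqdm; print('all imports OK')"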
I don’t know what kind of GPU you have, but the size of your VRAM sets the limit for the LLM (Large Language Model) you can run. I use a PNY Nvidia A100 HBM2 with 80 GB, but that’s really overkill for this task and far too expensive (~$23,000).
You can buy a cheaper model like a Tesla K80 (~$100) or M10 (~$200), or better, a Palit Nvidia RTX 3060 Dual 12 GB (~$340), which will make things a lot easier.
(If you’re going to use a Tesla, don’t forget to download all the necessary drivers, SDKs, etc. - some of which require you to register an account - so my advice is to go with a “normal” GPU like the Palit RTX 3060 12 GB. Cooling a Tesla unit is, I apologize for my language, a pain in the ass.)
(Edited: I meant 8 GB and not 4)
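If you’re not sure how much VRAM your card has, you can check it from the terminal with nvidia-smi (it ships with the Nvidia driver):

nvidia-smi --query-gpu=name,memory.total --format=csv,noheader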
So, the next steps depend on the size of your VRAM:
If you’re using a GPU with 8 GB VRAM, type this in the terminal:
python download-model.py decapoda-research/llama-7b-hf
This will download the base model with 7 billion parameters.
If you’re using a GPU with at least 10 GB VRAM, type this in the terminal:
python download-model.py decapoda-research/llama-13b-hf
This will download the base model with 13 billion parameters.
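The download-model.py script comes from the text-generation-webui project, if I remember correctly. If you don’t have it, you can pull the weights straight from the Hugging Face Hub yourself - a rough sketch, assuming you’ve done pip install huggingface_hub and have a reasonably recent version of it (the models/llama-7b-hf folder name is just my choice):

from huggingface_hub import snapshot_download

# Downloads all model files into a local folder (several GB, so be patient)
snapshot_download(
    repo_id="decapoda-research/llama-7b-hf",
    local_dir="models/llama-7b-hf",
)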
Now, run your local LLM by typing this in the terminal:
python server.py --gptq-bits 4 --model llama-7b
(or llama-13b if you downloaded the larger model).
You should see a local IP address (depending on your network) - type it into your web browser to interact with your local GPT. Performance will depend on the power of your GPU, as well as your RAM, CPU, BIOS configuration, etc.
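If the browser can’t reach it, you can quickly check whether the server is up with a couple of lines of Python - just a sketch, and the address is an assumption (Gradio’s default port is 7860; use whatever address the terminal printed):

import requests

# Placeholder address - replace with the one printed in your terminal
resp = requests.get("http://127.0.0.1:7860")
print(resp.status_code)  # 200 means the web UI is up and reachable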
If you have a dual (or more) GPU setup, try to find “Crosslink Control Override” in your BIOS Setup.
You should find it in the settings for the PCIe slots, but it’s quite unusual on consumer-grade motherboards. It’s not directly related to the traditional concept of linking two GPUs for increased gaming performance; this setting affects the PCIe lane configuration and how the GPUs communicate, optimizing how two GPUs work in parallel on compute tasks. That makes it useful in setups doing heavy computational work that can leverage multiple GPUs independently, which is why it’s usually found in the BIOS of workstation motherboards. (If you have this option, you can use two 8 GB GPUs for the larger model.) EDIT: Changed 4 GB to 8 GB.
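To confirm that both GPUs are actually visible to your Python environment, here’s a quick check with PyTorch (which the web UI pulls in anyway) - just a sketch, assuming a CUDA build of torch is installed:

import torch

print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    # total_memory is in bytes, so convert to GB
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")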
As you can see in one of my setups, with two Tesla units, they’re outside the case, connected with PCIe x16 riser cables, and yet it’s not unusual for them to get really hot (~90 °C) under heavy workload, even though I’m using 3 fans.
This one requires 128 GB RAM and dual Xeon CPUs.
For better cooling and performance, buy some cheap servers off eBay:
(And don’t be stupid like me and stack the servers on top of each other - buy a server rack and always keep an eye on the temperature, which should never go over 26-28 °C.)
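A simple way to keep an eye on the temperatures is to poll nvidia-smi from a small Python script - a rough sketch, assuming nvidia-smi is on your PATH:

import subprocess, time

# Prints one line per GPU every 10 seconds: temperature (°C), utilization (%)
# Stop with Ctrl+C
while True:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    print(out.stdout.strip())
    time.sleep(10)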
Now you have the basic setup to create several GPTs by creating a VM for each GPT. Make sure all the servers are configured to work as a single unit, using vCenter, vSphere and vSAN. Upgrade every component, but remember that your GPTs’ power depends on the GPUs. (Servers usually don’t have power connectors for GPUs, which requires some DIY work - soldering and modifying the power supplies.)
Use OpenAI ChatGPT 4 to learn how to create your own LLMs, “fuse” GPTs, and ultimately create a “super GPT” or “GPT phoenixes”.
I use my setup for creating and rendering animations and videos, which requires a lot of power and electricity. I don’t know if it’s dumb luck, but when I bought my apartment I also got a flat rate for electricity, since there are solar panels on our roof (no one told me about it!). Just as with the temperature, keep an eye on how much electricity you use, or you’re going to ruin your finances.
If you encounter errors or issues, try troubleshooting by checking for error messages in the terminal, restarting the terminal, or repeating the installation.
You can customize the responses, fine-tune your LLMs on your own data, and tweak and change the source code to create your own “super GPTs”.
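As a starting point for tweaking things yourself, here’s a rough sketch of loading the downloaded model directly with the Hugging Face transformers library and generating a reply. It assumes pip install transformers accelerate torch, and that the weights ended up in models/llama-7b-hf (adjust the path to your setup); note that full-precision 7B needs more VRAM than the 4-bit version the web UI uses:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "models/llama-7b-hf"  # placeholder path - use wherever your weights landed
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

inputs = tokenizer("Tell me a joke about GPUs.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))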
Good luck, and keep us posted on your progress.