Uncensored AI for sensitive topics

Does anyone know where I can find an uncensored LLM API that can discuss sensitive topics like human anatomy? How are those “AI Girlfriend” apps doing it - what are they using?

Is there a way to get uncensored content through OpenAI’s API, or are there other services I could use? I looked into hosting something myself, but it would cost thousands of dollars per month at minimum.

2 Likes

Hi,

It’s not a service OpenAI offers. I imagine those other systems are using a large 75-90 billion parameter Llama derivative, or perhaps one of the other base models, running on enterprise-class servers and GPUs. No matter which way you cut it, it’s going to be an expensive task that requires significant investment.

1 Like

Hmm. I can run llama-2-70B at 4-bit on a dual-3090 server I put together for under $3,000 total.

But of course the model server is single-threaded. Maybe you need higher volume than that.
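
For reference, loading it through the llama-cpp-python bindings looks roughly like this. A minimal sketch, not my exact setup; the model file name and split ratios are placeholders:

```python
# Minimal sketch: loading a 4-bit GGUF quant of llama-2-70B across two GPUs
# with the llama-cpp-python bindings. File name and split are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-70b.Q4_K_M.gguf",  # 4-bit quant, ~40 GB of weights
    n_gpu_layers=-1,          # offload every layer to the GPUs
    tensor_split=[0.5, 0.5],  # split the weights evenly across the two 3090s
    n_ctx=4096,
)

out = llm("The femur is", max_tokens=32)
print(out["choices"][0]["text"])
```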

Another thought: how about the Microsoft or Google med-specialized models? Maybe you need an MD to get access?

2 Likes

I’m fine with Llama 2, but where is a good place to get a hosted service running? With Hugging Face it’s over $3,000/month.

I can run llama-2-70B at 8-bit on my Mac Studio using only 2 CPUs. I get something like 0.3 tokens per second of output, but it runs. (OK, maybe not quite that slow, but it runs slightly faster than I can type.)

So let this inspire you to run it on your current hardware; proper GPUs only speed this up by orders of magnitude.

Just depends on your latency requirements.
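
If you want to see where you land, the thread count is just a constructor argument, and you can time the output yourself. A rough sketch with the llama-cpp-python bindings; the model path is a placeholder:

```python
# Rough sketch: pin inference to 2 CPU threads and measure tokens/sec.
# Model path is a placeholder; adjust for your quant.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-70b.Q8_0.gguf",  # 8-bit quant
    n_threads=2,     # only two cores, so the rest of the machine stays free
    n_gpu_layers=0,  # pure CPU inference
)

start = time.time()
out = llm("Briefly explain how a hinge joint works:", max_tokens=128)
elapsed = time.time() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s = {n_tokens / elapsed:.2f} tok/s")
```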

2 Likes

Yup, that’s why I ended up building my own server. You can’t rent a GPU that can host a 70B model for under a couple thousand dollars a month, and if you do find such a place, they turn out to have no availability.
OpenAI looks like a bargain if you can live within their guardrails!

1 Like

> I can run llama-2-70B at 8-bit on my Mac Studio using only 2 CPUs. I get something like 0.3 tokens per second of output, but it runs. (OK, maybe not quite that slow, but it runs slightly faster than I can type.)

Do you mean on 2 cores, out of the 12 or 24 it has? Does that mean you are still using the machine for other things as well? Could you get better tokens/sec performance, or maybe run it at higher bits, with more cores? I had no idea you could run an LLM on just a portion of the CPU of a [high-end] desktop computer.

Yeah, I had it running on 2 cores out of 16. I was still able to run other programs, no problem; sometimes I just let it run in the background. You can configure it to run on more cores for speed, or offload to GPUs as well. It’s totally customizable.
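
Roughly, the knobs look like this in the llama-cpp-python bindings (illustrative values, not what I actually run):

```python
# Illustrative only: the same constructor exposes both knobs.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-70b.Q8_0.gguf",
    n_threads=8,      # more cores for faster CPU inference
    n_gpu_layers=40,  # or partially offload layers to the GPU (Metal/CUDA)
)
```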

I tried building the C++ with Metal support, to tap into the 48-core GPU, but no luck; the build always fails for some reason. But the framework I used (llama.cpp) gets updated all the time.

Also, while running, it only took 2.5 Gigs of memory (out of 128 Gigs). This is an M1 Mac Studio from a year or two ago.

1 Like

Have a look into the lore around tinybox: https://tinygrad.org/

I saw this recommendation in a Reddit post. Look them up. They’re not difficult to find.

> Here are the 2 best uncensored ChatGPT alternatives I am using in production:
> NLP Cloud (their Dolphin and Fine-tuned GPT-NeoX models)
> AI21 (their Jurassic 2 model)

I haven’t tried either of those myself, so I can’t vouch for them, but both seem legit and production-ready, I think.
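
If you do try one, the integration is usually just an HTTP call. Here is a generic sketch assuming an OpenAI-style completions endpoint; the URL, payload fields, and model name are all placeholders, so check each provider’s docs:

```python
# Generic sketch of calling a hosted text-generation API over HTTP.
# URL, payload fields, and model name are placeholders -- every provider
# differs, so check their docs.
import os
import requests

resp = requests.post(
    "https://api.example-provider.com/v1/completions",  # placeholder URL
    headers={"Authorization": f"Bearer {os.environ['PROVIDER_API_KEY']}"},
    json={
        "model": "some-uncensored-model",  # placeholder model name
        "prompt": "Describe the anatomy of the human heart.",
        "max_tokens": 200,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```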

1 Like

Didn’t realize I had a use case for a Mac Studio :laughing::sweat_smile:. I’ve felt silly for years getting 15" MacBooks because I really prefer the bigger screen, but the hardware is way more than I need.

But now I’m thinking, “hmm… run open models at home.” :thinking::money_mouth_face:

1 Like

Also, from a learning + ops perspective, check out:

And check out the forks in Golang, Python, JavaScript, Rust, Julia, etc., and of course C++.