Uncensored AI for sensitive topics

Does anyone know where I can find an uncensored LLM API that can discuss sensitive topics like human anatomy? How are those “AI Girlfriend” apps doing it - what are they using?

Is there a way to get uncensored content through OpenAI’s API, or are there other services I could use? I looked into hosting something myself, but it would cost thousands of dollars per month at minimum.

2 Likes

Hi,

It’s not a service OpenAI offers. I imagine those other systems are using a large 75-90 billion parameter Llama derivative, or perhaps one of the other base models, running on enterprise-class servers and GPUs. No matter which way you cut it, it’s going to be an expensive task that requires significant investment.

1 Like

Hmm. I can run llama-2-70B at 4-bit on a dual-3090 server I put together for under $3,000 total.

But of course the model server is single-threaded. Maybe you need higher volume than that.
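
For reference, loading it through the llama-cpp-python bindings looks roughly like this. A minimal sketch, not my exact setup; the model file name and split ratios are placeholders:

```python
# Minimal sketch: loading a 4-bit GGUF quant of llama-2-70B across two GPUs
# with the llama-cpp-python bindings. File name and split are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-70b.Q4_K_M.gguf",  # 4-bit quant, ~40 GB of weights
    n_gpu_layers=-1,          # offload every layer to the GPUs
    tensor_split=[0.5, 0.5],  # split the weights evenly across the two 3090s
    n_ctx=4096,
)

out = llm("The femur is", max_tokens=32)
print(out["choices"][0]["text"])
```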

Another thought: how about the Microsoft or Google med-specialized models? Maybe you need an MD to get access?

2 Likes

I’m fine with Llama 2, but where is a good place to get a hosted service running? With Hugging Face it’s over $3,000/month.

I can run llama-2-70B at 8-bit on my Mac Studio using only 2 CPUs. I get something like 0.3 tokens per second of output, but it runs. (OK, maybe not quite that slow, but it runs slightly faster than I can type.)

So let this inspire you to run it on your current hardware; proper GPUs only speed this up by orders of magnitude.

Just depends on your latency requirements.
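
If you want to see where you land, the thread count is just a constructor argument, and you can time the output yourself. A rough sketch with the llama-cpp-python bindings; the model path is a placeholder:

```python
# Rough sketch: pin inference to 2 CPU threads and measure tokens/sec.
# Model path is a placeholder; adjust for your quant.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-70b.Q8_0.gguf",  # 8-bit quant
    n_threads=2,     # only two cores, so the rest of the machine stays free
    n_gpu_layers=0,  # pure CPU inference
)

start = time.time()
out = llm("Briefly explain how a hinge joint works:", max_tokens=128)
elapsed = time.time() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s = {n_tokens / elapsed:.2f} tok/s")
```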

2 Likes

Yup, that’s why I ended up building my own server. You can’t rent a GPU that can host a 70B model for under a couple thousand dollars a month, and if you do find such a place, they turn out to have no availability.
OpenAI looks like a bargain if you can live within their guardrails!

1 Like

> I can run llama-2-70B at 8-bit on my Mac Studio using only 2 CPUs. I get something like 0.3 tokens per second of output, but it runs. (OK, maybe not quite that slow, but it runs slightly faster than I can type.)

Do you mean on 2 cores, out of the 12 or 24 it has? Does that mean you are still using the machine for other things as well? Could you get better tokens/sec performance, or maybe run it at higher bits, with more cores? I had no idea you could run an LLM on just a portion of the CPU of a [high-end] desktop computer.

Yeah, I had it running on 2 cores out of 16. I was still able to run other programs, no problem; sometimes I just let it run in the background. You can configure it to run on more cores for speed, or offload to GPUs as well. It’s totally customizable.
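
Roughly, the knobs look like this in the llama-cpp-python bindings (illustrative values, not what I actually run):

```python
# Illustrative only: the same constructor exposes both knobs.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-70b.Q8_0.gguf",
    n_threads=8,      # more cores for faster CPU inference
    n_gpu_layers=40,  # or partially offload layers to the GPU (Metal/CUDA)
)
```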

I tried building the C++ with Metal support, to tap into the 48-core GPU, but no luck; the build always fails for some reason. But the framework I used (llama.cpp) gets updated all the time.

Also, while running, it only took 2.5 Gigs of memory (out of 128 Gigs). This is an M1 Mac Studio from a year or two ago.

1 Like

Have a look into the lore around tinybox: https://tinygrad.org/

I saw this recommendation in a Reddit post. Look them up. They’re not difficult to find.

> Here are the 2 best uncensored ChatGPT alternatives I am using in production:
> NLP Cloud (their Dolphin and Fine-tuned GPT-NeoX models)
> AI21 (their Jurassic 2 model)

I haven’t tried either of those myself, so I can’t vouch for them, but both seem legit and production-ready, I think.
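
If you do try one, the integration is usually just an HTTP call. Here is a generic sketch assuming an OpenAI-style completions endpoint; the URL, payload fields, and model name are all placeholders, so check each provider’s docs:

```python
# Generic sketch of calling a hosted text-generation API over HTTP.
# URL, payload fields, and model name are placeholders -- every provider
# differs, so check their docs.
import os
import requests

resp = requests.post(
    "https://api.example-provider.com/v1/completions",  # placeholder URL
    headers={"Authorization": f"Bearer {os.environ['PROVIDER_API_KEY']}"},
    json={
        "model": "some-uncensored-model",  # placeholder model name
        "prompt": "Describe the anatomy of the human heart.",
        "max_tokens": 200,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```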

1 Like

Didn’t realize I had a use case for a Mac Studio :laughing::sweat_smile:. I’ve felt silly for years getting 15" MacBooks because I really prefer the bigger screen, but the hardware is way more than I need.

But now I’m thinking, “hmm… run open models at home.” :thinking::money_mouth_face:

1 Like

Also, from a learning + ops perspective, check out:

And check out the forks in Golang, Python, JavaScript, Rust, Julia, etc., and of course C++.