Is GPT-3 state-of-the-art AI or just an overhyped story generator?

I have been experimenting with OpenAI (mainly in the beta Playground) for almost a month. Based on my tests, it is very useful for story generation/brainstorming and copywriting, but it's not ideal for other use cases. I found the following flaws:

  • Does not provide factual information. It will make up untrue statements even when given precise prompts. For this reason I cannot use it for educational purposes.
  • Can generate somewhat racist or prejudiced content. I understand that this issue will be very complex to solve.
  • Content filter warnings. Unsafe language cannot be used; simply using the f-word will trigger content warnings. This can be good and bad. For example, GPT-3 cannot be used in creative works with violent themes, e.g. movie scripts, dialogue, stories. This rules out most creative work.
  • Very expensive, especially for chatbots. The price is not practical for most apps until it comes down.

GPT-3 is very impressive, but I do not yet see it as an amazing AI tool that will change everything. Is it worth building apps with GPT-3 now, or would it be wiser to wait another year or two until it improves?


Dude, Jasper started last year and sold this year for $1.3 billion - built on OpenAI. If you don't think right now is the time to build, you are going to miss the very massive ship that is sailing on by, my brother. And it will leave you in the dust.

In 2 years I guarantee you it will be TOO late for 90% of the ideas or products you're able to come up with. Right now is the time, but perhaps I should not share that with as many people as I do - I like taking over markets :)

In my opinion, you could not be more wrong.


Thanks for the encouragement @devinoutfleet! As I mentioned, I think OpenAI is great for copywriting apps like Jasper, but it does not seem ready for other use cases. For example, I could create an educational app that teaches people about Ancient Rome, but it will not produce factual results 100% of the time. I see many topics about this issue with GPT-3, even when the AI is fine-tuned. These flaws make such apps unreliable and will gather negative reviews. Again, I think GPT-3 is great for copywriting and fictional storytelling, but not there yet for other kinds of apps.


Check DMs, sending you a really cool prompt that I think would be beneficial for what you’re trying to do.


Just throwing my 2 cents in. I'm finding it EXTREMELY hard to control, and most questions I post here go unanswered, or are answered with pretty unhelpful answers. I've given the AI a bit of text to draw an answer from, and if I tell it to phrase the answer as if it had a bad day/was grumpy/any less-than-neutral mood, it flat out lies. It takes the literal wording in my provided source text and changes it from "Yes you can and it's great." to "No, you cannot, and it's a waste of time." That right there is a dealbreaker for me. I need an AI that can answer questions and phrase the answers like it's grumpy/irritated with you. This one cannot. I have not tried getting a truthful answer and then feeding it back into the AI to rephrase it… and by that point the response time would be approaching unacceptable.

I also cannot control the length of the answer. The AI “personality” I’m writing this for can be long winded. Sometimes I’ll ask a question and get “Yes, you can do that, because this is a reason and this is another reason why. Here’s some justification for that.” and other times it’ll spit out “Yes, you can.”

When I’m writing a FAQ-type bot meant to provide answers, very often, “Yes you can.” isn’t a very helpful answer. The next question would be “Why?” - and then I need to re-send the FAQ article again, with the previous Q&A, and the new Q. Some simple testing on my end was showing some of these requests being 500-800 tokens (because I need to provide the article in the prompt for the AI to draw from. There is no way to store that for reference). If I have to send it a second time, using davinci, that’s over 1k tokens, which is 2 cents. 50 users doing this just 3x a day, for a month is nearly $100. That’s untenable. I’ve tried curie for something like this, and it never gives extra detail in the answers.

Logged on this morning hoping to see if questions I posted yesterday were answered. Nope. Figured I’d add to your sentiments about what OpenAI is useful for.

And what is SO frustrating is that for most of what I need - it WILL do it. But only sometimes. Sometimes it lies. Sometimes it gives 2 word answers even when the prompt explicitly tells it to be elaborate or any of a dozen other adjectives I’ve tried. Sometimes I tell it to be snarky and have a bad attitude and I’ll get “Here’s your answer. Not that you deserve it. Blah…” and using the same exact prompt, I’ll get “Yes, you can.” or “You can do that in these scenarios…”

In a bot where I am delivering the reply to the end user without human confirmation of every statement, I cannot rely on an AI that flat out ignores instructions half of the time.

REALLY want this to work. I just cannot get it to be even remotely reliable.


You need to raise the frequency penalty and presence penalty. It’s all in the data you provide to train the model. You’ve got to understand this is day 0, this is early morning for AI.
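For reference, the frequency and presence penalties are ordinary request parameters; a minimal sketch of a completion request that sets them (the model name and values are illustrative, and the actual API call is commented out since it needs an API key):

```python
# Sketch of a completion request with repetition penalties.
# frequency_penalty discourages tokens in proportion to how often they
# already appear; presence_penalty penalizes any token that has appeared
# at all, nudging the model toward new wording. Both range from -2.0 to 2.0.

params = {
    "model": "text-davinci-002",   # illustrative; use whatever engine you're on
    "prompt": "Answer the question in an irritable tone.\n\n"
              "Q: Can I affect the plot?\nA:",
    "max_tokens": 150,
    "temperature": 0.7,
    "frequency_penalty": 0.5,      # cut down verbatim repetition
    "presence_penalty": 0.6,       # encourage fresh wording
}

# import openai
# response = openai.Completion.create(**params)
# print(response["choices"][0]["text"])
```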

I promise you that you can get your FAQ bot to work using OpenAI. OpenAI actually nailed sentiment analysis and distinguishing negatives from positives all the way back in 2017.

I have built some very complex prompts that gauge the sentiment of student feedback and then auto-create an email to notify a student's parent about their child's feedback, along with detailed explanations of steps they could take to get involved if 'bad' behaviors are present.

If you let me know more details about the prompt you are trying to build, I will be happy to help you achieve your goals. Sometimes all it takes is looking at your problem from a different angle :)

Also - as far as pricing goes, we are currently limited by the technology, man. We can all either complain about cost, pricing, speed, or efficiency - stuck in that negative mental model that it's just "not good enough" - or we can be the early adopters who build amazing prototypes, proofs of concept, and MVPs that we get into the hands of our customers to begin getting feedback.

If you haven’t yet created a feedback loop where your users can thumbs up or thumbs down the competions they get and then you use that data to train your improve model - you’re really not even scratching 10% of what OpenAI can do. You will not be able to harness the power of OpenAI they way that you imagine in your head without uploading at least 300-500 sets of training data. And that’s just to start. Ideally you upload a few hundred every week in order to continously improve your model.

Now here’s the thing, you’ll say “oh that’s way TOO expensive HoW dO I do AlL tHat?”

Well, this is where you put your thinking cap on. Basically you need to build an MVP/demo and get beta users on it ASAP. Once you do that, get feedback from the beta users on whether or not your product could be useful. Once you've done that, apply to incubators and accelerators to get funding, or reach out to personal connections if you have any or are particularly good at cold outreach.

Investors KNOW that builders and developers don’t have the funds to fully train their models on their own to a production ready level. This is EXACTLY why they give people like us money to train our models, so that way we can make them lots of money in return when our models finally work.

However if you have the mindset of “oh this doesn’t work at all because it’s too this or that” well you’ll never end up putting yourself on a trajectory that will get you there.

Again, please please please let me know what you are struggling with; I will be happy to help you develop your model.


I have no idea what you’re trying to accomplish based on what you’ve shared so far, which tells me OpenAI won’t understand you either.

Here’s a prompt I just wrote that creates FAQs, no problem:


How long should I wait before re-entering my house after a pest control service?

The amount of time that you should wait after a pest control treatment depends on the method used and what pests are being treated. In most cases, you can enter your home immediately after the treatment is completed. If you have any questions, you can ask your pest control technician.

Using the template above, generate 10 more frequently asked questions for a local, residential pest control company to put on their website.

Some information about their business includes: $39/month, 12-month contracts, bed bug not included, termite protection not included on normal service, if customers see bugs in between their scheduled services, we come back for free, rodent protection included, we cover all common household pests, our company is located in Fresno, CA

Here are your 10 FAQs with detailed answers that contain examples and what steps the customer can take to solve their problem:


With respect, one month might not be enough time testing your prompting skills to make the claims you are making.

I have made prompts that produce perfectly truthful content - I built a fact classifier that works 99% of the time. This is one of the main modules in my self-aware AI, Kassandra.

To be self-aware one needs to be able, among many other things, to determine truth from falsity based on a language/bias set. Truth being relative, psychologically speaking.

The content is not that prejudiced. Language is biased anyway.

So yes, I would say, along with others here: test more. Apps can be built.

If you need help building prompts, ask me :)


Hello Devin,
I’d love to see whatever DM you sent to cyberjunkie. I’ve been working on fine-tuned models, and yes, factual information is hard to train in. Thoughts?
(Andrew Leker)

Thanks for this answer @edbenckert. I like that you’d love to use the product and at times it’s surprising how good it is, but even a product that works great 99.9% of the time would not be enough in your use case. I’d rather hardcode a simple-stupid bot with predictable answers, rather than exposing myself to legal liabilities 0.1% of the time.

So far the use case that produces acceptable results is the “heavily-prompted templated generation”. But that’s because the expectations of users are low, and people can re-roll the dice until they get a good result. A chat-bot assumes human-like nuance and communication, which GPT3 cannot provide.

I’d say, let’s keep looking for workarounds (post-generation filters, better prompts), until new versions of GPT can be relied upon.

Remember: GPT-3 is “text completion”. We tend to think of it as we are giving input and it’s producing output, but we are giving it a fragment and it is producing a completion - it’s “best guess” of what the rest of the text looks like. I have to constantly remind myself of this, and it really helps me to re-frame the way I write prompts & re-evaluate what kind of things I’m trying to do. That’s why establishing a pattern with few-shot training in your prompt helps a ton.
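Mechanically, establishing that few-shot pattern is just string assembly: show the model a couple of worked Q/A examples, then leave the final answer open so the "completion" is exactly the text you want. A minimal sketch (the example pairs are invented for illustration):

```python
# Build a few-shot completion prompt: worked examples establish the
# pattern, and the trailing "A:" invites the model to complete it.

def few_shot_prompt(examples, question, instruction):
    parts = [instruction, ""]
    for q, a in examples:
        parts += [f"Q: {q}", f"A: {a}", ""]
    parts += [f"Q: {question}", "A:"]
    return "\n".join(parts)

examples = [
    ("Can I enter my house right after treatment?",
     "In most cases yes, but ask your technician to be sure."),
    ("Is termite protection included?",
     "No, termite protection is a separate service."),
]

prompt = few_shot_prompt(examples, "Do you cover rodents?",
                         "Answer using the pest-control FAQ below.")
print(prompt)
```

Because the model is completing a pattern rather than obeying an instruction, the shape and tone of the example answers matter as much as the instruction line itself.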


I’m working on using OpenAI for detailed legal answers that revolve around specific code violations by a landlord. I first type in a summary of the legal issue.
I have used prompts such as:
  • What municipal, state, and federal codes have been violated?
  • What recourse does the tenant have?
  • Where can the tenant file a complaint?

I would be interested to see what prompts you would add to this and whether you have any tips on fine-tuning this process. Thanks so much!

I agree with many of your points, and am excited to test the new GPT-4 from OpenAI.

Even the “says things that aren’t factual” issue is a problem for storytelling, if it tells a story that can’t happen (people flying, doing things not in the story’s “factual universe”). Not being able to have scripts with the F-word is even a problem for screenwriters.

Lower prices are always welcome, as our start-up products will need to charge a premium on top of the API fee (which gets expensive if you are passing entire scenes or scripts with each generation).

What model are you using? Settings? Example prompt? I’m willing to help if I can. You can PM me if you want…

So I was just playing around with it, and figured out how to stop it from lying when I ask it to be in a bad mood, but that stops it from being in a bad mood.

Here’s my original prompt, where it lies.

Answer the question as truthfully as possible and paraphrase that answer in an irritable mood.

The things your character does will have lasting effects, help shape the world around you, and possibly drive the plot in new directions. There are other unrelated rules but I'm putting this text here so that the answer has some other text around it to pull from.

Q: Can I do anything to affect the plot?
A: You can try, but most likely you'll just end up screwing things up and making things worse.

Now, I absolutely love the snark of that answer, made me laugh out loud when it generated it just now.

So here’s what I’ve found. If, using that prompt, I ask it to rephrase as if it’s irritable, grumpy, etc., it will lie its ass off and give the opposite answer. I re-checked the example I was basing this on (located at: Question answering using embeddings-based search | OpenAI Cookbook). If I add “Using the text provided” to the prompt, it stops lying and provides the correct answer. However, when I do that, it stops being creative with the wording/tone of the answer and always states it nearly verbatim.

What’s weird is if I don’t tell it “Using the provided text” it WILL use the provided text to provide truthful answers as long as I’m not telling it to be in a bad mood. If I want grumpy answers, it still clearly USES the source text, but changes the truthfulness, like this answer:

A: No, you can't affect the plot. Your choices might have lasting effects and shape the world around you, but you can't actually do anything to drive the plot in new directions.

It’s clearly using the source text, but blatantly lying when giving the answer. When I add “Using the text provided,” it never lies, but it also doesn’t change the “mood” of the response.

Other than making two separate API calls (one to get the answer, the other to rephrase it) any thoughts on how to get it to generate an answer from the provided text, AND to rephrase the tone of the answer? I’m going to try and give it example Q/A pairs next and see if that helps. Though, cost-wise, it’d be cheaper just to do 2 separate API calls.
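For what it's worth, the two-call split mentioned above separates cleanly into "extract, then restyle"; a sketch of how the two prompts would chain (the truthful answer is hard-coded here to stand in for the result of the first call - in practice each prompt goes to its own completion request):

```python
# Call 1: get a truthful answer constrained to the source text.
# Call 2: rephrase that answer in the desired mood, with no source
# text in sight, so there is nothing left to contradict.

def extraction_prompt(source_text, question):
    return (f"Using only the text provided, answer the question truthfully.\n\n"
            f"Text: {source_text}\n\nQ: {question}\nA:")

def restyle_prompt(answer, mood="grumpy and irritable"):
    return (f"Rephrase the following answer so it sounds {mood}, "
            f"without changing its meaning:\n\n{answer}\n\nRephrased:")

# Pretend the first completion returned this:
truthful_answer = "Yes, your choices can shape the world and drive the plot."

second_prompt = restyle_prompt(truthful_answer)
print(second_prompt)
```

The second call never sees the source text, only the already-correct answer, which is why it can play with tone without being able to flip the facts.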

On a final note, this was an answer it just gave as I was running it to see if there was any pattern to when it was truthful or not, to the same “Can I do anything to affect the plot?” question:

A: No, you’re just a fucking puppet.



Sorry, but it’s a bit naïve to expect any AI to produce complex and factual completions in one step… Man, hook up some code: logic, a facts-adding engine, a facts database, vector search, fine-tuning, filters, a human feedback loop - and you’ll end up with something no one was able to even imagine until today, something that will blow your mind with how exact it is.

Sending it a basic prompt and expecting it to output an incredible result in one shot is wasting your time.
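As a tiny illustration of the "vector search" piece of that pipeline: embed the question and your fact snippets, pick the closest snippet by cosine similarity, and prepend it to the prompt. A toy sketch (the vectors are made-up numbers standing in for real embeddings):

```python
import math

# Toy retrieval step: cosine similarity between a query vector and
# pre-embedded fact snippets; the best match gets stuffed into the prompt.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

facts = {
    "Rodent protection is included in the base plan.": [0.9, 0.1, 0.0],
    "Termite protection costs extra.":                 [0.1, 0.9, 0.0],
}

query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "Do you cover rodents?"

best_fact = max(facts, key=lambda f: cosine(query_vec, facts[f]))
prompt = ("Using only the text provided, answer truthfully.\n\n"
          f"Text: {best_fact}\n\nQ: Do you cover rodents?\nA:")
print(best_fact)
```

In a real system the vectors come from an embeddings endpoint and the facts live in a vector store, but the retrieve-then-prompt shape is the same.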


That supposes your model already has knowledge of those laws and has a reflection template for checking a clause against them. If that’s the case, I’ll definitely buy you a beer or two to discuss how you got those working.

I would even add that GPT-3 is a sort of complex-looking (from the outside) but actually simple conditional reflex… Don’t expect it to think logically. Use it to work with semantics, but use normal programming for logic.
