Getting Frustrated - starting to feel OpenAI just isn't usable

I am in the same boat as you. Everything was working great, then I popped in functions and things went bad. I would guess it is the functions that are creating the problem; the model itself may be okay. I am thinking of reverting to the default completions endpoint instead of the chat completions endpoint, even though the chat completions endpoint is 1/10th the price.

Can you use different models, so that you use the one that works when needed and the other one for functions?

You could ask for three different replies in the response and then decide which one to use with the Ada model or some other check, like using code to compare their lengths.
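A minimal sketch of that idea, assuming the openai Python package (0.x-style ChatCompletion API) and a simple length check as the selection rule:

```python
import openai

def best_of_three(prompt: str) -> str:
    # Ask for three candidate replies in a single request.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        n=3,
        temperature=0.7,
    )
    candidates = [choice.message.content for choice in response.choices]
    # The selection rule is up to you; here we simply keep the longest reply.
    # You could instead score candidates with an embedding model (e.g. ada)
    # or run any other programmatic check.
    return max(candidates, key=len)
```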

Have you got an example of a prompt that has stopped working?

Try using chat tools like ChatOpenAI; maybe it will work better for you.

I like your thinking. Separating concerns is a programming principle that I believe supports higher quality responses, allows for focused evals, and helps construct precise & accurate prompt/response pairs.

Allowing GPT to explicitly reason out its answer first also tends to lead to higher quality responses, in my experience.

I’m seeing GPT shift from a conversational agent into a single-responsibility, single-purpose reasoning engine. It may lose context quickly, but its answers are much more consistent and accurate.

I wonder, does it make more sense to separate these concerns into a graph-based Jupyter-like structure? Genuinely curious. Would love some thoughts from all walks

The OP touches on a very important product issue that OpenAI hopefully has its eyes open to.

It’s called foundational models → developers don’t want the foundations shaken or deprecated. If you build a living on months of prompt tuning, you can’t be forced onto different foundations, especially if the new foundations don’t allow your product to perform as well anymore.

OpenAI needs to seriously evaluate its product strategy and the promise made to whoever wants to build a living on its product. I understand there are plenty of reasons to ‘expire’ models, but consider the relationship with developers and the programming model (enriching vs. expiring) going forward.


It would be great, and also achievable, if AI were a database technology or a communications standard, but it’s the most revolutionary technology in human history and it’s changing at breakneck speed. I agree that it makes life hard for devs, but the opportunities are also gigantic.

It would be great to leave a big, well-serviced legacy model chain for those same devs to use, but… every GPU able to run an LLM is currently plugged into a server rack; there just is no spare compute to run a big legacy chain.

All of these things will get better with time and performance boosts, but you can be sure of a bumpy ride for at least the next 12 months, maybe 24.

A database can be an AI. Let’s say you use MS SQL and add some CLR Stored Procedures which use models or call APIs. The database can become a living thing haha.

Getting (acceptably) consistent results from gpt-3.5-turbo is difficult, there can be no question about that. However, it is possible. Spend some time reading about Chain of Thought prompting. While much of the material about CoT suggests it is mainly useful for getting LLMs to perform complex reasoning, it is just as useful for getting more consistent results from prompts that wouldn’t seem to require it.
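As a rough illustration (a minimal sketch; the system prompt wording is just one way to do it), CoT can be as simple as telling the model to show its reasoning before committing to a final answer, and then keeping only the final line:

```python
import openai

COT_SYSTEM = (
    "Think through the problem step by step. "
    "Write out your reasoning, then give the final answer on its own line, "
    "prefixed with 'Answer:'."
)

def ask_with_cot(question: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": COT_SYSTEM},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    text = response.choices[0].message.content
    # Keep only the final answer line so the reasoning never leaks into your app.
    answers = [ln for ln in text.splitlines() if ln.startswith("Answer:")]
    return answers[-1][len("Answer:"):].strip() if answers else text
```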

Most importantly, you must not think of the LLM as intelligent. It doesn’t think, not like a person does. Rather than being frustrated that your results one day show no similarity to yesterday’s results, instead take that as evidence that what you’ve asked it to do is something it cannot do (as convincing as the response may have seemed, it was essentially a lucky guess), and then change what you’re asking it to do (not just a tweak, but a complete rethink of how it can help you solve your problem.) As an example, it is often better to make multiple API calls, passing in the results of prior calls as messages, rather than trying to get it to do everything in a single call.
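For instance, one way to split a task across calls (a hedged sketch, not the exact setup described above): give each call one narrow job and pass the previous call's output along as a message.

```python
import openai

def call(messages) -> str:
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0,
    ).choices[0].message.content

def answer_about_document(document: str, question: str) -> str:
    # Call 1: a narrow extraction task.
    facts = call([
        {"role": "system", "content": "List the facts in the document that are relevant to the question."},
        {"role": "user", "content": f"Document:\n{document}\n\nQuestion: {question}"},
    ])
    # Call 2: answer using only what call 1 produced.
    return call([
        {"role": "system", "content": "Answer the question using only the facts provided."},
        {"role": "user", "content": f"Facts:\n{facts}\n\nQuestion: {question}"},
    ])
```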


It’s like trying to teach a young adult to do a task but the poor guy has to take care of everything.

Use Domain Driven Prompting.

Each prompt takes a state of work and adds a tiny bit of value.
Although it’s cheaper to give each prompt multiple tasks, the result has lower quality.
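One way to picture it (a hypothetical sketch; the “state of work” is just a string here): a pipeline where each prompt performs one small, single-responsibility transformation on the running state.

```python
import openai

# Each step is a single-responsibility prompt that adds one small bit of value.
STEPS = [
    "Fix spelling and grammar only. Return the corrected text.",
    "Rewrite in a neutral, professional tone. Do not add information.",
    "Add a one-sentence summary at the top. Change nothing else.",
]

def run_pipeline(state: str) -> str:
    for instruction in STEPS:
        state = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": instruction},
                {"role": "user", "content": state},
            ],
            temperature=0,
        ).choices[0].message.content
    return state
```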


I’ve seen your post in an email from OpenAI and I wanted to let you know I’ve played around with a Skyrim mod where all NPCs would be augmented by GPT-3.5, and I’ve given up on it for a number of reasons. a) The models are all temporary for now; there’s no “fixed” model being offered and, as you’ve seen, each new version responds differently. b) It’s too censorious for games (not saying it’s wrong, but that’s just how it is; at some point the conversation runs into the filter). c) The AI is just awful at being “fun”; it’s been trained to be an assistant. The tech itself is fine for it, but someone needs to make an NPC GPT model, and then we’ll be able to use that in a more stable way. It’s coming; you’re early. My advice: save your code and move on.

I’ve experienced the same kind of issues. All of the newer 0613 models act way too weirdly formal and dumb, especially the function-calling one. It’s practically useless because it never listens to me: I tell it not to do a task that I have a function defined for, and it does it anyway just to tell me “I’m sorry, I’ll stop doing that,” or I’ll tell it to stop addressing me by name and it will say “I’m sorry, Slipstream (my username)…” The enum in a function also seems to be useless. They also don’t follow previous messages closely as an example for behavior; in fact, they seem to straight up ignore them most of the time, being completely unable to understand who someone is, how they’re different from someone else, or how to act in a consistent manner. IMO the newest models are actually useless.

You remember - I told you the same. I am frustrated as well. The LLM changes its behavior unpredictably. I can’t start production like that.


I’m also feeling discouraged. I’ve been following and pursuing this all along, but I can hardly even figure out how to connect to the endpoint. I genuinely want to try it just once, but even that is difficult!

The endpoint is hard to connect to. Please help - does anyone know how to successfully re-establish a connection to this URL?

This is great. It sounds like a possible solution. Probably could have used it here: Gpt-3.5-turbo-16k api not reading context documents - #2 by SomebodySysop

But, for those of us trying to develop knowledge apps for end-users, non-AI tech folks, how do we teach them these concepts as they try to transition from keyword to semantic search? And won’t they say, “Man, if I have to ask 20 questions before I can get an answer, I’m better off keyword searching!”?

for those of us trying to develop knowledge apps for end-users, non-AI tech folks, how do we teach them these concepts as they try to transition from keyword to semantic search?

There are tons of people building GPT powered knowledge bases, so I suspect it won’t be hard to find decent examples if you search around a bit. In general, you’ll want to insulate the end user from the AI prompt. They’re giving you information about what they want to find, it’s your app’s job to apply that to a functional AI prompt (don’t expect the end user to understand any of the nuance.)
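A toy illustration of that insulation (the function names here are hypothetical): the user only supplies a search query, and the app builds the actual prompt around whatever the retrieval layer returns.

```python
import openai

def answer_from_search(user_query: str, retrieve_documents) -> str:
    # retrieve_documents() stands in for whatever semantic-search layer the app
    # already has; the end user never sees or writes any of the prompt below.
    context = "\n\n".join(retrieve_documents(user_query, top_k=5))
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the user's question using only the provided context. "
                    "If the context is not sufficient, say so."
                ),
            },
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_query}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content
```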

We’ve developed multiple different uses for generative AI in our products. Each one is unique, so each takes a different approach depending on what we need to accomplish. In some cases we use multiple calls to get additional information from each call, allowing us to build up complex responses that wouldn’t otherwise be consistent/reliable. In other cases we make a call to get a response and pull out a piece of that response to put into a concrete data structure (ensuring the AI can’t mess up other data in that structure.) The list goes on, but the point is that this takes work - there is no one size fits all approach to leveraging LLMs for real applications.
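As a rough sketch of the “pull out a piece of the response” pattern (the data structure and field names are made up for illustration): ask for a constrained format, parse only the piece you need, and keep the rest of the structure under your own code’s control.

```python
import json
from dataclasses import dataclass

import openai

@dataclass
class ProductCard:
    sku: str         # owned by our database, never touched by the model
    name: str        # owned by our database
    blurb: str = ""  # the one field the model is allowed to fill

def fill_blurb(card: ProductCard) -> ProductCard:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": 'Reply with JSON only: {"blurb": "<one marketing sentence>"}'},
            {"role": "user", "content": f"Product: {card.name}"},
        ],
        temperature=0.5,
    )
    try:
        card.blurb = json.loads(response.choices[0].message.content)["blurb"]
    except (json.JSONDecodeError, KeyError):
        pass  # leave the structure intact if the model's output doesn't parse
    return card
```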

One of the simplest approaches is to present the user with voice input (Whisper API to text). When you ask a user for voice as input, or offer voice as an option, they usually ask in a more conversational way. But even if they stick to keywords, it should still be as good as any past method.
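A minimal sketch of that flow, assuming the openai Python package’s 0.x-style Audio API (the file path is just a placeholder):

```python
import openai

def voice_query_to_text(audio_path: str) -> str:
    # Transcribe the user's spoken query with Whisper, then feed the text
    # into the same prompt pipeline a typed query would go through.
    with open(audio_path, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    return transcript["text"]
```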

I don’t know. That may be part of the problem, because everybody is also mimicking the ChatGPT interface. I know it sounds crazy, but I’m thinking part of our job may be to teach people how to use this tool more effectively. I mean, there may be tons of people providing these apps, but there are also tons of frustrated users trying to use these apps with no tools other than being told “you need to provide better questions”.

I’m probably out of my depth here, but I think a huge part of the AI revolution will be the revolution of Mankind learning to effectively communicate, conversationally, with The Machine.

a huge part of the AI revolution will be the revolution of Mankind learning to effectively communicate, conversationally, with The Machine

Definitely not. People will need to learn how to do things (not just their jobs, but also basic tasks like composing a message, writing a note, getting directions, completing DIY tasks, etc) leveraging AI, but few people except app developers will become expert at interacting specifically with GPT.

I think you’re imagining that if someone gets good at talking to GPT that means they’ll be good at talking to any AI. That’s very much not the case. Every model is unique and has unique behaviors that need to be worked with and around. That’s precisely why we cannot expect end users to figure out how to work with GPT directly, this is only one model and every other model is different in meaningful ways. For that matter, being good at prompting GPT Turbo will probably not translate directly to prompting Davinci. Both are OpenAI produced models using similar engines, but they were trained very differently and do not respond the same way to equivalent prompts.

If you care about providing a good UX that doesn’t frustrate your users, then you need to abstract away any need to understand the underlying model. For what it’s worth, just because an app offers a chat interface for the AI, don’t assume that means they aren’t injecting a ton of logic into the interaction. This demo from Khan Academy has some great examples where they clearly (to anyone who understands what the model can and can’t do on its own) are doing a lot to guide the responses.

Logical. Makes sense. But, I disagree.

And I am putting my money where my mouth is. I have created an interface where the end user can decide how many documents to return, whether or not to use a standalone question, whether to submit the question or the question concept to retrieve vectors, and whether or not to use “hybrid” search techniques. And I’m providing the documentation to explain what the choices are, why they need them, and when to use them.
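For concreteness, the kind of options being exposed might look something like this (a hypothetical sketch; these field names are mine, not the actual interface):

```python
from dataclasses import dataclass

@dataclass
class SearchOptions:
    top_k: int = 5                        # how many documents to return
    use_standalone_question: bool = True  # rewrite the question before retrieval
    query_mode: str = "question"          # "question" or "concept" when fetching vectors
    hybrid_search: bool = False           # combine keyword and semantic search
```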

This also requires explaining, at least superficially, the chat completion process.

So, at least in this case, I’m going to let the market decide if training users is a good idea or not.

We shall see.