I loved this and can’t wait to play with all of it!
The dev day was just perfect.
Ran into this as well - nice callout.
Trying to run the assistant code but getting an attribute error saying that there is no beta attribute
Am I blind? I can see that messages are automatically managed, but I don’t see any information on the limit of active threads we’re allowed. I can see that we can delete them. Nothing more than that.
Answering my own question here with speculation. I am seeing a trend of “we will delete stuff for you, don’t worry”. I’m guessing this applies for threads as well.
Someone is wearing their fancy extra-pointy shoes today!
Please tell me you’re going to take this opportunity to angle for a more formal position with OpenAI!
Does anybody know what log probs are?
Yes, someone does know what log probs are. Good enough to ask a bot to save some time typing anyway:
Sure, let’s break it down.
1. What are Logprobs?
Logprobs, or logarithmic probabilities, are a way of representing probabilities in a more manageable form. Probabilities can range from 0 to 1, and when you’re dealing with very small probabilities, it can be easier to work with their logarithms instead.
2. Relation to Normal 0-1 Probability
To understand how logprobs relate to normal probabilities, you need to know that the logarithm is the inverse operation to exponentiation. Let’s say you have a probability p (which is a number between 0 and 1). The logprob is simply the logarithm (base e) of p.
To convert a logprob back into a regular probability, you would use the exponential function. If your logprob is a number x, then the corresponding probability is e^x.
3. Applications for Getting Logprob Values
Getting the logprob values along with an AI language model API response can be useful in a number of ways:
- Understanding Model Confidence: The logprobs can give you a sense of how confident the model is in its predictions. A higher logprob for a particular token means the model considers it more likely.
- Debugging and Analysis: Logprobs can be helpful for debugging and analyzing the model’s behavior. By examining the logprobs, you can gain insights into why the model made certain predictions.
- Custom Sampling Strategies: You can use the logprobs to implement your own sampling strategies. For example, you might want to bias the model towards certain types of predictions, and you can use the logprobs to do this.
Remember, while logprobs can be very useful, they also require a good understanding of probabilities and logarithms to interpret correctly.
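If it helps to make the conversion concrete, here is a minimal sketch in Python (assuming natural log, which is what the API uses; the 0.9917 value is just an example):

```python
import math

p = 0.9917                  # an ordinary probability between 0 and 1
logprob = math.log(p)       # natural log -> about -0.0083
p_back = math.exp(logprob)  # exponentiate to recover the probability -> 0.9917

print(logprob, p_back)
```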
Free for a little over a week. Let them rip boys!
$.20 / GB / Assistant / Day is pretty dope as well. Damn. I’m excited to stress test this bad boy with some data sheets.
I’m going to become a fantasy football god with this lol.
DALL-E 3 API is working now: A cute baby sea otter
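If anyone else wants to try it, this is roughly the call I’d expect to work with the v1 Python client (the model name and parameters are my assumptions, so double-check against the docs):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

image = client.images.generate(
    model="dall-e-3",
    prompt="A cute baby sea otter",
    size="1024x1024",
    n=1,
)
print(image.data[0].url)  # a long, signed, temporary URL to the generated image
```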
Great photo but omg that url looks like a virus link
Also kind of crazy it has the organization and user ID in there. Doesn’t really matter, but dang.
It looks like it was trained on 512 x 512 photos. Pretty cool. I imagine for most labelling cases this would be perfectly suitable. And it’s only $0.85 / 1,000 photos!
Google Cloud Vision is $1.50!!!
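For reference, a low-detail vision call looks roughly like this with the v1 Python client (the preview model name and message format are assumptions on my part, and the image URL is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # preview model name at launch
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Label the main object in this photo."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/photo.jpg",  # placeholder image
                    "detail": "low",  # low detail = 512 x 512, flat 85-token image charge
                },
            },
        ],
    }],
    max_tokens=50,
)
print(response.choices[0].message.content)
```

At $0.01 per 1K input tokens, the flat 85-token charge for a low-detail image works out to about $0.85 per 1,000 photos, which seems to be where that number comes from.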
openai.NotFoundError: Error code: 404 - {'error': {'message': 'Invalid URL (POST /v1/assistants)', 'type': 'invalid_request_error', 'param': None, 'code': None}}
Would this indicate it hasn’t rolled out to me yet?
Very excited about the new products, especially the assistants! I wonder, is it possible to play with the demos?
You need to be using `openai==1.1.0` (or later) for the beta namespace.
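After upgrading, something along these lines should get past both the missing `beta` attribute and the 404 on `/v1/assistants` (a sketch only; the assistant name, instructions, and model are illustrative):

```python
# pip install --upgrade "openai>=1.1.0"
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# the beta namespace only exists in the v1.1.0+ client
assistant = client.beta.assistants.create(
    name="Data sheet helper",
    instructions="Answer questions about the attached data sheets.",
    model="gpt-4-1106-preview",
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What does column B represent?",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
print(run.id, run.status)
```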
They are a representation of how likely a token was to be selected in that position…
So, if I prompt the model with:
Knock knock…
It will respond (most likely) with,
Who’s there?
When you get `logprob`s, in addition to the response, you’ll get a list of token lists with the log-likelihood of up to—I believe—the 5 or 6 most likely tokens for each position.
The message “Knock, knock…” is (weirdly) five tokens,
| Text | Token |
|---|---|
| Kn | 37353 |
| ock | 1197 |
| , | 11 |
| knock | 14459 |
| ... | 1131 |
The response “Who’s there?” is four tokens,
| Text | Token |
|---|---|
| Who | 15546 |
| 's | 596 |
| there | 1070 |
| ? | 30 |
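If you want to reproduce the tokenization yourself, `tiktoken` will do it; a quick sketch (the exact IDs depend on which tokenizer the model uses, so your numbers may differ from the tables above):

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")

for text in ["Knock, knock...", "Who's there?"]:
    ids = enc.encode(text)
    print(text, "->", ids, "->", [enc.decode([i]) for i in ids])
```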
I figured it would just be easier to show you with a real demonstration, so I did this in the Playground with `text-davinci-003`:
The most likely token to follow the sequence `37353, 1197, 11, 14459, 1131` is `Who`: `15546`
- Who = 99.17% (15546)
- who = 0.38% (14965)
- [ = 0.14% (9837)
- Who = 0.07% (?)
- Wh = 0.07% (1671)
Note: I’m not sure why `Who` is there twice, maybe @_j can explain (they have done much more work with the `log_probs` feature than I have). My suspicion is there is some non-display character in the second `Who`. I’m not on my dev machine right now and don’t feel like going through the rigamarole it would entail to do an actual API call, sorry.
So, the model sampled randomly from this weighted distribution and (as it would be expected to do 99.17% of the time) selected token `15546` (`Who`).
At this point we forget everything we did before and resolve the next token in the sequence,
| Kn | ock | , | knock | ... | Who |
|---|---|---|---|---|---|
| 37353 | 1197 | 11 | 14459 | 1131 | 15546 |
And see that the most likely tokens are,
- 's = 90.32%
- bytes:\xe2\x80 = 9.63%
- is = 0.04%
- ' = 0.00%
- there = 0.00%
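To make “sampled randomly from this weighted distribution” concrete, here’s a toy sketch using the probabilities above (purely illustrative of the idea, not of how the backend actually does it):

```python
import random

# next-token candidates after "Who" and their probabilities from above
candidates = ["'s", r"bytes:\xe2\x80", " is", "'", " there"]
weights    = [0.9032, 0.0963, 0.0004, 0.00005, 0.00005]

# at temperature 1 / top-p 1 the draw is made straight from this distribution
picks = random.choices(candidates, weights=weights, k=10)
print(picks)  # almost always "'s"
```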
We then continue on.

The actual `logprob` values are the logs of the probabilities, so for `Who` at 99.17% that’s ln(0.9917) ≈ -0.0083.
Then, we can see what the log-likelihood is of an entire response or part of a response. We use log-probabilities because[1] we can add them instead of multiplying them to get the total probability[2].
In this case, the first four tokens of the response have a total `logprob` of -0.11, which is quite high[3]. We can reverse the process (e^(-0.11) ≈ 0.90) and see that we would expect the first four tokens of the response to be “Who’s there?” about 90% of the time.
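Here is that arithmetic as a quick sketch; the first two logprobs follow from the percentages above, and the last two are illustrative near-zero values I made up to round out the four tokens:

```python
import math

# per-token logprobs for "Who", "'s", " there", "?"
token_logprobs = [
    math.log(0.9917),  # "Who"    -> about -0.0083
    math.log(0.9032),  # "'s"     -> about -0.1018
    -0.0005,           # " there" (illustrative)
    -0.0003,           # "?"      (illustrative)
]

total = sum(token_logprobs)  # adding logprobs == multiplying probabilities
print(total)                 # about -0.11
print(math.exp(total))       # about 0.895, i.e. "Who's there?" roughly 90% of the time
```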
Now, the big question is, why would you care?
Two reasons:
- You can identify where a response is going wrong. If you are trying to get the model to respond in a very particular way, it can be helpful to analyze the likelihood patterns around where it goes wrong. This might entail looking at the `logprob` on an errant token, adjusting your prompt or system message, resubmitting, and looking at the `logprob` of the errant token again. If it goes down, you know you’re on the right track. Also, by examining the other most likely tokens, you can gain insights into what the model is “thinking”, which can inform how you choose to try to steer the model.
- The total `logprob` of the response can be used as a proxy for the overall quality of the response (as long as it is normalized for token length). So, if you generate 100 responses, the response with the largest `logprob` is often considered to have a high probability of being a high-quality response (though this is not always the case[4]). A rough sketch of this idea follows below.
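A rough sketch of that ranking idea against the legacy completions endpoint (which still returns logprobs); the model name, prompt, and parameters here are placeholders, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # any completions model that returns logprobs
    prompt="Knock knock...",
    n=5,                             # several candidate responses
    max_tokens=16,
    logprobs=1,
    temperature=1,
)

def normalized_logprob(choice):
    # average per-token logprob, so longer responses aren't unfairly penalized
    lps = choice.logprobs.token_logprobs
    return sum(lps) / len(lps)

best = max(response.choices, key=normalized_logprob)
print(repr(best.text), normalized_logprob(best))
```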
That is my hasty (and likely grossly incomplete and over-simplified) description of what `logprob`s are and why they are important from a development standpoint. I am sure in the far-too-long I spent drafting this, @_j (and possibly others) have already chimed in with much better explanations, but I took all this damn time writing this, I’m posting it!
mathematically ↩︎
it’s orders of magnitude faster for computers to add than to multiply ↩︎
0 represents a 100% certainty, -∞ represents a total impossibility, and (for reference) ≈ -0.69 is about a 50:50 chance of happening. ↩︎
For instance, the training data might, in a very narrow field, contain a large proportion of wrong information. This could cause the model to generate a poor quality response with very high probability.
Case in point: I was recently having the model, through ChatGPT, assist me with a coding project using the Python library PyPDF2, where the developers recently(-ish) made a very large, code-breaking change to their entire codebase (they changed their naming convention from `camelCase` to `snake_case`). It was so mind-bogglingly frustrating I had to fill both fields of the custom instructions with a comprehensive “PyPDF2 v1 to v2 Migration Summary” just to be able to get any meaningful work done.
The model was, with very-high-probability, generating code which would never work in my development environment. ↩︎
Interesting!
Need some deets about session timeouts, etc.
Also, it would be AMAZING if sessions could be saved and restored, then billed against storage costs while not in use.
I don’t think so. As long as the models don’t output “tracing copies” of inputs, then there’s no copyright question involved.
Artists have gone into museums, read art books, or copy-pasted magazines into reference books forever, and gotten inspiration from those precursors. And then the artists make something new, informed by what came before.
What comes out of AI models is no different. Generally, what artists are complaining about if you look deeply, is “it now requires a lot less effort to get some particular look,” which, well, remember when photography wasn’t an art form because “you just pushed the button?”
Same thing for the code models trained on openly available code.
That being said – the models are terrible at taking stage direction. I’ve tried in vain for a year to make something even close to the mascot of my company, and the models just won’t do it. Knowing what you want is the hardest part, and knowing how to get it is still a challenge, so there will still be art jobs, but the “put lines on paper” part of the job may indeed be de-emphasized.
So, in my estimation, no, there’s really nothing new or risky or bomb-like here. But it makes for a good headline in the media climate.
I literally screamed like a football fan at a goal when GPT-4 Turbo 128k was announced, wahhH!
Also illustrative is to demonstrate when the AI doesn’t have a good idea what comes next:
(I cheat a bit by using `babbage-002`, which has perplexity so high it is without application. I force a number by providing a trailing space to block words.)
When temperature and top-p are at 1, they do not modify the choices and their values, so the chance of any given generation being one of those tokens is just as you see: 10.3% of a huge number of responses will have “1”.
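For anyone who wants to poke at the same thing, a sketch of pulling the top candidates out of `babbage-002` over the API (the prompt is illustrative; your percentages will differ):

```python
import math

from openai import OpenAI

client = OpenAI()

response = client.completions.create(
    model="babbage-002",
    prompt="My favorite number between 1 and 10 is ",  # trailing space to block word tokens
    max_tokens=1,
    logprobs=5,   # return the 5 most likely tokens for each position
    temperature=1,
)

# top_logprobs: one dict per generated position, mapping token -> logprob
for token, logprob in response.choices[0].logprobs.top_logprobs[0].items():
    print(f"{token!r}: {100 * math.exp(logprob):.1f}%")
```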
ChatGPT 4.0 is obviously now the latest version:
It doesn’t know about today’s announcement of custom GPTs (past the training date or just because not fully rolled out yet?)
Accessing custom versions of GPT (Generative Pretrained Transformer) models can be done typically through platforms that allow users to train and deploy machine learning models. Here’s how you might go about it:
- Using OpenAI’s Platform:
  - OpenAI provides an API for GPT-3 and has announced future access to GPT-4, which allows developers to integrate the model into their applications. OpenAI also offers fine-tuning capabilities that can be used to train GPT models on custom datasets to better suit specific tasks or domains.
- Through Other Cloud Providers:
  - Some cloud providers offer services that let you train machine learning models, including custom versions of GPT. They may provide pre-trained models as a base for further training.
- Building Your Own:
  - If you have the expertise, you can create a custom GPT model by training it on a dataset of your choice. This requires significant computational resources and machine learning knowledge.
- Collaborating with Research Institutions or Companies:
  - Some research projects and companies develop their custom versions of GPT for specific purposes and may offer access to their models for research or commercial use.
- Using Open-Source Implementations:
  - There are open-source versions of transformer models that can be customized and trained with your data. These are available on platforms like GitHub.