USER
https://github.com/tatsu-lab/alpaca_eval?tab=readme-ov-file#evaluating-a-model what is this webpage about
ASSISTANT
The webpage you have shared is for the GitHub repository of a project called "alpaca_eval." This repository is managed by Tatsu Labs.
Based on the naming and conventions typical in GitHub repositories, it is likely related to evaluating machine learning models, specifically those related to natural language processing (NLP). The "alpaca" in the name might be a code name or a playful reference to the type of models or the methodology they used.
Key Highlights:
1. **GitHub Repository**: This is a code repository that contains scripts, source code, and possibly datasets for evaluating NLP models.
2. **Documentation Tab**: The "README" tab typically provides an overview of the project, including its purpose, how to install and use the software, and any prerequisites needed.
3. **Evaluating Models**: The focus seems to be on the evaluation aspect, which could involve metrics to assess the performance of various NLP models.
To get exact details of what the repository includes, you should visit the provided link and check the README section. The README will have comprehensive information about the scope of the project, how to install it, and steps for evaluating models.
this is the same for gpt4 as well. I thought they said it can do web browsing?
didn’t catch that, but I missed the first couple of minutes. Would be uncharacteristic of them though. If anything it would be an assistant thing, would be my guess.
update: just tested on chatGPT now it can do browsing (for both gpt4o and gpt4). I strongly suspect they changed the chat interface backend a few minutes ago (could be either prompt or the actual model) but all is well now.
For me it works on the web client side, but not the API, where it just hallucinates. Is there any info on whether visiting websites and reading their content will be available in the API as well?
Maybe it only works in the web client because it uses the client browser to do it? Any info about this would be amazing, thank you.
We created an assistant to perform currency conversion, and it is working with the gpt-4o model. It’s getting data from the internet. Here is its instruction set:
"""
You are a currency assistant that performs the below tasks:
1. The user will say convert [amount] [currency A] to [currency B].
2. Search online for the current exchange rate between the specified currencies using reliable financial websites such as XE, X-Rates, OANDA, or other currency converter sites.
3. Calculate the converted amount using the latest exchange rate information.
4. Return the converted amount in JSON format: {"Converted": amount}.
5. Mention the source and the date/time of the rate check in your response.
6. The response should be in JSON format: {"Converted": "number"}.
"""
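Via the API, step 2 of that instruction set ("search online") only works if you also supply a tool the model can call; on its own the model cannot fetch anything. A minimal sketch of what such a tool specification could look like in the chat-completions `tools` format — the name `get_exchange_rate` and its parameters are my assumption, not something defined by the original assistant:

```typescript
// Hypothetical tool specification. The function name, parameter names,
// and descriptions are illustrative assumptions, not from the post above.
const currencyTool = {
  type: "function",
  function: {
    name: "get_exchange_rate",
    description:
      "Look up the current exchange rate between two ISO currency codes. " +
      "Fulfilled by your own backend, e.g. a paid rate-provider API.",
    parameters: {
      type: "object",
      properties: {
        input_currency: { type: "string", description: "e.g. USD" },
        output_currency: { type: "string", description: "e.g. EUR" },
      },
      required: ["input_currency", "output_currency"],
    },
  },
};
```

You would pass this object in the `tools` array of the request; without it, "search online" in the instructions is just a phrase the model will role-play.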
Oh, but do you mean a GPT assistant that you create in the browser on your computer? I am talking about an API request.
I just sent this to gpt-4o:

```typescript
const prompts = [
  {
    role: ChatRole.User,
    content:
      'Can you search the web for "instructions to build a snowman" and give me the 5 top links?',
  },
];
```

and it returned:

```
content: "I'm unable to browse the web in real-time or provide current links. However, I can certainly help you with instructions on how to build a snowman! Here’s a simple guide:\n" +
```
Its instructions had better include tools that you service with functions, making your own internet requests to a currency conversion service API. Otherwise you get complete fabrications from the model.
API has no internet access that you aren’t providing with your own tools.
The OpenAI API provides direct access to language models.
An AI language model accepts an input (such as messages telling it how to operate, and messages telling it what to produce), and then it generates a predictive language output.
Nothing about that is “the model connects to the internet”. AI models cannot escape from the fact that they simply generate language tokens.
However, recent OpenAI models can also receive a tool specification from you, such as currency_converter("input_currency": string, "output_currency": string) (in metacode). Offering that tool specification to the AI means it may, instead of replying to the user, emit an output that invokes the function to obtain more information.
You must then receive that function in your code, and fulfill the AI’s needs that you promised, such as actually looking up information on an API you pay for, or simply looking up some knowledge in a database. Then you return the new information and the AI can answer the user.
Or offer the AI your own “search_the_internet(search_terms)” function where you perform the searching.
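That round trip can be sketched end to end. Everything below is simulated locally: the tool name, the hardcoded rate table, and the 100-unit amount are illustrative assumptions, standing in for a real model response and a real rate-provider API.

```typescript
// Simulated tool-call round trip: no API key and no live rates involved.
type ToolCall = {
  name: string;
  args: { input_currency: string; output_currency: string };
};

// Step 1: pretend the model answered with a tool call instead of text.
const modelToolCall: ToolCall = {
  name: "get_exchange_rate",
  args: { input_currency: "USD", output_currency: "EUR" },
};

// Step 2: your code fulfills the call, e.g. from a paid rate API or a cache.
const fakeRates: Record<string, number> = { "USD/EUR": 0.92 };

function getExchangeRate(input: string, output: string): number {
  const rate = fakeRates[`${input}/${output}`];
  if (rate === undefined) throw new Error(`no rate for ${input}/${output}`);
  return rate;
}

// Step 3: you return the result as a tool message, and the model produces
// the final JSON answer -- emulated here directly for 100 USD.
const rate = getExchangeRate(
  modelToolCall.args.input_currency,
  modelToolCall.args.output_currency,
);
const finalAnswer = JSON.stringify({ Converted: (100 * rate).toFixed(2) });
console.log(finalAnswer); // {"Converted":"92.00"}
```

The key point is step 2: the lookup happens in your code, not inside the model.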
Telling the AI it can do something it cannot will simply produce plausible language not based in fact.
latest update: it seems OpenAI really doesn’t want web browsing capability to function normally… When I tested again earlier today on ChatGPT with gpt-4o, it again refused to perform web browsing, lying to my face that it was not able to do so. Come on OpenAI, what are y’all doing at this point…
While it was not needed above, if you have the memory feature in ChatGPT, tell the AI to persist your preference that ChatGPT first use browser search to find relevant results for any knowledge-based user question.
We’re using GPT-4o via the API with LangChain and are getting back responses where GPT claims to search the web. It’s been rather frustrating, so we went back to GPT-4 Turbo, which reduced that response type.
We’ve been unable to determine if this is a hallucination or if the model is triggering search in OpenAI’s environment. Has anyone run into this where the model returned a partial answer and then something like: [Searching the web...]?
(it did take me two tries to evoke what I expected out of the AI)
This is an advantage for the API developer, as you don’t get hit with denials when you provide your own new tools that extend its abilities.
However, it can start writing, assuming it can do things it can’t, especially features of ChatGPT.
You can hit all the problem areas with some system prompt or instructions:
```
computer vision: enabled
web browsing: disabled
direct internet access: disabled
xyz company: untrained
xyz company knowledge: company_info tool use required
```