Links in system prompt for GPT-4?

Hi all,

I was wondering if it were possible to provide a link in the System prompt to tell GPT to reference a particular webpage. For example, would GPT understand a system prompt such as:

“You are a college evaluator working for the Higher Learning Commission. You will be provided documents to review against the HLC criteria located at this link:

Would GPT review the information on the linked page and use it to answer the user’s prompt?

I have asked several different sources and get different answers. Appreciate any clarification here.


Are you talking about chatgpt gpt-4 with bing, or plugins, or the gpt-4 API?

In general, gpt will not use external data for inference unless it’s been loaded into the context by some other system.

that doesn’t mean it’s not gonna work. if the data is relatively old (circa 2021) there’s a good chance that the model might still seem to respect it because it’s been part of its training data.

if the data fits into the context, you could try to just copy it into the system message, you might get better results.


Within ChatGPT itself, if using Browse with Bing it does (within the custom instructions anyway).

API call no, as that doesn’t access the internet, you’d need to send that content of it. It may know about it, however, it’d likely be out of date.

Thank you both. Yes, I was looking to use this as an API call to GPT-4.

For the example earlier, one option as you mentioned would be to copy and paste the text from the link into the prompt. However, this would only work with sort instructions because it will hit the token limit quickly. Let me continue to use the link as an example. The page has a series of criteria used for evaluation (I am just using their page as an example for now). If it is too long to copy and paste into the system prompt, is there another way to have GPT-4 “learn” this information?

For example, would it be possible to custom train a model to use the criteria to evaluate material provided in a prompt? From what I gather about GPT-4, the answer is yes. But how, specifically, would this training be done? Would it use the prompt and response format such as used in the fine-tuning ( or is there another way that I am missing?

(I ask because I hear many companies say that their GPT model was trained using their content library (Kahn Academy, for example) and I am not sure what they mean by “trained on their content library.” If it can’t use links as a source to reference content, then how would a model be trained using an existing content library?)

Thank you all for your help!

I think that’s where opinions diverge. I personally don’t think that fine tuning (‘training’) a model with additional information is a good idea for this task, although it might work to some degree if you have a good data set.

I’d rather suggest trying to reconsider your approach, and see whether the task could be split up into ‘smaller’ subtasks. It looks like you have a bunch of criteria that could be evaluated independently. How would a human go about doing this task?

edit: didn’t answer your question:

“trained on their content library.” If it can’t use links as a source to reference content, then how would a model be trained using an existing content library?)

well, they probably prepared their content in text form and fed it through the process.

There are several ways for an ML system to understand the patterns in your docset. From simplistic/inexpensive to sophisticated/expensive: prompt techniques->fine-tuning->new-foundation-model. Primarily fine-tuning and new-foundation-model are left to the hard-core data science folks and/or companies with big budgets (current model training efforts require a lot of horsepower).

Btw, the term “fine-tuning” specifically means changing the weights of the underlying model. The general press often (incorrectly) refers to “fine-tuning” as a generic term for making a model respond to your docset.

It’s no surprise that most people stick to prompt techniques. About a dozen or so prompt techniques right now have various levels of success. You can use these techniques in conjunction with a currently popular architectural pattern called Retrieval Augmented Generation (RAG). The goal of this pattern is to provide additional content in a prompt to persuade the underlying foundation model to generate text that is similar to that content. The actual prompt that is sent to the LLM has “context” in addition to the original user prompt that is appended to that “context”. Each model has a limit on the size of this “context”; obviously, a larger context allows you to get a better result from the LLM. The downside is that you send a big chunk of text with every user prompt, which can be quite expensive if you have many users.

If you are a programmer, you use the OpenAI APIs and some sort of database system that can store your data and, importantly, find how similar text chunks are to each other (very often, this is a vector database). If you are not a programmer, you use a service that uses this technique “under the hood”. Of course, commercial products use variants of this technique, eg, they may use multiple LLMs, multiple models, etc.

Note LLMs don’t really “understand” URLs, PDFs, etc. The text relationship information in those resources is extracted and stored in a vector database system. The original text itself is often saved as meta-data associated with the text relationship so you can see where the original info came from. In your case, the original URL would be saved as meta-data along with the relevant text on that page.

You have to investigate Khanmigo to see if they use RAG or model fine-tuning. Bloomberg has their own foundation model, which is trained on general info and financial-specific data.

Hope this helps

1 Like

Hello please,

i have a problem with my chat gpt account.
Actually i wonder how the gpt3.5 model and the gpt4 model are not giving me answers to my prompt when i’m connected on a personnal computer computer but when i try to use use my phone i don’t have any problem.

Can you people help me solve this problem?

You can accomplish this task using ChatGPT with the Webpilot plugin.

If you can’t use this plugin directly through the API, perhaps it might work with LangChain:

Yes, this helps a ton! Thank you.

They do indeed! Thank you for clarifying this term.

I am an educator and the problem is that I was thinking the AI would learn from a text like a person, but I see from your answer that this is not the case. Instead of learning from the text, it looks at the pattern of words in the text in order to generate something similar. Therefore, to train a model, I would need to provide examples of work completed based on the instructions, rather than giving it the instructions itself (such as those HLC criteria). Am I understanding this correctly?

Agreed. I have noticed that various prompt techniques make a huge difference in the quality of the response. I have heard of RAG and will look into it more. The “context” that you mention would go in the “system” prompt, correct? I have achieved much better results sending detailed context about the specific role and task in the “system” prompt.

A point of frustration with this is that the GPT model doesn’t remember a conversation in an API call like it does in the chat window. I was hoping that training might reduce the amount of content I had to include from the previous responses in each incremental prompt. Hopefully, at some point API calls will have a way to remember the chat history…

Thanks everyone!

I am an educator too.

Your assessment of URL/PDF “understanding” is accurate. The model looks at ‘chunks’ of words, derives a mathematical version of that chunk based on the tokens in that chunk. This math representation (an embedding vector) is used progressively to generate a similar text to your starting point, ie, a prompt. The model is “completing” a “prompt”. So the notion of “instructions” is an abstraction. It’s easier to explain to a non-tech person. :slight_smile:

Yes, imo, it’s important understand the various prompt techniques. There are some interesting new ones and undoubtedly there will be other in the future.

The overall idea for a RAG is to use a Vector DB or some datastore that can calculate embedding relationships and also store your original content “chunks” (eg, sentence, paragraph, markup). You find related content in your pre-stored embedding vectors (phase 1 - data engineering… an important phase). Take the top N closest ones… get the text associated with those top ones and prepend them to the user’s original ‘query’ (prompt). Send the whole thing to the model for a completion.

Since the API is stateless, you need to send the context of the conversation to the model to ensure it maintains continuity with the user. So you take the answer you just got back from the model (the “assistant” messages) and the big prompt you already sent and the user’s new request to the model. The size of this increasing prompt is your “context window”… each model has it’s own max size. Obviously a larger context window is better to maintain a decent conversation.

The System prompt are instructions on how you want the results. I don’t know for sure, but it seems the same as just adjusting the prompt preamble to influence the model. Maybe someone from OpenAI could respond to that. I’ve been sending context via increasing-in-size user and assistant prompts. Btw, Assistant prompts confused me for a while. Sometimes its useful to think of these are “From Assistant” messages when crafting a prompt.

When the prompt context (preamble) becomes really large, then it might be time to think about true fine-tuning. Sending very large chunks on every call could be quite expensive if you have a large number of users. Keep an eye on your account charges.


Thank you for this detailed response. This helps clarify a lot of what I have been wondering about.

Learning AI broadly speaking and GPT involves a lot of sliding backwards and revising my earlier assumptions. :grin:

Thanks for that final note about the account charges. I am on that prompt-engineering dance trying to find the equilibrium between being concise while also getting the model to produce a predictable format of the output.

1 Like