OpenAI Answering Technical Details

How does the ChatGPT UI actually work? Even when a conversation grows longer than the model's context length, it seems to handle it easily. How does it do that? If I want to mimic the same capability using the API, what strategy should I use?

Say I have a PDF of 500k tokens and need to create a summary of it. ChatGPT does this (I checked), but how does it do it?

The OpenAI Cookbook can help you get started in understanding some of these questions.

This is related to your PDF question.


I think you might be overestimating how many actual tokens of text are extracted from PDF documents.

Let’s examine the lifetime output of William Shakespeare, for example.


  • Tragedies - 289,628 words
  • Comedies - 283,011 words
  • Histories - 263,358 words
  • Poems - 30,909 words
  • Sonnets - 17,515 words

(Open Source Shakespeare, George Mason University)

AI tokenization might be 1.25x that word count.
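Treating those figures as a worked example (a rough sketch; 1.25 tokens per word is only a heuristic, and real tokenizers like tiktoken will differ):

```python
# Estimate the token count of Shakespeare's complete works from the word
# counts above, using the rough heuristic of ~1.25 tokens per word.
word_counts = {
    "Tragedies": 289_628,
    "Comedies": 283_011,
    "Histories": 263_358,
    "Poems": 30_909,
    "Sonnets": 17_515,
}
total_words = sum(word_counts.values())
est_tokens = int(total_words * 1.25)
print(total_words)  # 884421
print(est_tokens)   # 1105526
```

So even Shakespeare's entire lifetime output is on the order of 1.1M tokens - a 500k-token PDF is a genuinely large document.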


We can extrapolate ChatGPT's proprietary techniques from their parallels on the API:

  • Previews - in some iterations of ChatGPT, preliminary text from documents or files is injected into the AI's context; estimate ~8k tokens max
  • Vector stores - this is a file search that extracts and chunks just the text from a PDF. The AI can write search queries and get back the top-ranked chunks.
  • Code interpreter - files are uploaded here as well, and if pressed, the AI can write Python code to extract segments of documents, with a maximum character count per return.
  • Full single-document extraction, such as the "input_file" content type on the Responses API (which also supports vision and has page limits) - unlikely in ChatGPT
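As a rough illustration of the vector-store step above, ingestion splits extracted PDF text into overlapping chunks for later embedding and search. A minimal sketch (the sizes and overlap here are illustrative assumptions, not OpenAI's actual defaults):

```python
# Split extracted document text into overlapping character chunks,
# as a vector-store ingestion step typically does before embedding.
def chunk_text(text, chunk_chars=800, overlap=200):
    chunks, start = [], 0
    step = chunk_chars - overlap  # advance leaves `overlap` chars shared
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        start += step
    return chunks
```

A search then embeds the query and returns only the top-ranked chunks, so the model never sees the whole document at once.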

Thus, “how” would it summarize when it cannot actually receive the full text in context?

  • extrapolation: write a summary that looks like it belongs to the hypothetical document, based on the parts actually seen
  • pre-training: ask the AI about Shakespeare from your upload, and it can likely answer from its much larger ingestion of documents, including articles and analyses
  • web search: it can discover other abstracts or reviews of the document under discussion
  • post-training: summarization is a skill that takes extensive work to build into a model, as it is not a native quality of text prediction. The AI may simply follow the authentic-looking patterns reinforced by OpenAI's training.
  • fabrication: AI models predict text, and are good at convincing the reader of quality despite less-than-factual output

A true summary would thus require observing the complete text, and even then, self-attention economizes in ways that keep an AI model from operating on the whole context equally.
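If you want a genuine summary over the API rather than relying on those shortcuts, the usual strategy is map-reduce summarization: summarize each chunk, then summarize the summaries. A minimal sketch, where `summarize` is any hypothetical callable that makes one model call:

```python
# Map-reduce summarization over document chunks. `summarize` is injected
# so the recursion logic is independent of any particular model/API.
def map_reduce_summary(chunks, summarize, batch=10):
    """Summarize each chunk, then repeatedly summarize groups of
    summaries until a single summary remains."""
    summaries = [summarize(c) for c in chunks]       # map step
    while len(summaries) > 1:                        # reduce steps
        summaries = [
            summarize("\n".join(summaries[i:i + batch]))
            for i in range(0, len(summaries), batch)
        ]
    return summaries[0]
```

Every chunk is actually read by the model this way, at the cost of one call per chunk plus the reduce passes.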

True token count: on the API, try sending the PDF as an “input_file” content part of a user message (alongside the text and image content types). Check the actual input tokens you are billed for, and check the quality of the summary.
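A sketch of that experiment, shown as a plain request dict rather than a live SDK call (the model name and file id are placeholders; upload the PDF first to get a real file id):

```python
# Responses API request shape: a user message whose content mixes an
# uploaded PDF ("input_file") with an instruction ("input_text").
content = [
    {"type": "input_file", "file_id": "file-abc123"},     # placeholder id
    {"type": "input_text", "text": "Summarize this document."},
]
request = {
    "model": "gpt-4o",                                    # placeholder model
    "input": [{"role": "user", "content": content}],
}
# With the official SDK, this would be sent roughly as:
#   from openai import OpenAI
#   response = OpenAI().responses.create(**request)
#   print(response.usage.input_tokens)  # the actual billed input tokens
```

The `usage.input_tokens` figure on the response is the ground truth for how much of the document the model really received.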

How does ChatGPT keep running when conversations grow? By discarding input, and by summarization and injection of “memory”. OpenAI offers stored chat history in the Responses API, besides your own management, which can be more clever and budget-aware.
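A minimal sketch of the “discarding input” side for your own API usage, assuming ~4 characters per token as a crude estimate (a real implementation would use a proper tokenizer):

```python
# Client-side history management: keep the system prompt, then keep the
# newest turns that fit within a rough token budget, dropping the oldest.
def estimate_tokens(text):
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def trim_history(messages, budget=3000):
    """Return messages[0] (system) plus the newest turns within budget."""
    system, turns = messages[0], messages[1:]
    kept, used = [], estimate_tokens(system["content"])
    for msg in reversed(turns):            # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break                          # oldest turns get discarded
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```

A more clever variant replaces the dropped turns with a model-written summary message instead of discarding them outright, which mirrors the “summarization and injection of memory” behavior described above.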
