OpenAI Answering Technical Details

How does the ChatGPT UI actually work? Even when a conversation grows longer than the model's context length, it seems to handle it easily. How does it do that? If I want to mimic the same capability using the API, what strategy should I use?

Say I have a PDF of 500k tokens and need to create a summary of it. ChatGPT does this (I checked), but how does it do it?

The OpenAI Cookbook can help you get started in understanding some of these questions.

This is related to your PDF question.


I think you might be overestimating how many actual tokens of text are extracted from PDF documents.

Let’s examine the lifetime output of William Shakespeare, for example.


  • Tragedies - 289,628 words
  • Comedies - 283,011 words
  • Histories - 263,358 words
  • Poems - 30,909 words
  • Sonnets - 17,515 words

(Open Source Shakespeare, George Mason University)

AI tokenization might be 1.25x that word count.
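Treating those figures as a worked example (a rough sketch; 1.25 tokens per word is only a heuristic, and real tokenizers like tiktoken will differ):

```python
# Estimate the token count of Shakespeare's complete works from the word
# counts above, using the rough heuristic of ~1.25 tokens per word.
word_counts = {
    "Tragedies": 289_628,
    "Comedies": 283_011,
    "Histories": 263_358,
    "Poems": 30_909,
    "Sonnets": 17_515,
}
total_words = sum(word_counts.values())
est_tokens = int(total_words * 1.25)
print(total_words)  # 884421
print(est_tokens)   # 1105526
```

So even Shakespeare's entire lifetime output is on the order of 1.1M tokens - a 500k-token PDF is a genuinely large document.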


We can extrapolate ChatGPT's proprietary techniques from their parallels on the API:

  • Previews - in some iterations of ChatGPT, preliminary text from documents or files is injected into the AI's context; estimate ~8k tokens max
  • Vector stores - this is a file search that extracts and chunks just the text from a PDF. The AI can write search queries and get back the top-ranked chunks.
  • Code interpreter - files are uploaded here as well, and if pressed, the AI can write Python code to extract segments of documents, with a maximum character count per return.
  • Full single-document extraction, such as the "input_file" content type on the Responses API (which also supports vision and has page limits) - unlikely in ChatGPT
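As a rough illustration of the vector-store step above, ingestion splits extracted PDF text into overlapping chunks for later embedding and search. A minimal sketch (the sizes and overlap here are illustrative assumptions, not OpenAI's actual defaults):

```python
# Split extracted document text into overlapping character chunks,
# as a vector-store ingestion step typically does before embedding.
def chunk_text(text, chunk_chars=800, overlap=200):
    chunks, start = [], 0
    step = chunk_chars - overlap  # advance leaves `overlap` chars shared
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        start += step
    return chunks
```

A search then embeds the query and returns only the top-ranked chunks, so the model never sees the whole document at once.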

Thus, “how” would it summarize when it cannot actually receive the full text in context?

  • extrapolation: write a summary that looks like it belongs to the hypothetical document, based on the parts actually seen
  • pre-training: ask the AI about Shakespeare from your upload, and it can likely answer from its much larger ingestion of documents, including articles and analyses
  • web search: it can discover other abstracts or reviews of the document under discussion
  • post-training: summarization is a skill that takes extensive work to build into a model, as it is not a native quality of text prediction. The AI may simply follow the authentic-looking patterns reinforced by OpenAI's training.
  • fabrication: AI models predict text, and are good at convincing the reader of quality despite less-than-factual output

A true summary would thus require observing the complete text, and even then, self-attention economizes in ways that keep an AI model from operating on the whole context equally.
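If you want a genuine summary over the API rather than relying on those shortcuts, the usual strategy is map-reduce summarization: summarize each chunk, then summarize the summaries. A minimal sketch, where `summarize` is any hypothetical callable that makes one model call:

```python
# Map-reduce summarization over document chunks. `summarize` is injected
# so the recursion logic is independent of any particular model/API.
def map_reduce_summary(chunks, summarize, batch=10):
    """Summarize each chunk, then repeatedly summarize groups of
    summaries until a single summary remains."""
    summaries = [summarize(c) for c in chunks]       # map step
    while len(summaries) > 1:                        # reduce steps
        summaries = [
            summarize("\n".join(summaries[i:i + batch]))
            for i in range(0, len(summaries), batch)
        ]
    return summaries[0]
```

Every chunk is actually read by the model this way, at the cost of one call per chunk plus the reduce passes.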

True token count: on the API, try sending the PDF as an “input_file” content part of a user message (alongside the text and image content types). Check the actual input tokens you are billed for, and check the quality of the summary.
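A sketch of that experiment, shown as a plain request dict rather than a live SDK call (the model name and file id are placeholders; upload the PDF first to get a real file id):

```python
# Responses API request shape: a user message whose content mixes an
# uploaded PDF ("input_file") with an instruction ("input_text").
content = [
    {"type": "input_file", "file_id": "file-abc123"},     # placeholder id
    {"type": "input_text", "text": "Summarize this document."},
]
request = {
    "model": "gpt-4o",                                    # placeholder model
    "input": [{"role": "user", "content": content}],
}
# With the official SDK, this would be sent roughly as:
#   from openai import OpenAI
#   response = OpenAI().responses.create(**request)
#   print(response.usage.input_tokens)  # the actual billed input tokens
```

The `usage.input_tokens` figure on the response is the ground truth for how much of the document the model really received.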

How does ChatGPT keep running when conversations grow? By discarding input, and by summarization and injection of “memory”. OpenAI offers stored chat history in the Responses API, besides your own management, which can be more clever and budget-aware.
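A minimal sketch of the “discarding input” side for your own API usage, assuming ~4 characters per token as a crude estimate (a real implementation would use a proper tokenizer):

```python
# Client-side history management: keep the system prompt, then keep the
# newest turns that fit within a rough token budget, dropping the oldest.
def estimate_tokens(text):
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def trim_history(messages, budget=3000):
    """Return messages[0] (system) plus the newest turns within budget."""
    system, turns = messages[0], messages[1:]
    kept, used = [], estimate_tokens(system["content"])
    for msg in reversed(turns):            # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break                          # oldest turns get discarded
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```

A more clever variant replaces the dropped turns with a model-written summary message instead of discarding them outright, which mirrors the “summarization and injection of memory” behavior described above.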
