Introduction
This essay is intended to share ideas and practical knowledge, not to assert priority over them.
Most of the component ideas presented here have precedents in prior work. The distinctive contribution of this essay may lie not in chunking, hierarchical reduction, or context-aware merging taken in isolation. Rather, it may lie in recasting each non-leaf node in a document-native semantic hierarchy as a subglobal adjudicator: one that preserves evidence, adjudicates relations of support, contradiction, qualification, and exception within its semantic jurisdiction, and escalates those adjudicative outcomes through the hierarchy to the root for rigorous full-scan document analysis.
Summary
RAG is vulnerable to retrieval miss. For rigorous document analysis, would full-scan plus hierarchical approval be more appropriate?
A plain single API call cannot reproduce the same tool-integrated behavior as the hosted ChatGPT service on the Internet. In my implementation experience, under that condition, when a model is asked to evaluate business documents with deep reasoning, the response time can reach on the order of 100 seconds. With MapReduce, wall-clock latency can be shortened through parallel processing.
Should OpenAI introduce a `file_analysis` API based on this MapReduce mechanism, rather than relying on `code_interpreter` and RAG-based `file_search`?
When global context is provided, this technique can be applied not only to rigorous file analysis but also to large-scale or complex file creation. In this sense, LLM MapReduce is better understood as MapReduce-like rather than as classical MapReduce in the strict sense, because the distributed workers are guided by shared global context rather than operating as fully independent classical mappers and reducers.
A Practical Constraint in Azure Foundry
- In principle, file analysis should rely on `code_interpreter` rather than retrieval. Yet for rigorous file analysis, even `code_interpreter` may still be too slow and not fully sufficient in answer quality.
- It seems unlikely that `code_interpreter` can provide the kind of document-level multimodal understanding offered by Microsoft Document Intelligence. In other words, it is probably insufficient for rigorous file analysis.
- In my case, I had to use GPT-5, GPT-5.1, or GPT-5.2 in Azure Foundry, because GPT-4o did not provide enough output tokens or enough answer quality. However, at least in my Azure Foundry environment, I could not enable `code_interpreter` with GPT-5-class models.
- Therefore, I was effectively forced to rely on `file_search`, namely RAG-based file analysis. This made one problem unavoidable: retrieval miss. Since retrieval-first analysis cannot guarantee full coverage of a document, it remains structurally inadequate for rigorous document analysis.
- If retrieval were fully controllable, RAG might still be improved through chunking, metadata design, reranking, and fallback strategies. However, when document analysis depends on a third-party retrieval layer such as `file_search`, users cannot sufficiently control or verify the retrieval process. In that setting, retrieval miss becomes not merely an implementation issue but a structural limitation for rigorous document analysis.
- Although my implementation experience was mainly with the Assistants API, the Responses API allows the use of `file_inputs` and `code_interpreter`. With `file_inputs`, the model can at least access extracted text and, in some cases, extracted images such as PDF page images. However, what happens beyond that is largely opaque to the user. As a result, it is difficult to know, control, or verify how much of the file's structure or other non-textual information is actually being interpreted. Therefore, my concern remains that these tools may still be insufficient for rigorous file analysis in two respects: answer quality and latency.
MapReduce of Full-Scan Rather Than RAG of Search and Retrieval
- RAG is vulnerable to retrieval miss. For rigorous document analysis, full-scan plus hierarchical approval by orchestrators should be chosen. In this architecture, the computational units are not mere processes but AI, that is, agents.
- In addition, by using MapReduce, we may reduce the response time required for the deep reasoning demanded by rigorous document analysis. The total amount of computation and the total compute cost may remain unchanged or increase, but wall-clock latency may be drastically reduced. For example, if a document splits into 20 chunks and each deep-reasoning call takes on the order of 100 seconds, a sequential pass would take on the order of 2,000 seconds, whereas parallel mapping followed by a single reduce step might finish in roughly 200 seconds.
MapReduce Targeting AI Agents, Not Computer Processes
- Classical MapReduce distributes computation, whereas LLM MapReduce distributes semantic interpretation by AI agents.
- In ordinary MapReduce, multiple processes handle chunks of big data in parallel in order to reduce wall-clock latency. However, in this LLM MapReduce, the meaning of map is to transform natural language embedded in structured data, that is, to map from natural language space to natural language space. In other words, the computational units are not mere processes but LLMs, namely agents.
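As a type-level illustration (the names are mine, not from any SDK), the contrast can be stated as a signature:

```python
# Classical MapReduce: map :: (k1, v1) -> [(k2, v2)], a procedure over raw records.
# LLM MapReduce:       map :: chunk_i -> result_i, semantic interpretation by an agent.
from typing import Callable

SemanticMap = Callable[[str], str]  # natural language in, natural language out
```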
Approval Under Global Context by the Root Orchestrator
- If we let each LLM map chunks independently, the mapped values of the chunks will have no meaningful relation to one another. In that case, when those mapped values are aggregated, the result will not be meaningful. Therefore, we must first generate a global context, for example by having an LLM quickly summarize the document.
- A root orchestrator of LLMs distributes that global context to the slave LLMs at each level, and those LLMs generate mapped values based on that global context.
- The adjudicator at each node generates subglobal contexts for the semantics within its jurisdiction, and then distributes those contexts to the workers under its control.
- The root orchestrator then decides whether to approve or reject the final aggregated candidate result. If it rejects the result, the slave LLMs must process the chunks again.
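A minimal sketch of this approve-or-reject loop, assuming a hypothetical `llm(prompt) -> str` callable that wraps a single model call; the prompts and the APPROVE/REJECT protocol are illustrative, not a prescribed API:

```python
# A minimal sketch of approval under global context.
def orchestrate(chunks: list[str], llm, max_rounds: int = 3) -> str:
    # Step 1: quickly derive a global context, e.g. a brief summary of the document.
    global_context = llm("Briefly summarize this document:\n" + "\n".join(chunks)[:8000])
    candidate = ""
    for _ in range(max_rounds):
        # Step 2: every worker maps its chunk under the shared global context.
        results = [
            llm(f"Global context:\n{global_context}\n\nEvaluate this chunk:\n{c}")
            for c in chunks  # in production, these calls would run in parallel
        ]
        # Step 3: aggregate the mapped values into a candidate result.
        candidate = llm("Aggregate these evaluations into one result:\n" + "\n---\n".join(results))
        # Step 4: the root orchestrator approves or rejects the candidate.
        verdict = llm(
            f"Global context:\n{global_context}\n\nCandidate result:\n{candidate}\n\n"
            "Reply APPROVE if the candidate is faithful to the global context, else REJECT."
        )
        if "APPROVE" in verdict.upper():
            break  # approved
        # rejected: the workers must process the chunks again
    return candidate
```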
Contradiction Detection and Resolution Across Chunks and Agents
- In rigorous document analysis, contradictions must be detected not only within a single chunk but also across multiple chunks and across the mapped values produced by multiple agents.
- This is because a claim may appear correct within one chunk, but it may be modified or negated by limiting conditions, exceptions, objections, or the interpretation of another agent found in other chunks.
- Therefore, an orchestrator is not merely an aggregator. It must examine relations such as support, contradiction, qualification, and exception among the mapped values `result_i` and mediate them.
- In this case, contradiction resolution must not be a simple majority vote. It must be a judgment based on global context, provenance, evidence strength, scope, and consistency.
- Therefore, in a full-scan architecture, contradiction detection and resolution are not auxiliary functions but central functions for establishing the validity of the final result.
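One way to make this concrete is sketched below, again assuming a hypothetical `llm` callable: the relation names come from the list above, while the prompt wording is an assumption.

```python
# A sketch of relation-aware adjudication among mapped values.
from enum import Enum

class Relation(Enum):
    SUPPORT = "support"
    CONTRADICTION = "contradiction"
    QUALIFICATION = "qualification"
    EXCEPTION = "exception"

def adjudicate(results: list[str], global_context: str, llm) -> str:
    """Resolve relations among the mapped values result_i by judgment, not majority vote."""
    relations = ", ".join(r.value for r in Relation)
    prompt = (
        f"Global context:\n{global_context}\n\n"
        f"Identify pairs of the results below related by {relations}. "
        "Resolve contradictions using provenance, evidence strength, scope, "
        "and consistency with the global context, never by frequency alone.\n\n"
        + "\n---\n".join(results)
    )
    return llm(prompt)
```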
Distributed Cognition Rather Than Distributed Computing
- Classical MapReduce is a form of distributed computing, because each node merely executes a predefined procedure over local data.
- By contrast, LLM MapReduce is a form of distributed cognition, because each node is not a mere process but an interpretive agent that reads, evaluates, and transforms local content into semantic judgments.
- In this architecture, map is not merely a computational procedure but local semantic interpretation, reduce is not merely compression but the integration of judgments, and approval by the root orchestrator is global adjudication.
- Therefore, in this architecture, intelligence does not exist only inside a single massive model, but is constituted in the hierarchical interactions among multiple agents.
- Hence, this MapReduce-like architecture should be understood not as an extension of distributed computing but as a transition to distributed cognition.
An Implementation Example for a PoC
- A user uploads a document through the frontend.
- The backend converts the document into structured data by using Microsoft Document Intelligence or python-pptx.
- If the document contains data that cannot be structured by python-pptx, the backend routes it to Microsoft Document Intelligence through explicit routing logic.
- The backend divides the structured data into chunks.
- The backend sends those chunks, one chunk per request, to multiple LLMs.
- Those LLMs perform full-scan evaluation on those chunks from the perspective of the document, that is, map from `chunk_i` to `result_i`.
- The backend stores those evaluations in object storage or an in-memory database.
- The backend sends all of those evaluations to one LLM.
- That LLM aggregates them, that is, performs reduce.
- The frontend displays the aggregated result to the user.
Note that, in reality, unless one LLM first generates a global context and provides it to each mapping LLM, a meaningful aggregated result cannot be obtained.
Also note that the point of a PoC is that, although there should ultimately be multiple hierarchical orchestrators that aggregate `result_i`, such a complex implementation should not be built from the beginning. First, we should confirm whether an answer can be obtained at all, and then human beings should evaluate the quality of that answer. A minimal single-level sketch follows.
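The sketch below assumes the OpenAI Python SDK (`AsyncOpenAI`). The model name, prompts, chunk size, and fixed-size chunking are illustrative assumptions, and the input text is assumed to have already been structured by Microsoft Document Intelligence or python-pptx.

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()
MODEL = "gpt-4o"  # illustrative; any deployed chat model name

def split_into_chunks(text: str, size: int = 4000) -> list[str]:
    """Naive fixed-size chunking; structure-aware chunking is preferable in practice."""
    return [text[i:i + size] for i in range(0, len(text), size)]

async def ask(messages: list[dict]) -> str:
    resp = await client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

async def map_chunk(chunk: str, global_context: str) -> str:
    """Map chunk_i to result_i under the shared global context."""
    return await ask([
        {"role": "system", "content": f"Global context:\n{global_context}"},
        {"role": "user", "content": f"Evaluate this chunk rigorously:\n{chunk}"},
    ])

async def analyze(text: str) -> str:
    chunks = split_into_chunks(text)
    # Generate the global context first; without it, aggregation is not meaningful.
    global_context = await ask(
        [{"role": "user", "content": "Briefly summarize this document:\n" + text[:8000]}]
    )
    # Map all chunks in parallel to reduce wall-clock latency.
    results = await asyncio.gather(*(map_chunk(c, global_context) for c in chunks))
    # Single-level reduce; hierarchical orchestrators would replace this step later.
    return await ask(
        [{"role": "user", "content": "Aggregate these evaluations:\n" + "\n---\n".join(results)}]
    )

# usage: final_result = asyncio.run(analyze(structured_text))
```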
Notice
- Because multiple LLMs, that is, multiple agents, are used, sufficient attention must be paid to rate limits.
- Attention must also be paid to out-of-memory conditions in the backend.
- Attention must also be paid to the requirement that `sizeof(result_i) < E(sizeof(chunk_i))`: each mapped result must, on average, be smaller than its input chunk, so that each level of hierarchical reduction strictly shrinks the data. If this condition is not satisfied, the total amount of computation may diverge, and wall-clock latency may increase. A minimal check is sketched below.
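A minimal sketch of this check, with sizes measured in tokens (the token-counting step is assumed and not shown):

```python
def reduction_contracts(chunk_sizes: list[int], result_sizes: list[int]) -> bool:
    """True iff every result is smaller than the mean chunk size, so reduction shrinks data."""
    mean_chunk = sum(chunk_sizes) / len(chunk_sizes)
    return all(r < mean_chunk for r in result_sizes)
```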
Prior Work and Precedents
- MapReduce-style processing for long documents has already been discussed in the OpenAI Developer Community. In these discussions, a document is split into chunks, each chunk is processed separately, and the intermediate outputs are then combined or recursively reduced.
- Community discussions have also anticipated the idea of supplying higher-level context to lower-level units, for example by attaching a parent-document summary to each chunk, using semantic sub-chunking, or introducing chapter-level summaries for manuals and other structured documents.
- Chain-of-Agents proposed multi-agent processing for long-context tasks as an alternative to both retrieval-based input reduction and monolithic full-context prompting. Its framework uses multiple workers to handle segmented portions of a text and a manager to synthesize their contributions into a final result.
- LongRefiner proposed leveraging the structural characteristics of long documents through hierarchical document structuring and adaptive refinement, thereby treating document structure itself as an important computational resource.
- Tree-oriented MapReduce for long-context reasoning proposed representing a long document as a hierarchical document tree and applying recursive MapReduce across that hierarchy. It explicitly includes fact aggregation and conflict resolution during hierarchical reasoning.
- Context-Aware Hierarchical Merging proposed enriching hierarchical merging with source context, supporting evidence, and citation-based alignment in order to reduce factual errors introduced during recursive merging.
- Prior work on claim provenance, summary-source alignment, and fine-grained text provenance overlaps with the idea that aggregation should preserve evidence, traceability, and explicit relations between outputs and supporting spans, rather than collapsing everything into unconstrained free-form summarization.
Disclaimer
OpenAI ChatGPT was used for minor suggestions and for assistance in preparing the English text of this document. The ideas presented here are entirely my own, and I take full responsibility for them.
Appendix
A.1. Graph Construction Based on Network Theory
- We should transform the set of chunks into a graph. The structure of that graph should be hierarchical so that it reflects the structure of the document: Document, Section, Paragraph, Subparagraph, Word.
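A sketch of this construction as a directed graph, using networkx. The shape of the structured input `doc` is an assumption (a simplified stand-in for Microsoft Document Intelligence output), and only the Document, Section, and Paragraph levels are shown; Subparagraph and Word nodes would be attached analogously.

```python
import networkx as nx

def build_document_graph(doc: dict) -> nx.DiGraph:
    """Build a hierarchical graph: Document -> Section -> Paragraph."""
    g = nx.DiGraph()
    g.add_node("doc", level="Document")
    for s, section in enumerate(doc.get("sections", [])):
        sid = f"sec{s}"
        g.add_node(sid, level="Section", title=section.get("title"))
        g.add_edge("doc", sid)
        for p, para in enumerate(section.get("paragraphs", [])):
            pid = f"{sid}/p{p}"
            g.add_node(pid, level="Paragraph", text=para)
            g.add_edge(sid, pid)
    return g
```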
A.2. Hierarchical Semantic Space Analogous to Topological Space
- For such a graph-structured set, a notion of nearness can be introduced from the perspective of semantics. It may therefore be interpreted as a topological space. Under additional conditions, it may be metrizable, and in some cases it may be regarded as having a manifold-like structure.
- In natural language, multiple sentences can share the same meaning while differing in linguistic form. We may define neighborhoods in a topological space on the basis of this relation: identity of meaning despite differences in linguistic form.
- We must then introduce, in a well-defined manner, what counts as an open set and what counts as a closed set.
- We may further regard semantic space as a quotient space. That is, if we define an equivalence relation by semantic identity, then reflexivity, symmetry, and transitivity hold. Once such an equivalence relation is introduced and semantic space is regarded as a quotient space, we naturally want to define mappings on it. At that point, however, we must discuss well-definedness: namely, whether the value of a mapping changes depending on the choice of representative of an equivalence class.
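In standard notation, and assuming the meaning map m as given rather than constructed, the construction in this subsection reads:

```latex
% Semantic identity as an equivalence relation on the set S of sentences.
s \sim t \iff m(s) = m(t)
\qquad \text{(reflexive, symmetric, transitive, since equality is)}

% The semantic space as a quotient space of equivalence classes.
S/{\sim} = \{\, [s] \mid s \in S \,\}, \qquad [s] = \{\, t \in S \mid t \sim s \,\}

% Well-definedness: a map on sentences descends to the quotient
% iff its value does not depend on the choice of representative.
\text{A map } f : S \to X \text{ descends to } \bar{f} : S/{\sim} \to X,\ \bar{f}([s]) = f(s),
\text{ iff } s \sim t \implies f(s) = f(t).
```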
A.3. Communication Hierarchy Beyond Document Hierarchy
- We can extend document processing by MapReduce to communication processing.
- This is because communication, like documents, also has the hierarchical structure of natural language. Communication > Conversation > Turn: Query and Response > Sentence > Clause > Compound Word > Word > …
A.4. Natural Language Generation as Movement in Semantic Space
- Above, I referred to the possibility of regarding a document as a topological space. Communication, as an extension of that idea, may also be regarded as a topological space.
- In a topological space, we can speak of nearness by means of neighborhoods and open sets. That is, we can define the nearness of a query or response within natural language space understood as a topological space.
- By applying this idea, it may become possible to generate a response that is meaningful with respect to a query.
- In other words, the correspondence between a response and a turn or query may be regarded as a transition in that semantic space.
A.5. Socially Open Non-Deterministic Mapping
- A program is generally regarded as a deterministic mapping that uniquely maps an element of a domain to an element of a codomain.
- However, when the internal structure of a program becomes highly complex and the program is socially open, it effectively becomes non-deterministic, because it is affected by the social heat bath of a vast number of members and by the complexity of its own internal state.
- If subjectivity exists, then reality also exists in this society. That is to say, whatever exists in society is necessarily exposed to the social heat bath and is therefore non-deterministic.
- We may, in an ideal sense, conceive of a completely closed mapping or program. However, as Nancy Cartwright argued in How the Laws of Physics Lie, such an ideal environment does not in fact exist. In other words, all existence is socially open.
A.6. Evidence-Preserving Aggregation Rather Than Free-Form Summarization
- We must require the orchestrator to verify whether a response to a query is supported or backed by facts and evidence.
- In other words, a response must rest on evidence-preserving information retrieval. That is, we should not allow LLMs to map chunks by entirely unconstrained methods.
- Facts and evidence must retain identifiable attributes such as proposition, provenance, exact span, support relation, strength, scope, context, temporal validity, consistency, and retrievability.
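As one possible sketch, these attributes can be carried as a typed record that every mapped value emits alongside its free text. The field names map one-to-one onto the list above; the types are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class EvidenceRecord:
    proposition: str            # the claim being supported or attacked
    provenance: str             # source document and location, e.g. "report.pdf, section 3"
    exact_span: str             # verbatim supporting text
    support_relation: str       # "support" | "contradiction" | "qualification" | "exception"
    strength: float             # assumed 0.0-1.0 confidence in the relation
    scope: str                  # conditions under which the claim holds
    context: str                # surrounding discourse needed to interpret the span
    temporal_validity: str      # e.g. "FY2024 only"
    consistent_with: list[str]  # ids of records this one is consistent with
    retrieval_key: str          # key under which the record can be fetched again
```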
A.7. Credit-Based Economics for High-Compute Semantic AI
- MapReduce-like full-scan semantic processing requires substantially greater computational resources than lightweight retrieval-based question answering.
- In particular, parallel mapping by multiple agents, hierarchical aggregation, contradiction detection, global approval, re-processing, and graph construction require large amounts of compute, memory, storage, and API budget.
- Therefore, this kind of AI should be provided not as a uniform flat-rate function but under credit-based economics.
- The reason is that, although this architecture may increase total compute cost, it may significantly reduce the wall-clock latency required for the deep reasoning needed in rigorous document analysis.
- In other words, users should pay credits not merely for response generation, but for a semantic processing capability that is high-accuracy, high-compute-cost, and high-responsibility.
- Therefore, credit-based economics is not an incidental charging design but an institutional condition for realistically implementing and operating high-compute semantic AI.