PDF summarizer using OpenAI

Embeddings are usually used so that we can retrieve chunks of text for a retrieval-augmented generation (RAG) application. For example, given user query A, I want to find documents related to it. This "finding related documents" step is done by comparing the embedding of query A against the embeddings of your repository of documents.
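To make that concrete, here is a minimal sketch of embedding-based retrieval, assuming the `openai` Python SDK (v1) and the `text-embedding-3-small` model; the document strings are made-up placeholders, and cosine similarity is computed by hand:

```python
import math
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    # One API call can embed a whole batch of strings
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical document store
docs = ["Refund policy for hardware", "How to reset your password", "Quarterly sales report"]
doc_vecs = embed(docs)

query_vec = embed(["I forgot my login credentials"])[0]

# Rank documents by similarity to the query; the top hit would feed a RAG prompt
ranked = sorted(zip(docs, doc_vecs), key=lambda dv: cosine(query_vec, dv[1]), reverse=True)
print(ranked[0][0])
```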

From what you’ve described, your scenario is much simpler: you’re just summarizing the text of a given PDF. If the length of your PDF exceeds the model’s context window, you can split it into smaller chunks, ask the LLM to summarize each chunk, and then summarize the combined chunk summaries.
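Something like this sketch, again assuming the `openai` Python SDK (v1); `full_text` is assumed to be the text you already extracted from the PDF (e.g. with a library like `pypdf`), and the character-based chunk size is a crude stand-in for proper token counting:

```python
from openai import OpenAI

client = OpenAI()

def chunk_text(text, max_chars=8000):
    # Naive fixed-size chunking; a real splitter would respect paragraph boundaries
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize(text, model="gpt-3.5-turbo"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Summarize the following text:\n\n{text}"}],
        max_tokens=500,  # room for the summary itself
    )
    return resp.choices[0].message.content

full_text = open("extracted_pdf.txt").read()  # hypothetical extracted PDF text

# Map step: summarize each chunk; reduce step: summarize the summaries
chunk_summaries = [summarize(c) for c in chunk_text(full_text)]
final_summary = summarize("\n\n".join(chunk_summaries))
print(final_summary)
```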

As a first step, try increasing the max_tokens parameter as others have suggested, and also check the token length of your document. Consider looking at this other post: Counting tokens for chat API calls (gpt-3.5-turbo)
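For a quick token count you can use the `tiktoken` library; this only counts the raw text (the linked post covers the extra per-message overhead of chat API calls), and the filename here is just a placeholder:

```python
import tiktoken

document_text = open("extracted_pdf.txt").read()  # hypothetical extracted PDF text

# Use the tokenizer that matches the target model
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
num_tokens = len(enc.encode(document_text))
print(f"Document is {num_tokens} tokens")
```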
