Using GPT to structure large amounts of data into JSON format

Hey guys,
I am using a GPT API call to feed unstructured data into GPT-4 and get it back as a structured JSON object. This is then used further in my application.

The problem I’ve encountered now is that some users want to feed in really large data, so large that I run into the token limit. At the current token limit the LLM then starts to summarize things in order to still send back a complete JSON object.

What would be the best way to parse unstructured data of, let's say, 6k tokens into a structured JSON object via the OpenAI API?

So are you trying to get back the content of the input data verbatim or is there some transformation happening as it is restructured into a JSON object?


If you have access to an older gpt-4-32k, it could work.

Additionally, with gpt-4-turbo, if you can get it to produce the output verbatim and it cuts off at the 4,096-token output limit, the rest of the output can be produced by appending the assistant message you received to the existing messages list and making the API call again.
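The continuation technique described above can be sketched roughly like this. Note this is a minimal sketch: `completeWithContinuation` and `callModel` are hypothetical names, and `callModel` stands in for whatever wrapper you use around the actual chat-completions call.

```typescript
// Sketch of the continuation technique: when the model stops at the
// output-token limit (finish_reason === "length"), append the partial
// output as an assistant message and call the API again so the model
// picks up where it left off. `callModel` is a placeholder for your
// actual chat-completions call.
type Message = { role: "system" | "user" | "assistant"; content: string };
type ModelResult = { content: string; finishReason: "stop" | "length" };

async function completeWithContinuation(
  messages: Message[],
  callModel: (messages: Message[]) => Promise<ModelResult>,
): Promise<string> {
  let full = "";
  while (true) {
    // On continuation rounds, include the partial output so far.
    const withPartial = full
      ? [...messages, { role: "assistant" as const, content: full }]
      : messages;
    const result = await callModel(withPartial);
    full += result.content;
    if (result.finishReason !== "length") break; // finished normally
  }
  return full;
}
```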


If it’s a Facebook download, try dividing it into batches.
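A simple batching sketch for splitting the input, cutting only at paragraph boundaries. Assumptions here: a character budget is used as a crude proxy for tokens (a real tokenizer such as tiktoken would be more accurate), and the `batchText` name and default budget are illustrative.

```typescript
// Rough sketch: split unstructured text into batches that each stay
// under a character budget, cutting only at paragraph boundaries.
// Characters are a crude proxy for tokens (~4 chars/token is a common
// rule of thumb for English text).
function batchText(text: string, maxChars = 24000): string[] {
  const paragraphs = text.split(/\n{2,}/);
  const batches: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    const candidate = current ? current + "\n\n" + p : p;
    if (candidate.length > maxChars && current) {
      // Current batch is full; start a new one with this paragraph.
      batches.push(current);
      current = p;
    } else {
      current = candidate;
    }
  }
  if (current) batches.push(current);
  return batches;
}
```

Each batch can then be sent as its own API call and the resulting JSON objects merged afterwards.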

Yes, verbatim. I am also using a temperature of 0. I just want to transform unstructured data into a structured JSON object.

I am accessing it via the API, so I could specify that model. But doesn't that also have a 4,096-token output limit?

What you are describing would solve my problem. How would I do that? Sending the same API call would just get me the first 4,096 tokens again. Right now I am making a single API call and saving the response in my database.

Is there a way I can use the Assistants + Threads functionality to accomplish what I want?

Depending on what structure you are opting for, you might want to take a look at this thread:

A few members of the Forum including myself discussed and worked out a solution for semantically chunking a document using GPT-4-turbo. In essence, the approach involves using GPT-4-turbo to create an outline of the document (incl. the identification of the start and end position of individual sections within the document) and then use the information to programmatically extract the text verbatim from the document into a structured JSON.

The benefit of this approach is that you only need one API call to get the document’s basic structure and that you don’t have to worry about the output token constraints. Additionally, it saves a lot of cost compared to a scenario where you ask the model to return the text verbatim. That said, the approach is currently mostly applicable to documents with clearly defined sections.
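The programmatic extraction step might look something like the following sketch. Assumption: the model returns an outline with character start/end offsets per section; the `OutlineSection` shape and `extractSections` name are illustrative, not from the linked thread.

```typescript
// Illustrative sketch: the model returns a small outline with character
// offsets for each section; the section text is then sliced verbatim
// out of the original document, so output-token limits only apply to
// the (small) outline, never to the full text.
type OutlineSection = { title: string; start: number; end: number };

function extractSections(
  document: string,
  outline: OutlineSection[],
): { title: string; text: string }[] {
  return outline.map(({ title, start, end }) => ({
    title,
    // Slice from the source document, not from model output, so the
    // text is guaranteed verbatim.
    text: document.slice(start, end).trim(),
  }));
}
```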


Great discussion, thanks for linking. I actually had your exact method implemented a few months ago to save cost. I'll think about it and see if I can chip in on the other thread.


I’ve gotten this to work quite well with gpt-4o using the following:
(btw, I’ve found JSON mode actually does not work well when the response is longer than 4,096 tokens)

let responseMessage = "";
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
// eslint-disable-next-line no-constant-condition
while (true) {
  const responseStream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: system,
      },
      // On continuation rounds, feed the partial output back in so the
      // model picks up where it left off.
      ...(responseMessage
        ? [
            {
              role: "assistant" as const,
              content: responseMessage,
            },
          ]
        : []),
    ],
    stream: true,
    ...(responseFormat === "json" && {
      response_format: {
        type: "json_object",
      },
    }),
  });

  let lastFinishReason:
    | ChatCompletionChunk.Choice["finish_reason"]
    | undefined = undefined;
  for await (const chunk of responseStream) {
    const content = chunk.choices[0]?.delta?.content || "";
    responseMessage += content;
    onEvents.onChunk?.(content, responseMessage);
    lastFinishReason = chunk.choices[0]?.finish_reason;
  }
  if (lastFinishReason === "length") {
    // Hit the output-token limit: loop again to continue the response.
    console.log("Response stopped because of max tokens");
  } else {
    break;
  }
}