Is there a best practice to encode complex tables for RAG processing?

I usually use Markdown to encode simple tables, but it gets more complicated for tables with merged cells, nested tables, or cell text containing line breaks, none of which Markdown tables support.

What is the best practice for encoding the table as input for the RAG engine?

Simple table (markdown)

| Person | Jan                                      | Feb                                    | March                                  |
|--------|------------------------------------------|----------------------------------------|----------------------------------------|
| Bob    | Frosty beginnings and resolutions abound | Hearts and groundhogs, winter midpoint | Lion to lamb, spring hopeful arrival |

Complex table (markdown)

| Person | Q1 ||||
|--------|--------|--------|--------|--------|
|        | Jan    |    Feb |        |  March |
| Eve | I will never do that again. | Honey never spoils. | Archaeologists have found honey pots in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible. | Spring begins, days lengthen, and nature stirs. Basketball fever rises as winter fades to warmth. |
| | | Koalas' fingerprints are so similar to humans that they have occasionally been confused at crime scenes. | This makes koalas the only non-primates with unique fingerprints. | When will the winter arrive? |
| Bob | Frosty beginnings and | Hearts and groundhogs, winter's midpoint. | | Lion to lamb, spring's hopeful arrival. |
| | resolutions abound. | | | |

Welcome to the dev forum @kfir.alf

I’d recommend using JSON:

Simple Table JSON:

{
  "simple_table": [
    {
      "Person": "Bob",
      "Jan": "Frosty beginnings and resolutions abound.",
      "Feb": "Hearts and groundhogs, winter's midpoint.",
      "March": "Lion to lamb, spring's hopeful arrival."
    }
  ]
}

Complex Table JSON:

{
  "complex_table": [
    {
      "Person": "Eve",
      "Q1": {
        "Jan": [
          "I will never do that again.",
          "Koalas' fingerprints are so similar to humans that they have occasionally been confused at crime scenes."
        ],
        "Feb": [
          "Honey never spoils.",
          "This makes koalas the only non-primates with unique fingerprints."
        ],
        "March": [
          "Archaeologists have found honey pots in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible.",
          "Spring begins, days lengthen, and nature stirs. Basketball fever rises as winter fades to warmth.",
          "When will the winter arrive?"
        ]
      }
    },
    {
      "Person": "Bob",
      "Q1": {
        "Jan": "Frosty beginnings and resolutions abound.",
        "Feb": "Hearts and groundhogs, winter's midpoint.",
        "March": "Lion to lamb, spring's hopeful arrival."
      }
    }
  ]
}

This JSON format captures the structure and content of both the simple and complex tables from the image.
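
For a table with one merged header, the same structure can be generated rather than written by hand. The sketch below is only an illustration: `table_to_records` and its parameters are hypothetical names, and it assumes a single merged header (`Q1`) spanning the month columns. A cell with several entries is passed as a list, and a line break inside a cell survives as a `\n` escape in the JSON output.

```python
import json

def table_to_records(key_column, group, columns, rows):
    """Turn rows of a table with one merged header into a list of row objects.

    key_column: name of the leading column (e.g. "Person")
    group:      the merged header spanning the remaining columns (e.g. "Q1")
    columns:    sub-headers under the merged header
    rows:       each row is [key, cell, cell, ...]; a cell is a string, or a
                list of strings when the cell holds several entries
    """
    records = []
    for key, *cells in rows:
        if len(cells) != len(columns):
            raise ValueError(
                f"row {key!r} has {len(cells)} cells, expected {len(columns)}"
            )
        # Nest the month cells under the merged header.
        records.append({key_column: key, group: dict(zip(columns, cells))})
    return records

rows = [
    [
        "Eve",
        "I will never do that again.",
        ["Honey never spoils.",
         "This makes koalas the only non-primates with unique fingerprints."],
        # A cell whose text contains a line break.
        "Spring begins, days lengthen, and nature stirs.\nWhen will the winter arrive?",
    ],
    [
        "Bob",
        "Frosty beginnings and resolutions abound.",
        "Hearts and groundhogs, winter's midpoint.",
        "Lion to lamb, spring's hopeful arrival.",
    ],
]

records = table_to_records("Person", "Q1", ["Jan", "Feb", "March"], rows)
table_json = json.dumps({"complex_table": records}, indent=2, ensure_ascii=False)
print(table_json)
```

The length check is there because merged or empty cells are the usual reason a row ends up with the wrong number of columns; failing loudly seems safer than silently dropping a cell.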

@sps Thanks for your help!
If I understand correctly, you suggest that every cell be an item in a list. In the complex_table, Eve has a “nested table” inside one of her cells, so it should be something like:

      {
        "Person": "Eve",
        "Q1": {
          // ...
          "Feb": [
            ["Honey never spoils.", "Archaeologists have found honey pots in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible."],
            ["Koalas' fingerprints are so similar to humans that they have occasionally been confused at crime scenes.", "This makes koalas the only non-primates with unique fingerprints."]
          ],
          // ...
        }
      }

Do you think I can also do it using YAML?

complex_table:
  - Person: Eve
    Q1:
      Jan:
        - "I will never do that again."
      Feb:
        - - "Honey never spoils."
          - "Archaeologists have found honey pots in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible."
        - - "Koalas' fingerprints are so similar to humans that they have occasionally been confused at crime scenes."
          - "This makes koalas the only non-primates with unique fingerprints."
      March:
        - |
          Spring begins, days lengthen, and nature stirs. Basketball fever rises as winter fades to warmth.
          When will the winter arrive?
  - Person: Bob
    Q1:
      Jan: "Frosty beginnings and resolutions abound."
      Feb: "Hearts and groundhogs, winter's midpoint."
      March: "Lion to lamb, spring's hopeful arrival."

What I’m suggesting is that every row is a whole JSON object and the table is a list of rows, which makes the table a JSON list.
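
One practical consequence of the row-per-object layout is that each row can be serialized on its own. Here is a minimal sketch, assuming the retrieval side indexes one chunk per row; `rows_to_chunks` and the `table` / `row` / `data` wrapper keys are hypothetical names chosen for illustration, not something a particular RAG engine requires.

```python
import json

def rows_to_chunks(table_name, records):
    """Serialize each row object separately so it can be indexed on its own."""
    chunks = []
    for index, record in enumerate(records):
        # Keep the table name and row position with the row so the chunk
        # still makes sense when retrieved without its neighbours.
        chunk = {"table": table_name, "row": index, "data": record}
        chunks.append(json.dumps(chunk, ensure_ascii=False))
    return chunks

records = [
    {"Person": "Eve", "Q1": {"Jan": ["I will never do that again."]}},
    {"Person": "Bob", "Q1": {"Jan": "Frosty beginnings and resolutions abound."}},
]

chunks = rows_to_chunks("complex_table", records)
for chunk in chunks:
    print(chunk)
```

Whether one row per chunk is the right granularity depends on the table and the retriever, so treat this as a starting point rather than a rule.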

Yes, you can also use YAML, but YAML has its own drawbacks, indentation dependency being one.
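
The indentation point can be seen from the JSON side using only the Python standard library. In this small sketch, the same row parses to an identical object whether it is pretty-printed, compact, or indented inconsistently, because JSON nesting is carried by braces and brackets. In YAML's block style the nesting is carried by the indentation itself, so a shifted line changes the structure or makes the document invalid.

```python
import json

pretty = """
{
  "Person": "Bob",
  "Q1": {
    "Jan": "Frosty beginnings and resolutions abound."
  }
}
"""

compact = '{"Person":"Bob","Q1":{"Jan":"Frosty beginnings and resolutions abound."}}'

# Deliberately inconsistent indentation: still the same JSON document.
badly_indented = """
{
"Person": "Bob",
        "Q1": {
"Jan": "Frosty beginnings and resolutions abound."
    }
}
"""

parsed = [json.loads(text) for text in (pretty, compact, badly_indented)]
same_structure = parsed[0] == parsed[1] == parsed[2]
print(same_structure)
```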
