I usually use Markdown to encode simple tables, but it gets more complicated for tables with merged cells, nested tables, or cell text that contains line breaks, none of which Markdown tables support.
What is the best practice for encoding the table as input for the RAG engine?
Simple table (markdown)
| Person | Jan | Feb | March |
|--------|------------------------------------------|----------------------------------------|----------------------------------------|
| Bob | Frosty beginnings and resolutions abound | Hearts and groundhogs, winter's midpoint | Lion to lamb, spring's hopeful arrival |
Complex table (markdown)
| Person | Q1 ||||
|--------|--------|--------|--------|--------|
| | Jan | Feb | | March |
| Eve | I will never do that again. | Honey never spoils. | Archaeologists have found honey pots in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible. | Spring begins, days lengthen, and nature stirs. Basketball fever rises as winter fades to warmth. |
| | | Koalas' fingerprints are so similar to humans that they have occasionally been confused at crime scenes. | This makes koalas the only non-primates with unique fingerprints. | When will the winter arrive? |
| Bob | Frosty beginnings and | Hearts and groundhogs, winter's midpoint. | | Lion to lamb, spring's hopeful arrival. |
| | resolutions abound. | | | |
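If the table has to stay in Markdown, one common workaround for multi-line cells is to put an inline `<br>` where the line break was, since many renderers (GitHub-flavored Markdown, for example) pass that tag through inside table cells. Below is a minimal Python sketch. `md_cell` and `md_row` are hypothetical helper names, and the input is Bob's two-line Jan cell from the table above. Whether a given RAG pipeline handles `<br>` sensibly is an assumption that needs checking.

```python
def md_cell(text):
    """Make a cell value fit on a single Markdown table line."""
    # A raw pipe would end the cell early, so escape it;
    # each line break becomes an inline <br> tag.
    return text.replace("|", "\\|").replace("\n", "<br>")


def md_row(cells):
    """Join already-ordered cell values into one Markdown table row."""
    return "| " + " | ".join(md_cell(c) for c in cells) + " |"


row = md_row(["Bob", "Frosty beginnings and\nresolutions abound."])
print(row)  # | Bob | Frosty beginnings and<br>resolutions abound. |
```

This keeps the cell in one row instead of spilling it into a second, mostly empty row, but it does nothing for merged or nested cells.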
Simple Table JSON:
{
  "simple_table": [
    {
      "Person": "Bob",
      "Jan": "Frosty beginnings and resolutions abound.",
      "Feb": "Hearts and groundhogs, winter's midpoint.",
      "March": "Lion to lamb, spring's hopeful arrival."
    }
  ]
}
Complex Table JSON:
{
  "complex_table": [
    {
      "Person": "Eve",
      "Q1": {
        "Jan": [
          "I will never do that again.",
          "Koalas' fingerprints are so similar to humans that they have occasionally been confused at crime scenes."
        ],
        "Feb": [
          "Honey never spoils.",
          "This makes koalas the only non-primates with unique fingerprints."
        ],
        "March": [
          "Archaeologists have found honey pots in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible.",
          "Spring begins, days lengthen, and nature stirs. Basketball fever rises as winter fades to warmth.",
          "When will the winter arrive?"
        ]
      }
    },
    {
      "Person": "Bob",
      "Q1": {
        "Jan": "Frosty beginnings and resolutions abound.",
        "Feb": "Hearts and groundhogs, winter's midpoint.",
        "March": "Lion to lamb, spring's hopeful arrival."
      }
    }
  ]
}
This JSON format captures the structure and content of both the simple and complex tables from the image.
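For retrieval, one option is to flatten nested JSON like this into one short text record per cell, repeating the row and column headers so that each chunk still makes sense on its own. The sketch below is only an illustration. `flatten_table` is a hypothetical helper, the record format is made up, and it assumes every value other than the key column is an object mapping column names to a string or a list of strings.

```python
import json


def flatten_table(rows, key_column="Person"):
    """Return one self-describing text record per cell value."""
    records = []
    for row in rows:
        key = row[key_column]
        for group, columns in row.items():
            if group == key_column:
                continue  # the key column is repeated in every record instead
            for column, cell in columns.items():
                # A cell is either one string or a list of strings.
                texts = cell if isinstance(cell, list) else [cell]
                for text in texts:
                    records.append(f"{key_column}: {key} | {group} > {column} | {text}")
    return records


doc = json.loads("""
{
  "complex_table": [
    {"Person": "Eve", "Q1": {"Jan": ["I will never do that again."]}},
    {"Person": "Bob", "Q1": {"Jan": "Frosty beginnings and resolutions abound.",
                             "Feb": "Hearts and groundhogs, winter's midpoint."}}
  ]
}
""")

records = flatten_table(doc["complex_table"])
for record in records:
    print(record)
```

Each record carries the merged header (`Q1`) and the month, so a retrieved chunk does not depend on its neighbours for context.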
@sps Thanks for your help!
If I understand correctly, you suggest that every cell be an item in a list. In the complex_table, Eve has a “nested table” inside the Feb cell, so it should be something like:
{
  "Person": "Eve",
  "Q1": {
    // ...
    "Feb": [
      ["Honey never spoils.", "Archaeologists have found honey pots in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible."],
      ["Koalas' fingerprints are so similar to humans that they have occasionally been confused at crime scenes.", "This makes koalas the only non-primates with unique fingerprints."]
    ],
    // ...
  }
}
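To check that the list-of-rows encoding keeps the nested table's layout, it can be walked recursively so that every string comes back with its row and column position. This is a rough Python sketch. `walk_cell` is a hypothetical helper, and `feb_cell` simply mirrors the Feb value from the snippet above.

```python
def walk_cell(cell, path=()):
    """Yield (path, text) for every string in a possibly nested cell."""
    if isinstance(cell, str):
        yield path, cell
    else:
        for index, item in enumerate(cell):
            yield from walk_cell(item, path + (index,))


# Eve's Feb cell as a nested table: a list of rows, each row a list of cells.
feb_cell = [
    ["Honey never spoils.",
     "Archaeologists have found honey pots in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible."],
    ["Koalas' fingerprints are so similar to humans that they have occasionally been confused at crime scenes.",
     "This makes koalas the only non-primates with unique fingerprints."],
]

cells = list(walk_cell(feb_cell))
for (row, col), text in cells:
    print(f"Eve > Q1 > Feb [row {row}, col {col}]: {text}")
```

The same walk also works on a plain string cell, which comes back with an empty path.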
Do you think I can also do it using YAML?
complex_table:
  - Person: Eve
    Q1:
      Jan:
        - "I will never do that again."
      Feb:
        - - "Honey never spoils."
          - "Archaeologists have found honey pots in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible."
        - - "Koalas' fingerprints are so similar to humans that they have occasionally been confused at crime scenes."
          - "This makes koalas the only non-primates with unique fingerprints."
      March:
        - |
          Spring begins, days lengthen, and nature stirs. Basketball fever rises as winter fades to warmth.
          When will the winter arrive?
  - Person: Bob
    Q1:
      Jan: "Frosty beginnings and resolutions abound."
      Feb: "Hearts and groundhogs, winter's midpoint."
      March: "Lion to lamb, spring's hopeful arrival."
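On the multi-line March cell: the `|` literal block scalar in the YAML above keeps the line break between the two sentences and, with YAML's default chomping, one trailing newline. JSON has no multi-line string syntax, so the same value becomes a single string with `\n` escapes. Here is a small standard-library sketch of the JSON side. The `march_cell` value is what a YAML parser is assumed to return for that block, and no YAML library is used.

```python
import json

# Value assumed to come from parsing the `|` block under March:
# both lines are kept, plus one trailing newline.
march_cell = (
    "Spring begins, days lengthen, and nature stirs. "
    "Basketball fever rises as winter fades to warmth.\n"
    "When will the winter arrive?\n"
)

encoded = json.dumps({"March": [march_cell]})
print(encoded)  # the line breaks appear as \n escapes inside one JSON string

# Decoding restores the original text, line breaks included.
decoded = json.loads(encoded)
lines = decoded["March"][0].splitlines()
print(lines)
```

So both formats can carry the line break. YAML is just easier to read and write by hand for that case.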