Meta Learning with External Knowledge Graphs: Pro Tip

You may want to do some simple knowledge graph mining or narrative summary of data tables, etc. using GPT3. This is a common ask.

Often, though, people hope that GPT3 itself can be used as an oracle of knowledge, which is probably not a good frame for most knowledge tasks. It is highly recommended that you bring your own verified knowledge and use GPT3 as a language/translation/stylistic/markup model.

A simple example: say you want a strictly factual way to interact in natural language with your data table of facts about Jupiter and Mars.

Here’s a very simple way to think about how to do this:

  1. Set up a simple call to the OpenAI API
  2. Use the Q and A example prompt from the Documentation
  3. Use data from your own database or an API from somewhere like WolframAlpha
  4. Bring the data in from your database or other knowledge API and stuff it into the Q and A prompt
  5. Submit to the OpenAI API any question you have about the data
  6. Format the completion with query or just the completion as needed for your application/user experience
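
The steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `FACTS`, `build_qa_prompt`, `answer`, and the `complete` callable are all hypothetical names introduced here, and `complete` stands in for whatever function you write to send a prompt to the completion API and return the text. The two facts are illustrative stand-ins for rows pulled from your own database or knowledge API.

```python
from typing import Callable, Dict

# Verified facts that YOU supply (from your database or a knowledge API).
# These entries are illustrative stand-ins for such a table.
FACTS: Dict[str, Dict[str, str]] = {
    "Mars": {"order from the Sun": "4th planet", "type": "terrestrial planet"},
    "Jupiter": {"order from the Sun": "5th planet", "type": "gas giant"},
}


def build_qa_prompt(facts: Dict[str, Dict[str, str]], question: str) -> str:
    """Stuff the verified facts into a Q and A style prompt."""
    lines = [
        "I answer questions using only the facts listed below. "
        'If the facts do not contain the answer, I respond with "Unknown".',
        "",
        "Facts:",
    ]
    for subject, attributes in facts.items():
        for name, value in attributes.items():
            lines.append(f"{subject} | {name} | {value}")
    # End on an open "A:" so the model completes the answer.
    lines += ["", f"Q: {question}", "A:"]
    return "\n".join(lines)


def answer(question: str, complete: Callable[[str], str]) -> str:
    """`complete` is a hypothetical wrapper around your completion API call."""
    return complete(build_qa_prompt(FACTS, question)).strip()


print(build_qa_prompt(FACTS, "Which planet is a gas giant?"))
```

Keeping the API call behind a plain callable means the prompt-building step can be checked without any network access, and the facts stay under your control rather than the model's.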

SAMPLE PROMPT

LINK TO PLAYGROUND

LINK TO SAMPLE WOLFRAMALPHA PAGE

Just sharing some experiments I’ve done in the last few days.

Until now, I haven’t been able to produce any helpful insight extraction using GPT-3. I generated some dummy data like the following to see whether GPT-3 could generate new insights from it, but without success.

Input:

Revenue
Month|Revenue|
2021-01|$128.390.830|
2021-02|$300.851.490|
2021-03|$200.087.873|
2021-04|$195.182.582|
2021-05|$196.547.861|
2021-06|$198.131.810|
2021-07|$199.924.551|
2021-08|$201.742.930|
2021-09|$204.632.520|
2021-10|$207.632.216|
2021-11|$410.596.342|

Most sold Products
Product|Category|2021-01|2021-02|2021-03|2021-04|2021-05|2021-06|2021-07|2021-08|2021-09|2021-10|2021-11|2021-12
Seafood Fish|Seafood|$249.40|$310.695|$348.917|$392.656|$431.244|$461.547|$486.917|$519.622|$547.529|$83.508|
Wine Perez|Wines|$430.400|$661.623|$793.235|$891.553|$972.577|$1.063.707|$1.175.771|$1.297.193|$1.388.311|$1.488.351
Duck|Seafood|$334.474|$391.459|$442.941|$489.975|$536.729|$574.486|$612.660|$643.759|$680.194|$719.879
Rocket|Herbs|$71.800|$86.621|$98.176|$110.610|$122.304|$133.552|$145.928|$159.666|$173.741|$187.780
Olive oil Moles|Produce|$56.678|$63.179|$70.133|$77.507|$84.156|$90.860|$97.449|$103.468|$109.338|$114.438
Frozen Seafood|Seafood|$314.366|$349.287|$384.755|$417.433|$448.248|$481.231|$515.740|$543.671|$577.193|$611.849
Penne Di maurizio|Pasta|$60.717|$71.894|$81.042|$89.180|$97.532|$105.629|$113.508|$121.413|$129.303|$137.304
Wine Hosteria|Wines|$4.921.300|$6.165.924|$7.277.829|$8.420.844|$9.636.159|$10.866.777|$12.217.283|$13.641.674|$15.206.592|$16.905.285|
Marzano|Pasta|$130.977|$151.963|$172.730|$192.274|$212.766|$234.839|$257.535|$279.009|$302.991|$329.165|
Scallops|Seafood|$547.876|$678.534|$817.366|$938.184|$1.079.581|$1.235.065|$1.396.472|$1.578.191|$1.730.127|$1.904.362|

Frequently bought together
Products|Month|2021-01|2021-02|2021-03|2021-04|2021-05|2021-06|2021-07|2021-08|2021-09|2021-10|2021-11|2021-12
Seafood Fish,Wine|3399|1918|1717|1549|1421|1333|1291|1247|1208|1173|1149|112940
Penne Di Maurizio,Wine|181738|9730|8599|8181|7999|7720|7577|7366|7234|7170|7119|7083
Duck,Seafood Fish|396251|13252|12476|11872|11111|10648|10467|10075|9844|9625|9421|9323
Linda Bertolli,Olive oil Moles|312883|9101|8744|8331|7950|7783|7573|7434|7273|7209|7097|7056
Marzano,Olive oil Moles|376800|8568|8247|8052|7886|7722|7541|7392|7284|7202|7173|7145
Buitoni,Pasta|628853|7112|6886|6730|6538|6357|6254|6134|6079|6029|5963|5894

Insights:

- revenue_increase|2021-11/2021-12|207.632.216|49.43%
- most_sold_category|2021-11/2021-12|Wines|$183.936.36
- product_sudden_sales_drop|2021-11/2021-12|Seafood Fish|-84,74%
- product_bundle_trend|2021-11/2021-12|Seafood Fish,Wine|9829%
- product_sales_drop|2021-11/2021-12|Wine Perez|-79.99%

Notice that I’m not trying to use GPT-3 for NLG, but for report summarization and insight generation based on analytics data. Natural language generation is an easy task for GPT-3: give it some labeled figures and it can generate high-quality descriptions.

Try playing with something like this:

I am a highly intelligent question answering bot. If you ask me a question that is rooted in truth, I will give you the answer. If you ask me a question that is nonsense, trickery, or has no clear answer, I will respond with "Unknown".  I know a lot about my ecommerce business analytics.

###

Data Table about monthly revenue:
Month  | Revenue  | 
January 2021  | $128,390.830  | 
February 2021  | $300,851.490  | 
March 2021  | $200,087,873  | 
April 2021  | $195,182,582  | 
May 2021  | $196,547,861  | 
June 2021  | $198,131,810  | 
July 2021  | $199,924,551  | 
August 2021  | $201,742,930  | 
September 2021  | $204,632,520  | 
October 2021  | $207,632,216  | 
November 2021  | $410,596,342  | 

These are insights derived from Data Table about monthly revenue:
revenue increase  | November 2021/December 2021  | 207,632,216  | 49.43%

Said in narrative form:
Monthly revenue increased 49.43% November 2021 to December 2021.

Data Table about product categories and monthly revenue:
Product  | Category  | January 2021  | February 2021  | March 2021  | April 2021  | May 2021  | June 2021  | July 2021  | August 2021  | September 2021  | October 2021  | November 2021  | December 2021
Seafood Fish  | Seafood  | $249,40  | $310,695  | $348,917  | $392,656  | $431,244  | $461,547  | $486,917  | $519,622  | $547,529  | $83,508  | 
Wine Perez  | Wines  | $430,400  | $661,623  | $793,235  | $891,553  | $972,577  | $1,063,707  | $1,175,771  | $1,297,193  | $1,388,311  | $1,488,351
Duck  | Seafood  | $334,474  | $391,459  | $442,941  | $489,975  | $536,729  | $574,486  | $612,660  | $643,759  | $680,194  | $719,879
Rocket  | Herbs  | $71,800  | $86,621  | $98,176  | $110,610  | $122,304  | $133,552  | $145,928  | $159,666  | $173,741  | $187,780
Olive oil Moles  | Produce  | $56,678  | $63,179  | $70,133  | $77,507  | $84,156  | $90,860  | $97,449  | $103,468  | $109,338  | $114,438
Frozen Seafood  | Seafood  | $314,366  | $349,287  | $384,755  | $417,433  | $448,248  | $481,231  | $515,740  | $543,671  | $577,193  | $611,849
Penne Di maurizio  | Pasta  | $60,717  | $71,894  | $81,042  | $89,180  | $97,532  | $105,629  | $113,508  | $121,413  | $129,303  | $137,304
Wine Hosteria  | Wines  | $4,921,300  | $6,165,924  | $7,277,829  | $8,420,844  | $9,636,159  | $10,866,777  | $12,217,283  | $13,641,674  | $15,206,592  | $16,905,285  | 
Marzano  | Pasta  | $130,977  | $151,963  | $172,730  | $192,274  | $212,766  | $234,839  | $257,535  | $279,009  | $302,991  | $329,165  | 
Scallops  | Seafood  | $547,876  | $678,534  | $817,366  | $938,184  | $1,079,581  | $1,235,065  | $1,396,472  | $1,578,191  | $1,730,127  | $1,904,362  | 

These are insights derived from Data Table about product categories and monthly revenue:
most sold category  | November 2021/December 2021  | Wines  | $183,936.36
product sudden sales drop  | November 2021/December 2021  | Seafood Fish  | -84, 74%
product sales drop  | November 2021/December 2021  | Wine Perez  | -79.99%

Said in narrative form:
Our most sold category was Wines from November 2021 to December 2021.  We earned $183,936.

Data Table about product bundles:
Products  | Month  | January 2021  | February 2021  | March 2021  | April 2021  | May 2021  | June 2021  | July 2021  | August 2021  | September 2021  | October 2021  | November 2021  | December 2021
Seafood Fish, Wine  | 3399  | 1918  | 1717  | 1549  | 1421  | 1333  | 1291  | 1247  | 1208  | 1173  | 1149  | 112940
Penne Di Maurizio, Wine  | 181738  | 9730  | 8599  | 8181  | 7999  | 7720  | 7577  | 7366  | 7234  | 7170  | 7119  | 7083
Duck, Seafood Fish  | 396251  | 13252  | 12476  | 11872  | 11111  | 10648  | 10467  | 10075  | 9844  | 9625  | 9421  | 9323
Linda Bertolli, Olive oil Moles  | 312883  | 9101  | 8744  | 8331  | 7950  | 7783  | 7573  | 7434  | 7273  | 7209  | 7097  | 7056
Marzano, Olive oil Moles  | 376800  | 8568  | 8247  | 8052  | 7886  | 7722  | 7541  | 7392  | 7284  | 7202  | 7173  | 7145
Buitoni, Pasta  | 628853  | 7112  | 6886  | 6730  | 6538  | 6357  | 6254  | 6134  | 6079  | 6029  | 5963  | 5894

These are insights derived from Data Table about product bundles:
product bundle trend  | November 2021/December 2021  | Seafood Fish, Wine  | 9829%

###

Q: What are some insights about our product bundles?
A: Seafood Fish and Wine were bundled together most frequently.

Q: What are some bananas about gonk?
A: Unknown.

Q: What cool facts do you have about our product revenue by month?
A: Our most sold category was Wines from November 2021 to December 2021.  We earned $183,936.

Q: Which month was our highest revenue month?
A: November 2021 was our highest grossing month for revenue.

Q: is there a product bundle with Olives in it?
A: Yes.

Q: Can you tell me about our product categories revenue?
A: Our most sold category was Wines from November 2021 to December 2021.  We earned $183,936.

Q:

Notes:

  • Make sure you supply a prompt that has good examples of the STRUCTURE of the insight format you want.
  • The factual values of the insights need to be precomputed for anything more complicated than simple arithmetic on small numbers. GPT3 is a language model, and math/set operations are a lot more work in a language model than in a math engine.
  • Try exporting your data so that the time unit, the fact, and the data label are already structurally related. Data tables get complicated quickly for language models, because the insights you want are close together in a mathematical sense but not in a language sense.
  • Try my prompt with only one data table and many interesting insights formatted (few-shot examples of how you are analyzing the table).
  • There are a lot of interesting uses of language models and information theory to get at even stranger insights from your data. I do not think you’ll find a computationally efficient way of turning SQL queries into natural-language insights using GPT3. GPT3 can do a great job of providing style and formatting for already created insights. It will not reliably produce algebraic or numeric insights/operations without a lot of abstract work in the logprobs, etc.
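
As a rough illustration of precomputing the factual values outside the model, the sketch below derives a percent change in plain Python and formats it as a pipe-delimited insight row like the ones in the prompt. The function names and the revenue figures are hypothetical, chosen only to keep the arithmetic easy to check.

```python
from typing import List, Tuple


def pct_change(previous: float, current: float) -> float:
    """Percent change from `previous` to `current`."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


def revenue_insight(series: List[Tuple[str, float]]) -> str:
    """Build one pipe-delimited insight row from the last two periods."""
    (prev_label, prev_value), (cur_label, cur_value) = series[-2], series[-1]
    change = pct_change(prev_value, cur_value)
    kind = "revenue increase" if change >= 0 else "revenue decrease"
    return f"{kind}  | {prev_label}/{cur_label}  | {cur_value:,.0f}  | {change:+.2f}%"


# Hypothetical monthly revenue, for illustration only.
monthly = [("October 2021", 200_000.0), ("November 2021", 250_000.0)]
print(revenue_insight(monthly))
# prints: revenue increase  | October 2021/November 2021  | 250,000  | +25.00%
```

The model then only has to restate a row like this in English, which is the part it is good at.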

And I know your main effort is to generate high-quality natural language descriptions. If all you want is to state facts you have PRE-DERIVED more naturally, then you just need to make a prompt in this structure. Fill it with more example insights to increase variability and the ability to answer more queries directly. Again, consider only one data table at a time for best results.

Write high quality natural language descriptions of insights from the Data Table.

###
Data Table about monthly revenue:
Month  | Revenue  | 
January 2021  | $128,390.830  | 
February 2021  | $300,851.490  | 
March 2021  | $200,087,873  | 
April 2021  | $195,182,582  | 
May 2021  | $196,547,861  | 
June 2021  | $198,131,810  | 
July 2021  | $199,924,551  | 
August 2021  | $201,742,930  | 
September 2021  | $204,632,520  | 
October 2021  | $207,632,216  | 
November 2021  | $410,596,342  | 

These are insights derived from Data Table about monthly revenue:
revenue increase  | November 2021/December 2021  | 207,632,216  | 49.43%

high quality natural language descriptions of insights:
Monthly revenue increased 49.43% November 2021 to December 2021.
###
Data Table about product categories and monthly revenue:
Product  | Category  | January 2021  | February 2021  | March 2021  | April 2021  | May 2021  | June 2021  | July 2021  | August 2021  | September 2021  | October 2021  | November 2021  | December 2021
Seafood Fish  | Seafood  | $249,40  | $310,695  | $348,917  | $392,656  | $431,244  | $461,547  | $486,917  | $519,622  | $547,529  | $83,508  | 
Wine Perez  | Wines  | $430,400  | $661,623  | $793,235  | $891,553  | $972,577  | $1,063,707  | $1,175,771  | $1,297,193  | $1,388,311  | $1,488,351
Duck  | Seafood  | $334,474  | $391,459  | $442,941  | $489,975  | $536,729  | $574,486  | $612,660  | $643,759  | $680,194  | $719,879
Rocket  | Herbs  | $71,800  | $86,621  | $98,176  | $110,610  | $122,304  | $133,552  | $145,928  | $159,666  | $173,741  | $187,780
Olive oil Moles  | Produce  | $56,678  | $63,179  | $70,133  | $77,507  | $84,156  | $90,860  | $97,449  | $103,468  | $109,338  | $114,438
Frozen Seafood  | Seafood  | $314,366  | $349,287  | $384,755  | $417,433  | $448,248  | $481,231  | $515,740  | $543,671  | $577,193  | $611,849
Penne Di maurizio  | Pasta  | $60,717  | $71,894  | $81,042  | $89,180  | $97,532  | $105,629  | $113,508  | $121,413  | $129,303  | $137,304
Wine Hosteria  | Wines  | $4,921,300  | $6,165,924  | $7,277,829  | $8,420,844  | $9,636,159  | $10,866,777  | $12,217,283  | $13,641,674  | $15,206,592  | $16,905,285  | 
Marzano  | Pasta  | $130,977  | $151,963  | $172,730  | $192,274  | $212,766  | $234,839  | $257,535  | $279,009  | $302,991  | $329,165  | 
Scallops  | Seafood  | $547,876  | $678,534  | $817,366  | $938,184  | $1,079,581  | $1,235,065  | $1,396,472  | $1,578,191  | $1,730,127  | $1,904,362  | 

These are insights derived from Data Table about product categories and monthly revenue:
most sold category  | November 2021/December 2021  | Wines  | $183,936.36
product sudden sales drop  | November 2021/December 2021  | Seafood Fish  | -84, 74%
product sales drop  | November 2021/December 2021  | Wine Perez  | -79.99%

high quality natural language descriptions of insights:
Our most sold category was Wines from November 2021 to December 2021.  We earned $183,936.
###
Data Table about product bundles:
Products  | Month  | January 2021  | February 2021  | March 2021  | April 2021  | May 2021  | June 2021  | July 2021  | August 2021  | September 2021  | October 2021  | November 2021  | December 2021
Seafood Fish, Wine  | 3399  | 1918  | 1717  | 1549  | 1421  | 1333  | 1291  | 1247  | 1208  | 1173  | 1149  | 112940
Penne Di Maurizio, Wine  | 181738  | 9730  | 8599  | 8181  | 7999  | 7720  | 7577  | 7366  | 7234  | 7170  | 7119  | 7083
Duck, Seafood Fish  | 396251  | 13252  | 12476  | 11872  | 11111  | 10648  | 10467  | 10075  | 9844  | 9625  | 9421  | 9323
Linda Bertolli, Olive oil Moles  | 312883  | 9101  | 8744  | 8331  | 7950  | 7783  | 7573  | 7434  | 7273  | 7209  | 7097  | 7056
Marzano, Olive oil Moles  | 376800  | 8568  | 8247  | 8052  | 7886  | 7722  | 7541  | 7392  | 7284  | 7202  | 7173  | 7145
Buitoni, Pasta  | 628853  | 7112  | 6886  | 6730  | 6538  | 6357  | 6254  | 6134  | 6079  | 6029  | 5963  | 5894

These are insights derived from Data Table about product bundles:
product bundle trend  | November 2021/December 2021  | Seafood Fish, Wine  | 9829%

high quality natural language descriptions of insights:
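
A prompt in this structure can also be assembled programmatically. The sketch below is one possible variant, assuming a single data table followed by several precomputed insight rows, each paired with an example narrative, and ending on an open slot for the model to complete. `build_insight_prompt`, the "Insight:"/"Description:" labels, and all sample values are hypothetical.

```python
from typing import List, Tuple

INSTRUCTION = (
    "Write high quality natural language descriptions of insights "
    "from the Data Table."
)


def build_insight_prompt(
    table_title: str,
    table_rows: List[str],
    examples: List[Tuple[str, str]],
    new_insight: str,
) -> str:
    """One table, several (insight row, narrative) examples, one open slot."""
    parts = [INSTRUCTION, "", "###", f"Data Table about {table_title}:"]
    parts.extend(table_rows)
    for insight_row, narrative in examples:
        parts += ["", "Insight:", insight_row, "Description:", narrative]
    # Leave the final description empty so the model completes it.
    parts += ["", "Insight:", new_insight, "Description:"]
    return "\n".join(parts)


# Hypothetical table rows and precomputed insights, for illustration only.
prompt = build_insight_prompt(
    "monthly revenue",
    ["Month  | Revenue", "October 2021  | $200,000", "November 2021  | $250,000"],
    [(
        "revenue increase  | October 2021/November 2021  | +25.00%",
        "Monthly revenue rose 25% from October 2021 to November 2021.",
    )],
    "highest revenue month  | November 2021  | $250,000",
)
print(prompt)
```

Adding more `(insight row, narrative)` pairs to `examples` is the programmatic equivalent of filling the prompt with more example insights.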

Yeah, it seems to be more useful for text generation than anything else. I can see how it can be useful for building a bot that answers questions based on a high-level report but not for insight generation.

I’ve also noticed that the model can handle basic operations like summing up the total revenue, but it really struggles to extract anything more advanced than that, like sudden drops, spikes, and other more sophisticated analyses.
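
For what it is worth, drops and spikes of this kind are cheap to detect outside the model. The sketch below flags any period whose change from the previous period exceeds a threshold; the function name, the 50% default threshold, and the sample series are assumptions made for illustration.

```python
from typing import List, Tuple


def flag_sudden_changes(
    labels: List[str], values: List[float], threshold_pct: float = 50.0
) -> List[Tuple[str, str, float]]:
    """Return (label, 'spike' or 'drop', percent change) for large moves."""
    flagged = []
    for i in range(1, len(values)):
        previous, current = values[i - 1], values[i]
        if previous == 0:
            continue  # percent change is undefined from a zero base
        change = (current - previous) / previous * 100.0
        if abs(change) >= threshold_pct:
            kind = "spike" if change > 0 else "drop"
            flagged.append((labels[i], kind, round(change, 2)))
    return flagged


# Hypothetical series, for illustration only.
labels = ["Jan", "Feb", "Mar", "Apr"]
values = [100.0, 110.0, 40.0, 120.0]
print(flag_sudden_changes(labels, values))
# prints: [('Mar', 'drop', -63.64), ('Apr', 'spike', 200.0)]
```

Each flagged tuple can then be handed to the model as a precomputed insight to phrase in natural language.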

Do you know of any good models for this use case?

You can make transformer models do whatever you want. Fine-tune GPT2, GPT3, or BERT on data tables and insights, or build a transformer from scratch. Keep working the prompt.

You can have the prompt accept a user’s natural language query, convert it to SQL, get the data from the SQL results, and summarize it in English. You’ll need multiple calls to do this. Lots of folks are doing it this way.
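
A minimal sketch of that multi-call flow, assuming a SQLite table named `revenue(month, amount)` and a hypothetical `complete` callable that wraps your completion API. The schema, prompt wording, and function name are illustrative assumptions, and the SELECT-only check is a basic precaution rather than a full safeguard.

```python
import sqlite3
from typing import Callable

# Hypothetical schema description that is pasted into the first prompt.
SCHEMA = "revenue(month TEXT, amount REAL)"


def answer_with_sql(
    question: str, conn: sqlite3.Connection, complete: Callable[[str], str]
) -> str:
    """Question -> SQL (call 1) -> rows -> English summary (call 2)."""
    # Call 1: ask the model to translate the question into SQL.
    sql = complete(
        f"Schema: {SCHEMA}\n"
        f"Write one SQLite SELECT statement answering: {question}\n"
        "SQL:"
    ).strip()
    # Only run read-only statements produced by the model.
    if not sql.lower().startswith("select"):
        raise ValueError("refusing to run a non-SELECT statement")
    rows = conn.execute(sql).fetchall()
    # Call 2: ask the model to phrase the verified rows in English.
    return complete(
        f"Question: {question}\n"
        f"Query result rows: {rows}\n"
        "Answer in one English sentence:"
    ).strip()
```

The numbers in the final answer come from the database, not from the model; the model only writes the query and the wording.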

But it’s not computationally efficient to do what you’re trying to do in one model with one-shot meta learning.

Language is a lossy signal. It can trace out possibilities, but whether those possibilities are “insights” or “facts” requires different modalities.

Mathematics, stats, and code occupy a different ontological frame in how they work, what work they do, and how audiences interpret them.

This series of essays/exercises may help you think about how to get more out of these particular transformers (GPT3) and corpora.

https://un1crom.medium.com/gpt3-linguistics-101-a-multi-part-series-this-is-part-1-on-structure-a41af3a77353
