Hi guys, I am planning to release a database plus natural language (NL) question corpus that gets at exactly the issues we are discussing.
Would it be useful if I built a public corpus around the following schema:
<table id="stock" writes="full">
  <column id="Symbol" type="text"/>
  <column id="Name" type="text"/>
  <column id="Market_Cap" type="int" lex="Market Cap"/>
  <column id="Country" type="text"/>
  <column id="IPO_Year" type="int" lex="IPO Year"/>
  <column id="Sector" type="text"/>
  <column id="Industry" type="text"/>
  <column id="exchange" type="text"/>
  <primary_key key="Symbol"/>
</table>
<table id="quote" writes="full" lex="quote,quotes">
  <column id="time" type="time"/>
  <column id="symbol" type="text" properties="none"/>
  <column id="price" type="real" units="USD"/>
  <column id="volume" type="int"/>
  <unique key="time,symbol"/>
  <foreign_key from="symbol" to="stock"/>
</table>
The database state will be some subset of the stocks on the NASDAQ, together with their prices sampled at some frequency over the trading day.
The natural language questions would be along these lines:
“sectors of stocks”
“names of energy stocks”
“names of energy stocks that are down 5% in the last two hours”
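To make the target concrete, here is a minimal sketch of what a correct answer to each of the three questions could look like. It is only an illustration under several assumptions that are not part of the corpus: the schema is loaded into SQLite, the `time` type is stored as ISO-8601 text, the three stocks and their quotes are invented, and "down 5% in the last two hours" is read as "the latest price is at most 95% of the earliest price inside a two-hour window ending at a fixed reference time". Other readings of that question are equally defensible, which is part of what makes it hard.

```python
import sqlite3

def build_db():
    """Create an in-memory SQLite copy of the schema with invented sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE stock (
            Symbol TEXT PRIMARY KEY, Name TEXT, Market_Cap INTEGER, Country TEXT,
            IPO_Year INTEGER, Sector TEXT, Industry TEXT, exchange TEXT
        );
        CREATE TABLE quote (
            time TEXT,                                -- assumption: ISO-8601 text
            symbol TEXT REFERENCES stock(Symbol),
            price REAL, volume INTEGER,
            UNIQUE (time, symbol)
        );
    """)
    # Hypothetical rows, for illustration only.
    con.executemany("INSERT INTO stock VALUES (?,?,?,?,?,?,?,?)", [
        ("AAA", "Alpha Energy", 1000, "USA", 2001, "Energy", "Oil", "NASDAQ"),
        ("BBB", "Beta Energy", 2000, "USA", 2005, "Energy", "Solar", "NASDAQ"),
        ("CCC", "Gamma Software", 3000, "USA", 2010, "Technology", "Software", "NASDAQ"),
    ])
    con.executemany("INSERT INTO quote VALUES (?,?,?,?)", [
        ("2024-01-02 10:00:00", "AAA", 100.0, 500),
        ("2024-01-02 12:00:00", "AAA", 94.0, 700),    # down 6%
        ("2024-01-02 10:00:00", "BBB", 50.0, 300),
        ("2024-01-02 12:00:00", "BBB", 49.0, 400),    # down 2%
        ("2024-01-02 10:00:00", "CCC", 200.0, 100),
        ("2024-01-02 12:00:00", "CCC", 180.0, 200),   # down 10%, but not Energy
    ])
    return con

# One possible SQL reading of each NL question (assumed, not authoritative).
QUERIES = {
    "sectors of stocks":
        "SELECT DISTINCT Sector FROM stock ORDER BY Sector",
    "names of energy stocks":
        "SELECT Name FROM stock WHERE Sector = 'Energy' ORDER BY Name",
    "names of energy stocks that are down 5% in the last two hours": """
        SELECT s.Name
        FROM stock AS s
        JOIN quote AS q_start ON q_start.symbol = s.Symbol
        JOIN quote AS q_end   ON q_end.symbol   = s.Symbol
        WHERE s.Sector = 'Energy'
          AND q_start.time = (SELECT MIN(w.time) FROM quote AS w
                              WHERE w.symbol = s.Symbol
                                AND w.time >= datetime(:now, '-2 hours')
                                AND w.time <= :now)
          AND q_end.time = (SELECT MAX(w.time) FROM quote AS w
                            WHERE w.symbol = s.Symbol AND w.time <= :now)
          AND q_end.price <= 0.95 * q_start.price
        ORDER BY s.Name
    """,
}

def answer(con, question, now="2024-01-02 12:00:00"):
    """Run the SQL for a question; `now` is a fixed reference time for repeatability."""
    sql = QUERIES[question]
    params = {"now": now} if ":now" in sql else {}
    return [row[0] for row in con.execute(sql, params)]

if __name__ == "__main__":
    con = build_db()
    for question in QUERIES:
        print(question, "->", answer(con, question))
```

The third query is where I expect the trouble: it needs a self-join on quote, a time window, and a decision about what "down 5%" is measured against, none of which is spelled out in the question.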
Would this be an interesting corpus to try a little prompt engineering on? Golly, I will eat my hat if ChatGPT can do this in a robust way.
Let me know if I should proceed.
Michael