I am trying to use the GPT Api to make a retrieval RAG system to analyse excel sheets of numeric data. It is difficult to handle the numeric data to embedding using the pandas library as most of the context and semantic information is lost while reading them into a Dataframe and chunking them. Is there any good technique or practises about how to develop a RAG system properly to query and analyse the excel data?
Will you use a single dedicated Excel file as the source for RAG, or will you submit a separate Excel file for analysis with each conversation?
Hi, thank you for the response. I shall be using a single excel file at the moment to upload into the RAG system (and might/will scale up later if it works for multiple files).
One approach is to store the Excel file as it is or, if the data is structured, create an DB.
Based on query, You can then use a language model to generate queries for filtering and analyzing the data.
Most cases Tools are great for querying and producing the final response.
Yup, I thought about it, but the Excel sheet data is not perfect, in the sense that it is visually appealing with colours and headings in the Excel but not as good to convert it to a DB. Will I need to manually change the excel file properly everytime so that a sematic/schema is maintained for the DB to understand?
How’s your file looks like?
MAPPING | ANALYSIS | ANALYSIS | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Total Companies | Visible Scale-ups | Scale-up Potential SMEs | Growth Strategy Priorities | |||||||||||||||||||||||||
Financial Services | Professional Services | Business Services | Segment Population | % Total Population | Financial Services | Professional Services | Business Services | Segment Total | % Borough | % Total Population | % VSU Population | Financial Services | Professional Services | Business Services | Segment Total | % Borough | % Total Population | % SUP Population | Financial Services | Professional Services | Business Services | |||||||
CIS 64-66 | CIS 69-74 | CIS 77-78, 82 | CIS 64-66 | CIS 69-74 | CIS 77-78, 82 | CIS 64-66 | CIS 69-74 | CIS 77-78, 82 | Market Penetration | Market Development | Product Development | Market Penetration | Market Development | Product Development | Market Penetration | Market Development | Product Development | |||||||||||
Region A | 939 | 2370 | 1833 | 5142 | 41.0% | 24 | 72 | 89 | 185 | 3.6% | 1.5% | 39.1% | 242 | 664 | 507 | 1413 | 27.5% | 11.3% | 40.6% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Large | 13 | 9 | 8 | 30 | 0 | 2 | 1 | 3 | ||||||||||||||||||||
SME | 926 | 2361 | 1825 | 5112 | 24 | 70 | 88 | 182 | 242 | 664 | 507 | 1413 | 27.6% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||
Phases of Growth (SMEs) | ||||||||||||||||||||||||||||
Start-up Businesses (0 to 5 years) | 598 | 1393 | 898 | 2889 | 5 | 14 | 6 | 25 | 105 | 190 | 97 | 392 | 7.6% | 3.1% | 11.3% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
High Growth Potential Entrepreneurial Enterprises (5 to 10 years) | 149 | 542 | 487 | 1178 | 9 | 33 | 56 | 98 | 71 | 281 | 188 | 892 | 17.3% | 7.1% | 25.6% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Scaleup Potential Businesses (10 to 20 years) | 87 | 295 | 333 | 715 | 5 | 20 | 24 | 49 | 40 | 137 | 175 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Scaleup Businesses (20 to 50 years) | 81 | 124 | 93 | 298 | 5 | 3 | 2 | 10 | 21 | 55 | 40 | 129 | 2.5% | 1.0% | 3.7% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Repeat Scaleup Businesses (50 years plus) | 11 | 7 | 14 | 32 | 0 | 0 | 0 | 0 | 5 | 1 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Region B | 368 | 934 | 793 | 2095 | 16.7% | 15 | 32 | 34 | 81 | 3.9% | 0.6% | 17.1% | 128 | 230 | 242 | 600 | 28.6% | 4.8% | 17.2% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Large | 4 | 1 | 4 | 9 | 0 | 1 | 0 | 1 | ||||||||||||||||||||
SME | 364 | 933 | 789 | 2091 | 15 | 31 | 34 | 80 | 128 | 230 | 242 | 600 | 28.7% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||
Phases of Growth | ||||||||||||||||||||||||||||
Start-up Businesses (0 to 5 years) | 204 | 463 | 368 | 1035 | 2 | 4 | 2 | 8 | 40 | 42 | 45 | 127 | 6.1% | 1.0% | 3.6% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
High Growth Potential Entrepreneurial Enterprises (5 to 10 years) | 77 | 192 | 181 | 450 | 9 | 15 | 20 | 44 | 43 | 76 | 76 | 405 | 19.3% | 3.2% | 11.6% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Scaleup Potential Businesses (10 to 20 years) | 66 | 206 | 170 | 442 | 4 | 11 | 11 | 26 | 33 | 88 | 89 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Scaleup Businesses (20 to 50 years) | 17 | 62 | 63 | 142 | 0 | 1 | 1 | 2 | 9 | 24 | 30 | 68 | 3.2% | 0.5% | 2.0% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Repeat Scaleup Businesses (50 years plus) | 5 | 10 | 7 | 22 | 0 | 0 | 0 | 0 | 3 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Region C | 347 | 1425 | 1040 | 2812 | 22.4% | 12 | 44 | 41 | 97 | 3.4% | 0.8% | 20.5% | 108 | 362 | 329 | 799 | 28.4% | 6.4% | 22.9% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Large | 3 | 1 | 7 | 11 | 1 | 0 | 0 | 1 | ||||||||||||||||||||
SME | 344 | 1424 | 1033 | 2801 | 11 | 44 | 41 | 96 | 108 | 362 | 329 | 799 | 28.5% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||
Phases of Growth | ||||||||||||||||||||||||||||
Start-up Businesses (0 to 5 years) | 205 | 748 | 371 | 1324 | 1 | 12 | 7 | 20 | 37 | 63 | 38 | 138 | 4.9% | 1.1% | 4.0% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
High Growth Potential Entrepreneurial Enterprises (5 to 10 years) | 58 | 277 | 286 | 621 | 6 | 24 | 20 | 50 | 23 | 116 | 120 | 557 | 19.8% | 4.4% | 16.0% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Scaleup Potential Businesses (10 to 20 years) | 54 | 292 | 307 | 653 | 2 | 8 | 10 | 20 | 33 | 129 | 136 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Scaleup Businesses (20 to 50 years) | 21 | 101 | 65 | 187 | 2 | 0 | 4 | 6 | 11 | 52 | 32 | 104 | 3.7% | 0.8% | 3.0% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Repeat Scaleup Businesses (50 years plus) | 6 | 6 | 4 | 16 | 0 | 0 | 0 | 0 | 4 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Region D | 186 | 462 | 384 | 1032 | 8.2% | 5 | 24 | 14 | 43 | 4.2% | 0.3% | 9.1% | 59 | 121 | 93 | 273 | 26.5% | 2.2% | 7.8% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Large | 0 | 3 | 6 | 9 | 0 | 1 | 0 | 1 | ||||||||||||||||||||
SME | 186 | 459 | 378 | 1023 | 5 | 23 | 14 | 42 | 59 | 121 | 93 | 273 | 26.7% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||
Phases of Growth | ||||||||||||||||||||||||||||
Start-up Businesses (0 to 5 years) | 110 | 239 | 205 | 554 | 2 | 5 | 2 | 9 | 18 | 33 | 18 | 69 | 6.7% | 0.6% | 2.0% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
High Growth Potential Entrepreneurial Enterprises (5 to 10 years) | 32 | 94 | 79 | 205 | 0 | 11 | 8 | 19 | 17 | 34 | 34 | 168 | 16.3% | 1.3% | 4.8% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Scaleup Potential Businesses (10 to 20 years) | 35 | 78 | 64 | 177 | 2 | 5 | 2 | 9 | 20 | 36 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Scaleup Businesses (20 to 50 years) | 8 | 43 | 29 | 80 | 1 | 2 | 2 | 5 | 4 | 18 | 13 | 36 | 3.5% | 0.3% | 1.0% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Repeat Scaleup Businesses (50 years plus) | 1 | 5 | 1 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Region E | 111 | 274 | 190 | 575 | 4.6% | 4 | 9 | 7 | 20 | 3.5% | 0.2% | 4.2% | 31 | 71 | 51 | 153 | 26.6% | 1.2% | 4.4% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Large | 0 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | ||||||||||||||||||||
SME | 111 | 273 | 189 | 573 | 4 | 9 | 7 | 20 | 31 | 71 | 51 | 153 | 26.7% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||
Phases of Growth | ||||||||||||||||||||||||||||
Start-up Businesses (0 to 5 years) | 64 | 150 | 93 | 307 | 1 | 4 | 0 | 5 | 13 | 8 | 11 | 32 | 5.6% | 0.3% | 0.9% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
High Growth Potential Entrepreneurial Enterprises (5 to 10 years) | 24 | 50 | 43 | 117 | 3 | 2 | 6 | 11 | 13 | 29 | 16 | 101 | 17.6% | 0.8% | 2.9% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Scaleup Potential Businesses (10 to 20 years) | 16 | 51 | 34 | 101 | 0 | 2 | 0 | 2 | 5 | 26 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Scaleup Businesses (20 to 50 years) | 6 | 21 | 18 | 45 | 0 | 1 | 1 | 2 | 0 | 7 | 11 | 20 | 3.5% | 0.2% | 0.6% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Repeat Scaleup Businesses (50 years plus) | 1 | 1 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Region F | 160 | 463 | 266 | 889 | 7.1% | 8 | 23 | 16 | 47 | 5.3% | 0.4% | 9.9% | 55 | 114 | 76 | 245 | 27.6% | 2.0% | 7.0% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Large | 5 | 1 | 0 | 6 | 0 | 0 | 0 | 0 | ||||||||||||||||||||
SME | 155 | 462 | 266 | 883 | 8 | 23 | 16 | 47 | 55 | 114 | 76 | 245 | 27.7% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||
Phases of Growth | ||||||||||||||||||||||||||||
Start-up Businesses (0 to 5 years) | 82 | 214 | 110 | 406 | 1 | 3 | 1 | 5 | 24 | 16 | 10 | 50 | 5.6% | 0.4% | 1.4% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
High Growth Potential Entrepreneurial Enterprises (5 to 10 years) | 42 | 96 | 71 | 209 | 4 | 11 | 7 | 22 | 16 | 37 | 23 | 162 | 18.2% | 1.3% | 4.7% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Scaleup Potential Businesses (10 to 20 years) | 21 | 106 | 59 | 186 | 2 | 7 | 6 | 15 | 9 | 46 | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Scaleup Businesses (20 to 50 years) | 5 | 45 | 26 | 76 | 1 | 2 | 2 | 5 | 2 | 15 | 12 | 33 | 3.7% | 0.3% | 0.9% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||
Repeat Scaleup Businesses (50 years plus) | 5 | 1 | 0 | 6 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Region G | 2111 | 5928 | 4506 | 12545 | 100% | 68 | 204 | 201 | 473 | 3.8% | 3.8% | 100.0% | 623 | 1562 | 1298 | 3483 | 27.8% | 100.0% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Large | 25 | 16 | 26 | 67 | 1 | 4 | 1 | 6 | ||||||||||||||||||||
SME | 2086 | 5912 | 4480 | 12483 | 67 | 200 | 200 | 467 | 623 | 1562 | 1298 | 3483 | 27.8% | 100.0% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |||||
Phases of Growth | ||||||||||||||||||||||||||||
Start-up Businesses (0 to 5 years) | 1263 | 3207 | 2045 | 6515 | 12 | 42 | 18 | 72 | 237 | 352 | 219 | 808 | 6.4% | 23.2% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |||||
High Growth Potential Entrepreneurial Enterprises (5 to 10 years) | 382 | 1251 | 1147 | 2780 | 31 | 96 | 117 | 244 | 183 | 573 | 457 | 2285 | 18.2% | 65.6% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |||||
Scaleup Potential Businesses (10 to 20 years) | 279 | 1028 | 967 | 2274 | 15 | 53 | 53 | 121 | 140 | 462 | 470 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||||||||
Scaleup Businesses (20 to 50 years) | 138 | 396 | 294 | 828 | 9 | 9 | 12 | 30 | 47 | 171 | 138 | 390 | 3.1% | 11.2% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |||||
Repeat Scaleup Businesses (50 years plus) | 29 | 30 | 27 | 86 | 0 | 0 | 0 | 0 | 16 | 4 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
You can store the Excel file in OpenAI’s file storage or AWS S3 and build an analytics module using Pandas. This can then be integrated with function calling using the OpenAI Assistant API to support retrieval-augmented generation (RAG). I have developed a module to dynamically manage and load plugins for function calling with the OpenAI Assistant API. This module is built on top of the OpenAI Assistant API and the AWS Serverless framework, utilizing AWS API Gateway, AWS Lambda, and AWS DynamoDB. It supports HTTPS interactions and streaming through WebSocket connections.
Currently, I am updating the README to document the recent enhancements to the module. If you’re interested in learning more about its implementation, I’d be happy to share additional details once the README update is complete. You can find the repository here: openai_assistant_engine.