I have a dataset containing text similar to that used in the example URL. I can create embeddings and add them as a new column in my dataset, just like the ‘search_docs’ function in the example. So far, everything works well when I follow this example.
However, I’m stuck because I don’t have a big-picture understanding of the text data, so I don’t know what kind of questions to ask. My objective is to pool the text from all my records, identify 15-20 major topics, and then assign exactly one of these topics to each row, so that every record falls into one of 15-20 distinct groups or “buckets”. This is essentially a clustering task, similar to what K-Means or other clustering techniques do.
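To make the goal concrete, here is a rough sketch of the kind of bucketing I have in mind, not a working solution for my data. It assumes the embeddings are already available as rows of a NumPy array. The random `fake_embeddings` array, the function names `farthest_first_init` and `kmeans_labels`, and the choice of 3 groups are placeholders of mine, and the hand-written K-Means loop is only for illustration (scikit-learn's `KMeans` would presumably be the usual choice).

```python
import numpy as np


def farthest_first_init(X, k):
    """Pick k starting centroids: row 0, then repeatedly the row farthest from those chosen."""
    chosen = [0]
    min_sq_dist = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(min_sq_dist.argmax())
        chosen.append(nxt)
        # Track each row's squared distance to its nearest chosen centroid.
        min_sq_dist = np.minimum(min_sq_dist, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[chosen].copy()


def kmeans_labels(embeddings, k, n_iter=100):
    """Plain Lloyd-style K-Means; returns one integer bucket id per row."""
    X = np.asarray(embeddings, dtype=float)
    centroids = farthest_first_init(X, k)
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid, shape (n_rows, k).
        dists = np.stack([((X - c) ** 2).sum(axis=1) for c in centroids], axis=1)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a bucket is empty
                centroids[j] = members.mean(axis=0)
    return labels


# Stand-in data (an assumption): 3 tight groups of 20 fake "embeddings" in 8 dimensions.
rng = np.random.default_rng(0)
group_centers = 10.0 * np.eye(8)[:3]
fake_embeddings = np.vstack(
    [c + rng.normal(scale=0.1, size=(20, 8)) for c in group_centers]
)

bucket = kmeans_labels(fake_embeddings, k=3)
print(bucket)
```

With my real data I would stack the embedding column into one array, pass a `k` somewhere between 15 and 20, and write the returned labels back to the dataset as a new column.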
I would greatly appreciate any assistance or guidance in achieving this goal. Please note that I am using Azure OpenAI and Python for this project.
The URL I mentioned above: