Need a way to make the bot count the rows in a CSV file based on what the user enters

Hi All,

I am experimenting with a chat bot on a CSV file that contains a list of issues and descriptions. I am using Azure OpenAI and Streamlit. The fields have been converted into vectors.

In my scenario, the user enters an issue description, and the chat bot should count the rows that match it. If there are 5 or more matching records, it should return 5. If there are fewer than 5, it should return only the ones that are available. Currently, if only 2 records match, it returns those 2 and then shows 3 more records that are unrelated.

I need a way to count the matching records when a user enters an issue, and to write the prompt rules accordingly.

Any immediate help would be much appreciated.

It seems like you are getting close. However, the AI is either misunderstanding your request or not terminating the output once it has produced all the cases that actually match.

If the prompt over-emphasizes producing five results, the AI may move on to line three with no matching data left, and then it has to write something.

Instead, focus heavily on the criteria that must be met in order to either produce another line or terminate the output. For example: "At the end of every line you produce, think carefully about whether any other lines in the file meet the criteria, paying close attention to the data in field (x) and the requirements that need to be matched, and if nothing is left to produce, you must terminate the output by producing '(end_of_data)'." You can use another stop phrase that works for you, since you can't directly refer to the stop token.

Only as a side note, you can add that the output terminates once line 5 of the results has been produced. You can also have the output lines numbered.
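
As a minimal sketch of the advice above, the prompt rules and the stop phrase can be paired with a small post-processing step in Python, so that anything written after the stop phrase is dropped even if the model keeps going. The names `STOP_PHRASE`, `build_rules`, and `trim_matches` are hypothetical, and the rule wording is only one possible phrasing, not a tested prompt.

```python
STOP_PHRASE = "(end_of_data)"  # hypothetical stop phrase; any unlikely string works
MAX_RESULTS = 5


def build_rules(field_name: str) -> str:
    """Return prompt rules that stress the match criteria over the count."""
    return (
        f"List only rows whose {field_name} matches the user's issue description. "
        "After each line, check whether any remaining row still meets the criteria. "
        f"If none does, write {STOP_PHRASE} and stop. "
        f"Number the lines and never write more than {MAX_RESULTS}."
    )


def trim_matches(raw_output: str) -> list[str]:
    """Keep the non-empty lines before the stop phrase, capped at MAX_RESULTS."""
    before_stop = raw_output.split(STOP_PHRASE, 1)[0]
    lines = [line.strip() for line in before_stop.splitlines() if line.strip()]
    return lines[:MAX_RESULTS]


if __name__ == "__main__":
    sample = (
        "1. Login fails after reset\n"
        "2. Login page times out\n"
        "(end_of_data)\n"
        "3. Unrelated row"
    )
    print(trim_matches(sample))
```

Trimming in code is a safety net only. It does not make the model's matching any more accurate, so the criteria in the prompt still do the real work.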

Thank you so much!! This helped me!!

# i is the path to the file being checked
if i.endswith(".csv"):
    with open(i) as f:
        contents = f.read()
    ccount = 0
    # count every comma separator in the file
    for ch in contents:
        if ch == ",":
            ccount = ccount + 1
    print(ccount)

You would actually make this a function and return ccount from it.

The chat bot is built using Azure OpenAI and Search on a CSV Q&A file. I am doing vectorization and embedding as well. The user asks how many issues there are for a given criterion, and my chat bot gives a different count for the same question. Can the chat bot count the rows in the CSV using an issue id as the key column? Can I make it an integer so it can be counted? Will it return a complete count of all rows, or just the ones within a vector?
I really appreciate your help!

Thank you for your reply! Would I call this function inside a prompt? If the user asks for a count on different criteria, how will it filter?

From what I was seeing in the API, the response objects are in JSON format, but they do show functions. It might be best to put the counting function in there. I have my own API, so it would be in the format of def ccount(): and I am not sure how to translate it to this API. But as you can see, that will certainly count the number of entries.
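
To illustrate the question of whether the function goes "inside a prompt": with function calling, the function is not placed in the prompt. The model is given a description of it, replies with a function name and JSON arguments, and your own code runs the count. The sketch below shows only that local part, with no network call. The tool name `count_issues`, its `column` and `value` parameters, and the helper `handle_tool_call` are all hypothetical, and the wiring to the Azure OpenAI request itself is not shown.

```python
import json

# Hypothetical tool description in the "tools" format used by chat completions.
COUNT_TOOL = {
    "type": "function",
    "function": {
        "name": "count_issues",
        "description": "Count CSV rows, optionally filtered by one column value.",
        "parameters": {
            "type": "object",
            "properties": {
                "column": {"type": "string"},
                "value": {"type": "string"},
            },
            "required": [],
        },
    },
}


def count_issues(rows, column=None, value=None):
    """Count all rows, or only rows where `column` equals `value` (case-insensitive)."""
    if column is None or value is None:
        return len(rows)
    wanted = str(value).strip().lower()
    return sum(1 for row in rows if str(row.get(column, "")).strip().lower() == wanted)


def handle_tool_call(name, arguments_json, rows):
    """Run the tool the model asked for and return a JSON string to send back."""
    if name != "count_issues":
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json or "{}")
    count = count_issues(rows, args.get("column"), args.get("value"))
    return json.dumps({"count": count})
```

Because the filter arrives as arguments, the same function can answer "how many are open" and "how many are closed" without any change to the prompt rules.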

Thank you for your reply!

I tried it, but unfortunately I was not able to get it working. One issue I see is that the values in the code below restrict the output count:

vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=25, ……etc)
Results = search_client.search(……, top=25)

Example: here the count is set to 25. If the user asks for the total issue count, it shows 24.

If I change the parameter to 50, it shows the total as 45.

How do I make the bot return a total count irrespective of these parameters, and also break it down by the values in other fields? For example, the status field has open and closed, so it could show how many issues are open and how many are closed.
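
Since `top` and `k_nearest_neighbors` cap how many documents a search returns, a count taken from the search results can never exceed those values. One possible way around this, sketched below, is to count directly from the CSV rather than from the results. The function name `count_by_field` and the column names in the sample data are assumptions for illustration; the real file may use different headers.

```python
import csv
import io
from collections import Counter


def count_by_field(csv_text: str, field: str) -> Counter:
    """Count rows per distinct value of one column, over the whole file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Normalize case and whitespace so "Open" and "open" land in one bucket.
    return Counter((row.get(field) or "").strip().lower() for row in reader)


if __name__ == "__main__":
    # Hypothetical sample data; the real column names may differ.
    sample = (
        "issue_id,description,status\n"
        "1,Login fails,Open\n"
        "2,Page times out,Closed\n"
        "3,Report is blank,open\n"
    )
    counts = count_by_field(sample, "status")
    print("total:", sum(counts.values()))
    print("open:", counts["open"], "closed:", counts["closed"])
```

Using the `csv` module instead of counting commas means that quoted descriptions containing commas are still treated as one field, and the header line is not counted as a row.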

Where is the actual .csv file? ccount gets the actual count from the file. It must be in the vector_query. So you would call the function in the response stream. Is it the embedding in the query? That function works on the actual .csv file. If it is the embedding, you would change the function to count the "," in the embedded string. You see how that works? The return value is always going to be an integer, but it can come from a file or just a string, as long as it is in .csv format.

Try this for your function:

def ccount(path="test.csv"):
    # read the whole file as text
    with open(path) as f:
        contents = f.read()
    print(contents)
    # count the comma separators; this is a separator count, not a row count
    count = contents.count(",")
    print(count)
    return count

ccount()

That should be what you want for the counting function. With test.csv containing the single line below, it prints the contents and then the count:

This,is,a,test,Only,a,test,

7

Now, merging that into the API? Call ccount() in the response.

Thank you so much! Will try this out!