Hello everybody,
I have two things I would like to do with a very large dataset in Python:
- Let's say I have a pandas df with 3 columns. For each row, I want to take the value from the first column and, using a static prompt, get back the data that goes into the other two columns. E.g. the input is “apple” → the output for the columns “type” and “color” is “fruit, green” → both values are put into their respective columns.
I would love to fill both columns with one API call, but I am not sure which separator would work so that the response gets split correctly (e.g. imagine the result for each column is multiple sentences and not just one word).
Q: Has anyone found a way to do this for multiple columns, and which separator works well?
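To make the separator question concrete, here is a minimal sketch of one possible approach: instead of a separator character, ask for a small JSON object and parse it, so multi-sentence values with commas or periods do not break the split. `call_model`, `PROMPT`, and `parse_reply` are hypothetical names made up for illustration; `call_model` is only a stand-in that returns a canned reply, not a real API client.

```python
import json

# Hypothetical static prompt; the wording is an assumption, not a tested prompt.
PROMPT = (
    "For the item below, reply with only a JSON object with the keys "
    '"type" and "color". Item: {item}'
)

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for the real API call; returns a canned reply."""
    return '{"type": "fruit", "color": "green. Sometimes red, too."}'

def parse_reply(reply: str) -> dict:
    """Parse the JSON reply; fall back to empty values if it is malformed."""
    empty = {"type": None, "color": None}
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    return {"type": data.get("type"), "color": data.get("color")}

items = ["apple"]  # would be the first column of the df
rows = [parse_reply(call_model(PROMPT.format(item=item))) for item in items]
print(rows)
```

The resulting list of dicts could then be turned into the two columns, for example via `pd.DataFrame(rows, index=df.index)`. Replies that are not valid JSON end up as empty values here, so those rows could be retried.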
- I don't want to send one individual API call per row as above, but a batch, e.g. 20 calls at the same time. One call contains multiple rows, and the response data should be put into the correct rows of the df, so the order in which the API calls get processed matters. I got this somewhat working with concurrent.futures in Python, but it does not work correctly 100% of the time.
Q: Has anyone found a way to send a batch of multiple rows to the API that also puts the data back into the df in the correct order?
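As an illustration of the ordering problem, here is a minimal sketch using `concurrent.futures`. It assumes every row travels together with its row index and that each result is stored under that index, so nothing depends on which call finishes first. `call_model_batch` and `chunked` are hypothetical helpers; the stand-in just upper-cases the text where a real API call would go.

```python
from concurrent.futures import ThreadPoolExecutor

def call_model_batch(batch):
    """Hypothetical stand-in for one API call that handles several rows.

    `batch` is a list of (row_index, text) pairs; the return value is a
    list of (row_index, result) pairs, so each result keeps its row index.
    """
    return [(idx, text.upper()) for idx, text in batch]

def chunked(pairs, size):
    """Yield consecutive slices of `pairs` with at most `size` elements."""
    for start in range(0, len(pairs), size):
        yield pairs[start:start + size]

# row index -> value of the first column (would come from the df)
inputs = {0: "apple", 1: "banana", 2: "cherry", 3: "plum", 4: "kiwi"}
batches = list(chunked(list(inputs.items()), 2))

results = {}
with ThreadPoolExecutor(max_workers=20) as pool:
    # pool.map yields results in the order of `batches`,
    # regardless of which call finishes first.
    for batch_result in pool.map(call_model_batch, batches):
        for idx, value in batch_result:
            results[idx] = value

print(results)
```

Because the results are keyed by row index, they could be written back with something like `df["type"] = pd.Series(results)`, which aligns on the index rather than on position. With a real model, each row in a batch would probably need an explicit id in the prompt that the reply echoes back, since the model may skip or reorder rows.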
For both use cases I am not 100% sure how to do it, or at least it does not always work. Has anyone already had the same use cases or tasks for the API and managed to do them successfully? Or do you just send one prompt at a time for each row and column? Thank you very much!