I have a problem and hope someone can help.
I have a file of 10,000 rows with 2 columns: the name of a company and the number of employees. I want ChatGPT to categorize them as Small, Medium, and so on.
If I use the interface and GPT-3.5 with one sample data point, it works great.
If I copy and paste 50 rows and tell it to produce a table, it also works great.
But I can't copy and paste more than, let's say, 100 rows. To make this work in practice, I need automation.
So I wrote a script and tried it via the API.
If I send all the data via the API, I get an error. Does anyone know how much data I can upload in a JSON payload?
OK, so my next try was to go through it line by line. That worked, but it had two disadvantages:
It uses a lot of tokens and runs into rate limits quickly.
The quality is not nearly as good as with the larger copy-and-paste approach.
So, can anyone help? Has anyone done the same?
And if not, maybe answer these questions:
What is the size limit for a JSON payload?
What parameters does ChatGPT 3.5 use in the interface?
How can I make sure the API REALLY gives back one word as output and not sentences? (A simplified sketch of the kind of call I mean is below.)
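This is roughly the kind of per-row call I mean. It is only a sketch, not my actual script; it assumes the current openai Python package and gpt-3.5-turbo, and the prompt wording is just a placeholder. In this sketch, the system prompt, temperature=0, and the small max_tokens cap are what keep the output to a single word.

```python
# Sketch of a per-row classification call (assumes the `openai` Python package >= 1.0).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_company(name: str, employees: int) -> str:
    """Ask the model for a one-word size label for a single company."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,   # deterministic, no creative phrasing
        max_tokens=3,    # hard cap so it cannot ramble into sentences
        messages=[
            {"role": "system",
             "content": "You classify companies by number of employees. "
                        "Answer with exactly one word: Small, Medium, or Large."},
            {"role": "user", "content": f"{name}, {employees} employees"},
        ],
    )
    return response.choices[0].message.content.strip()

print(classify_company("Example GmbH", 42))  # expected: a single word like "Small"
```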
For your use-case, I'd recommend reading docs for embeddings.
EDIT: On re-reading your post, it seems that you need to come up with your criteria for S/M/L, and then you can simply do it in a spreadsheet - as simple as writing a formula.
If all that you need is to classify a company into a few categories like "Small", "Medium", and "Large", based on the number of employees, maybe you can ask ChatGPT to write a logical program to do the classification.
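For illustration, such a program could be as short as the sketch below. The thresholds and file names are placeholders, not a recommendation; you would swap in whatever criteria you decide on.

```python
import csv

# Placeholder thresholds -- replace them with your own criteria for Small/Medium/Large.
def size_category(employees: int) -> str:
    if employees < 50:
        return "Small"
    if employees < 250:
        return "Medium"
    return "Large"

# Hypothetical file names: read "company,employees" rows, write a third column with the label.
with open("companies.csv", newline="") as infile, \
        open("companies_classified.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for name, employees in csv.reader(infile):
        writer.writerow([name, employees, size_category(int(employees))])
```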
Hey Jochen, thanks for your answer. Where exactly would you place the [onlyonewordhere]?
Like this?
In the format: number of employees,company size[onlyonewordhere] ?
Yes, but indeed I want to do larger things. And I want to understand which process is best for larger files: doing it step by step, line by line, or all at once.
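For comparison, a chunked approach would sit between those two extremes and would look roughly like this. It is only a sketch: the chunk size, file name, and prompt format are assumptions, and it expects the model to return exactly one label per input line.

```python
# Sketch: classify the file in chunks of N rows per request,
# instead of one row per request or the whole file in one go.
import csv
from openai import OpenAI

client = OpenAI()
CHUNK_SIZE = 50  # a guess; small enough to stay well under the context limit

def classify_chunk(rows):
    """Send one chunk of (name, employees) rows; expect one label per line back."""
    listing = "\n".join(f"{name},{employees}" for name, employees in rows)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "For each input line 'company,employees', answer with one line "
                        "containing exactly one word: Small, Medium, or Large. "
                        "Keep the same order and output nothing else."},
            {"role": "user", "content": listing},
        ],
    )
    return response.choices[0].message.content.strip().splitlines()

with open("companies.csv", newline="") as f:
    rows = list(csv.reader(f))

labels = []
for start in range(0, len(rows), CHUNK_SIZE):
    labels.extend(classify_chunk(rows[start:start + CHUNK_SIZE]))

# Pair each company name with the label returned for its line.
results = list(zip((name for name, _ in rows), labels))
```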
@bill.french I wanted to know about the possibility of implementing statistics where I can feed the model a large dataset and then query any required statistic, as in the example below:
In the image below, I have data on which apps users visited, from which location, and at what timestamp; this data will be huge.
Now, I should be able to query the average count of visitors within some time window, or the count of visitors who visited ST1 yesterday (consider STName as an application name), and so on…
Stating what you "should" be able to do is a hypothesis that must also factor in practical boundaries.
Time-series data is typically raw and voluminous. Intentionally, IoT signals are collected to ensure real-time perturbations can be detected and corrected. This is especially important for mission-critical processes where a few missed events sometimes indicate a big problem. But time-series data is also valuable for machine learning. Your use case is neither - you are looking for the lazy pathway to analytics. And I'm fine with that - I love a good lazy approach - it's how great innovations are made.
IN THIS CASE, the AI 'fit' is a reach (based on my skill set, known approaches, and financial practicalities). Practically speaking, I assess that this is a round hole and giant earth mover problem. Putting a mega earth mover in a small hole has one challenge - physics. AI interfaces (UIs and APIs) are presently limited; they're tiny holes. There are indications we will soon see 100k prompt capabilities. But that's nothing compared to extremely granular time-series data - at least the volume that would produce valid assessments.
You can't take a slice of the series and expect your analytics to be valid; the entire point of analytics is to factor in lots of data. As such, the only rational pathway I can see is to aggregate and then expose the aggregated data to the AI model in the form of discrete learner prompts.
Aggregation approaches take many forms and I believe there are some clever things that can be done to support an increasingly adept AI process.
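As one illustration of that "aggregate, then expose" pathway, here is a minimal sketch. The column names, file name, and hourly bucket are assumptions about the dataset, not anything from the original post.

```python
# Sketch: pre-aggregate raw visit events before any of it is handed to an LLM.
# Column names (st_name, user_id, timestamp) and the file name are assumptions.
import pandas as pd

events = pd.read_csv("visits.csv", parse_dates=["timestamp"])

# Collapse the raw time series into a small hourly summary per application.
hourly = (
    events
    .groupby(["st_name", pd.Grouper(key="timestamp", freq="H")])
    .agg(visitors=("user_id", "nunique"), visits=("user_id", "size"))
    .reset_index()
)

# This compact roll-up (or a further daily summary) is what gets exposed to the
# model as context, rather than the raw events themselves.
print(hourly.head())
```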
This is to say that all the crap we learned about data, aggregations, summations, and analytics - before LLMs became popular - still matters.