Performance Comparison of GPT Models: An Informal Analysis
TLDR: Here’s the graph.
Introduction
In this analysis, I compare the performance of three GPT models: gpt-35-turbo-0125, gpt-4o-2024-05-13, and gpt-4-turbo-2024-04-09. My focus is on the tokens per second each model produces, which serves as a measure of generation speed. By examining the descriptive statistics and visualizing the data, I aim to determine which model is the fastest and whether gpt-4o-2024-05-13 offers a significant improvement over its predecessor, gpt-4-turbo-2024-04-09.
Findings
The data collected includes latency in milliseconds and the number of tokens generated for various essay prompts. From this data, I calculated tokens per second for each run as tokens divided by latency in seconds. Here are the descriptive statistics for each model:
Comparative Statistics Table
Statistic | gpt-35-turbo-0125 | gpt-4o-2024-05-13 | gpt-4-turbo-2024-04-09 |
---|---|---|---|
Count | 8 | 8 | 8 |
Mean (tokens/sec) | 67.83 | 63.32 | 35.68 |
Standard Deviation | 11.61 | 14.49 | 3.31 |
Minimum | 42.87 | 35.67 | 31.69 |
25th Percentile | 64.16 | 56.87 | 33.25 |
Median | 71.73 | 65.54 | 35.13 |
75th Percentile | 75.90 | 72.45 | 37.44 |
Maximum | 77.05 | 79.87 | 40.94 |
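As a rough sketch of how the table above can be derived, the snippet below recomputes tokens per second from the latency and token counts listed in the appendix and then summarizes each model. The choice of sample standard deviation and "inclusive" (linearly interpolated) quartiles is my assumption about how the original statistics were produced, and the names `RUNS`, `tokens_per_second`, `describe`, and `SUMMARY` are made up for illustration. Results should land close to the table, though small rounding differences are possible.

```python
from statistics import mean, stdev, quantiles

# (latency_ms, tokens) pairs copied from the appendix table.
RUNS = {
    "gpt-35-turbo-0125": [
        (21015, 901), (15473, 1004), (18126, 1123), (14535, 1120),
        (15355, 1161), (14744, 1132), (24725, 1692), (11863, 890),
    ],
    "gpt-4o-2024-05-13": [
        (20919, 1476), (43814, 1563), (28712, 1668), (21612, 1688),
        (20971, 1444), (32842, 1748), (16565, 1323), (28821, 1793),
    ],
    "gpt-4-turbo-2024-04-09": [
        (26558, 890), (23486, 925), (34428, 1138), (35809, 1193),
        (33506, 1231), (31524, 999), (24770, 1014), (30743, 1131),
    ],
}


def tokens_per_second(latency_ms, tokens):
    """Convert one run's latency (ms) and token count to tokens/sec."""
    return tokens / (latency_ms / 1000.0)


def describe(rates):
    """Summary statistics for a list of tokens/sec values.

    Assumption: sample standard deviation and inclusive quartiles.
    """
    q1, median, q3 = quantiles(rates, n=4, method="inclusive")
    return {
        "count": len(rates),
        "mean": mean(rates),
        "std": stdev(rates),
        "min": min(rates),
        "25%": q1,
        "median": median,
        "75%": q3,
        "max": max(rates),
    }


SUMMARY = {
    model: describe([tokens_per_second(ms, tok) for ms, tok in runs])
    for model, runs in RUNS.items()
}

for model, stats in SUMMARY.items():
    print(model, {name: round(value, 2) for name, value in stats.items()})
```

Computing the rate per run first and then averaging matches the table's mean; dividing total tokens by total latency would weight long runs more heavily and give a slightly different figure.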
Analysis
Performance
The gpt-35-turbo-0125 model has the highest mean tokens per second (67.83), making it the fastest of the three models tested. It is followed by gpt-4o-2024-05-13 with a mean of 63.32 tokens per second. The gpt-4-turbo-2024-04-09 model lags behind with a much lower mean of 35.68 tokens per second.
Consistency
The standard deviation of tokens per second shows how much performance varies from run to run. The gpt-4-turbo-2024-04-09 model has the lowest standard deviation (3.31), suggesting consistent performance but at a slower rate. The gpt-35-turbo-0125 model has a moderate standard deviation (11.61), indicating relatively consistent performance at high speed. The gpt-4o-2024-05-13 model, while faster than its predecessor, has the highest standard deviation (14.49), indicating the most run-to-run variability.
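Because the three models run at quite different speeds, it can help to look at variability relative to each model's mean. One common way to do that is the coefficient of variation (standard deviation divided by mean). This is not a metric I used in the original analysis; the sketch below simply applies it to the means and standard deviations from the statistics table, and the names `TABLE`, `coefficient_of_variation`, and `CV` are illustrative.

```python
# (mean, standard deviation) in tokens/sec, taken from the statistics table.
TABLE = {
    "gpt-35-turbo-0125": (67.83, 11.61),
    "gpt-4o-2024-05-13": (63.32, 14.49),
    "gpt-4-turbo-2024-04-09": (35.68, 3.31),
}


def coefficient_of_variation(mean_tps, std_tps):
    """Standard deviation expressed as a fraction of the mean."""
    return std_tps / mean_tps


CV = {model: coefficient_of_variation(m, s) for model, (m, s) in TABLE.items()}

# Print models from most to least consistent.
for model, cv in sorted(CV.items(), key=lambda item: item[1]):
    print(f"{model}: {cv:.1%}")
```

On this relative measure the ordering is the same as with the raw standard deviations: gpt-4-turbo-2024-04-09 is the most consistent and gpt-4o-2024-05-13 the least.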
Effective Speed Comparison
While gpt-4o-2024-05-13 is not as fast as gpt-35-turbo-0125, it is a significant improvement over gpt-4-turbo-2024-04-09. The mean tokens per second of gpt-4o-2024-05-13 (63.32) is about 1.8 times that of gpt-4-turbo-2024-04-09 (35.68), which confirms that gpt-4o-2024-05-13 is substantially faster than its predecessor. Most importantly, gpt-4o-2024-05-13 comes within about 7% of the mean speed of gpt-35-turbo-0125, despite being the more capable model. This makes gpt-4o-2024-05-13 a significant upgrade: it nearly matches the speed of gpt-35-turbo-0125 while providing advanced capabilities.
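To put numbers on this comparison, here is a small sketch that divides the mean tokens per second of one model by another, using only the means from the statistics table. The helper name `speed_ratio` and the dictionary `MEAN_TPS` are illustrative.

```python
# Mean tokens/sec per model, taken from the statistics table.
MEAN_TPS = {
    "gpt-35-turbo-0125": 67.83,
    "gpt-4o-2024-05-13": 63.32,
    "gpt-4-turbo-2024-04-09": 35.68,
}


def speed_ratio(model, baseline):
    """How many times faster `model` is than `baseline`, by mean tokens/sec."""
    return MEAN_TPS[model] / MEAN_TPS[baseline]


vs_predecessor = speed_ratio("gpt-4o-2024-05-13", "gpt-4-turbo-2024-04-09")
vs_gpt35 = speed_ratio("gpt-4o-2024-05-13", "gpt-35-turbo-0125")

print(f"gpt-4o vs gpt-4-turbo: {vs_predecessor:.2f}x")
print(f"gpt-4o vs gpt-35-turbo: {vs_gpt35:.2f}x")
```

The first ratio comes out near 1.77 and the second near 0.93, so gpt-4o-2024-05-13 is roughly 77% faster than its predecessor and roughly 7% slower than gpt-35-turbo-0125 on these eight runs per model.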
Conclusion
My analysis shows that gpt-35-turbo-0125 is the fastest model in terms of tokens per second, making it the most efficient for generating large volumes of text quickly. However, gpt-4o-2024-05-13 is a significant improvement over gpt-4-turbo-2024-04-09, offering nearly the same speed as gpt-35-turbo-0125, though with more variability. This suggests that gpt-4o-2024-05-13 is a valuable upgrade that bridges the gap between the older and newer models, nearly matching the speed of gpt-35-turbo-0125 while providing enhanced intelligence.
Appendix: Original Data
Prompt Description | Model | Latency (ms) | Tokens | Tokens per Second |
---|---|---|---|---|
Write a 20-paragraph essay on cars | gpt-35-turbo-0125 | 21015 | 901 | 42.87 |
Write a 20-paragraph essay on cars | gpt-4o-2024-05-13 | 20919 | 1476 | 70.56 |
Write a 20-paragraph essay on birds | gpt-35-turbo-0125 | 15473 | 1004 | 64.89 |
Write a 20-paragraph essay on birds | gpt-4o-2024-05-13 | 43814 | 1563 | 35.67 |
Write a 20-paragraph essay on birds | gpt-35-turbo-0125 | 18126 | 1123 | 61.96 |
Write a 20-paragraph essay on birds | gpt-4o-2024-05-13 | 28712 | 1668 | 58.09 |
Write a 20-paragraph essay on dogs | gpt-35-turbo-0125 | 14535 | 1120 | 77.05 |
Write a 20-paragraph essay on dogs | gpt-4o-2024-05-13 | 21612 | 1688 | 78.10 |
Write a 20-paragraph essay on cats | gpt-35-turbo-0125 | 15355 | 1161 | 75.61 |
Write a 20-paragraph essay on cats | gpt-4o-2024-05-13 | 20971 | 1444 | 68.86 |
Write a 2000-word essay on planes | gpt-35-turbo-0125 | 14744 | 1132 | 76.78 |
Write a 2000-word essay on planes | gpt-4o-2024-05-13 | 32842 | 1748 | 53.22 |
Write a 2000-word essay on trucks | gpt-35-turbo-0125 | 24725 | 1692 | 68.43 |
Write a 2000-word essay on trucks | gpt-4o-2024-05-13 | 16565 | 1323 | 79.87 |
Write a 2000-word essay on roads | gpt-35-turbo-0125 | 11863 | 890 | 75.02 |
Write a 2000-word essay on roads | gpt-4o-2024-05-13 | 28821 | 1793 | 62.21 |
Write a 2000-word essay on roads | gpt-4-turbo-2024-04-09 | 26558 | 890 | 33.54 |
Write a 2000-word essay on roads | gpt-4-turbo-2024-04-09 | 23486 | 925 | 39.39 |
Write a 20-paragraph essay on cars | gpt-4-turbo-2024-04-09 | 34428 | 1138 | 33.05 |
Write a 20-paragraph essay on cars | gpt-4-turbo-2024-04-09 | 35809 | 1193 | 33.41 |
Write a 20-paragraph essay on dogs | gpt-4-turbo-2024-04-09 | 33506 | 1231 | 36.73 |
Write a 20-paragraph essay on dogs | gpt-4-turbo-2024-04-09 | 31524 | 999 | 31.69 |
Write a 20-paragraph essay on cats | gpt-4-turbo-2024-04-09 | 24770 | 1014 | 40.94 |
Write a 20-paragraph essay on cats | gpt-4-turbo-2024-04-09 | 30743 | 1131 | 36.78 |