The model got the “strawberry” question wrong because it grouped the double “rr” and counted it as a single letter. The same kind of grouping error occurs in business forecasting. When analyzing patterns in time series data, you may encounter consecutive data points that look similar and mistakenly treat them as a single trend instead of recognizing them as separate but related occurrences.
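As a rough illustration of that grouping error, the sketch below counts the letter “r” two ways: once per occurrence, and once per run of consecutive identical letters. It only demonstrates the counting difference. It is not a claim about how the model works internally, and the variable names are invented for this example.

```python
from itertools import groupby

word = "strawberry"

# Correct count: every occurrence of "r" is counted individually.
per_letter_count = word.count("r")

# Grouping error: consecutive identical letters collapse into one run,
# so the double "rr" contributes only 1 to the total.
grouped_count = sum(1 for letter, _run in groupby(word) if letter == "r")

print(per_letter_count)  # 3
print(grouped_count)     # 2
```

The gap between the two results is exactly the one occurrence that disappears when the adjacent pair is merged.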
I tested this with sales data from a retail store over a 10-day period that included weekends and a special promotion. Based on the weekend sales, I asked two questions: “What are the average weekend sales?” and “Can you predict the estimated total sales after 10 weekends?”
The key point is that one of the weekend days included a promotion. That day should be left out of the weekend average because it is an outlier that skews the forecast. However, the model did not exclude the promotion day. It calculated the average with the outlier included, which led to an inaccurate forecast.
Here is the dataset:
| Day | Date | Sales ($) | Event |
|---|---|---|---|
| Day 1 | 01-Sep | 1,000 | Regular Day |
| Day 2 | 02-Sep | 1,200 | Regular Day |
| Day 3 | 03-Sep | 1,150 | Regular Day |
| Day 4 | 04-Sep | 1,800 | Weekend |
| Day 5 | 05-Sep | 2,000 | Weekend + Promotion |
| Day 6 | 06-Sep | 1,050 | Regular Day |
| Day 7 | 07-Sep | 1,100 | Regular Day |
| Day 8 | 08-Sep | 1,850 | Weekend |
| Day 9 | 09-Sep | 1,900 | Weekend |
| Day 10 | 10-Sep | 1,050 | Regular Day |
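
To make the effect of the promotion day concrete, here is a minimal sketch that computes the weekend average with and without it, using the four weekend rows from the table. The 10-weekend projection assumes two sales days per weekend and simply multiplies the daily average. That is my reading of the question rather than something the dataset states, and the variable names are illustrative.

```python
# Weekend rows copied from the table: (date, sales in $, has_promotion)
weekend_sales = [
    ("04-Sep", 1800, False),
    ("05-Sep", 2000, True),   # Weekend + Promotion
    ("08-Sep", 1850, False),
    ("09-Sep", 1900, False),
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


all_weekend_days = [sales for _, sales, _ in weekend_sales]
regular_weekend_days = [sales for _, sales, promo in weekend_sales if not promo]

avg_with_promo = mean(all_weekend_days)         # promotion day included
avg_without_promo = mean(regular_weekend_days)  # promotion day excluded

# Assumption: each weekend has 2 sales days and future weekends have no promotion.
DAYS_PER_WEEKEND = 2
N_WEEKENDS = 10

forecast_with_promo = avg_with_promo * DAYS_PER_WEEKEND * N_WEEKENDS
forecast_without_promo = avg_without_promo * DAYS_PER_WEEKEND * N_WEEKENDS

print(avg_with_promo, forecast_with_promo)        # 1887.5 37750.0
print(avg_without_promo, forecast_without_promo)  # 1850.0 37000.0
```

Under these assumptions, including the promotion day raises the daily weekend average by $37.50 and the 10-weekend projection by $750.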