Due: Friday 29 May 2026 at 7 PM (NZ time)
data = read_csv("ionq.csv") |>
mutate(`Trading Day` = rev(row_number())) |>
as_tsibble(index = `Trading Day`)
data |>
autoplot(Price) +
ylab("Closing Price (USD)") +
theme_bw()
train = data |>
filter(`Trading Day` < 242)
Model 1: ETS(A, N, N) This model uses additive errors with no trend or seasonality. This could be appropriate because the plot doesn’t seem to have any obvious trend or seasonality and stock prices are prone to be unpredictable in the short term.
Model 2: EST(A, A, N) This model uses additive errors with an additive trend, but no seasonality. This could be appropriate because although there doesn’t appear to be much of a trend, there could be a slight upward drift that might be worth accounting for. I still belive ther is no seasonality.
stock_models <- train |>
model(ANN = ETS(Price ~ error("A") + trend("N") + season("N")),
AAN = ETS(Price ~ error("A") + trend("A") + season("N")),
AutoETS = ETS(Price))
stock_models |>
glance() |>
select(.model, AICc) |>
arrange(AICc)
## # A tibble: 3 × 2
## .model AICc
## <chr> <dbl>
## 1 AutoETS 1802.
## 2 ANN 1817.
## 3 AAN 1821.
stock_models |>
select(AutoETS) |>
report()
## Series: Price
## Model: ETS(M,N,N)
## Smoothing parameters:
## alpha = 0.9998986
##
## Initial states:
## l[0]
## 29.03164
##
## sigma^2: 0.0037
##
## AIC AICc BIC
## 1801.559 1801.660 1812.013
The model with the lowest AICc is the automatically fit ETS. This model uses multiplicative error with no trend or seasonality (ETS(M, N, N)) and has an AICc of about 19 lower than the next best model, demonstrating a significant improvement. It has an alpha value of 0.9999 which tells us that it makes predictions based on a very short term memory of past levels. We have no beta of gamma values because we have no trend or seasonal component.
train |>
features(Price, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.882 0.01
diff_stocks <- train |>
mutate(diff_close = difference(Price)) |>
drop_na()
diff_stocks |>
autoplot(diff_close) +
labs(y = "Differenced Closing Price", x = "Trading Day") +
theme_bw()
diff_stocks |>
features(diff_close, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.112 0.1
The KPSS test on our raw data gave a significant p-value of 0.01, meaning we reject the null hypothesis of the data having a stationary distribution and have evidence that the stock price series should be differenced to become stationary.
The plot of our differenced stock price series appears to have constant mean and variance, and therefore be stationary. This is supported by a follow-up KPSS test, giving a non-significant p-value of 0.1, suggesting we do not have evidence that the differenced data is non-stationary.
diff_stocks |>
gg_tsdisplay(diff_close, plot_type = "partial")
Model 1: ARIMA(2, 1, 0) Both ACF and PACF plots show a slightly significant value at lag 2. For my first model I will select p = 2 for my AR term to account for this. Having done one first order differenceing (d = 1), this gives the model ARIMA(2, 1, 0).
Model 2: ARIMA(0, 1, 2) Next I will account for the significant ACF value at lag 2 and set q = 2 for my MA component. Again, having done one first order differencing (d = 1), this gives a model of ARIMA(0, 1, 2).
arima_models <- train |>
model(ARIMA210 = ARIMA(Price ~ pdq(2,1,0)),
ARIMA012 = ARIMA(Price ~ pdq(0,1,2)),
AutoARIMA = ARIMA(Price, stepwise = FALSE))
arima_models |>
glance() |>
select(.model, AICc) |>
arrange(AICc)
## # A tibble: 3 × 2
## .model AICc
## <chr> <dbl>
## 1 AutoARIMA 1170.
## 2 ARIMA012 1170.
## 3 ARIMA210 1171.
arima_models |>
select(AutoARIMA) |>
report()
## Series: Price
## Model: ARIMA(0,1,3)
##
## Coefficients:
## ma1 ma2 ma3
## 0.0207 0.1391 -0.0961
## s.e. 0.0642 0.0637 0.0661
##
## sigma^2 estimated as 7.517: log likelihood=-581.13
## AIC=1170.26 AICc=1170.43 BIC=1184.19
The model with the lowest AICc is the automatically fit ARIMA which uses no AR terms, 1 first order differencing, and 3 MA terms (ARIMA(0, 1, 3)). All three models performed very similarly, however, with an AICc difference of less than one between the best and worst models.
The backshift notation for this model is: \[(1 - B)y_t = (1 + \theta_1 B + \theta_2 B^2 + \theta_3 B^3)\varepsilon_t\]
best_models <- train |>
model(ETS = ETS(Price ~ error("M") + trend("N") + season("N")),
ARIMA = ARIMA(Price ~ pdq(0,1,3)))
fc <- best_models |>
forecast(h = 10)
data |>
autoplot(Price) +
autolayer(fc, level = NULL) +
geom_vline(xintercept = 242, linetype = "dashed") +
labs(y = "Closing Price (USD)", x = "Trading Day")
accuracy_results <- fc |>
accuracy(data, measures = list(
RMSE = RMSE,
MAE = MAE,
MAPE = MAPE,
MASE = MASE))
accuracy_results
## # A tibble: 2 × 6
## .model .type RMSE MAE MAPE MASE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA Test 3.43 3.03 6.73 1.51
## 2 ETS Test 3.61 3.23 7.21 1.61
When forecasting 10 periods ahead, our optimal ARIMA model (ARIMA(0, 1, 3)) performs slightly better on the test set than our optimal ETS model (ETS(M, N, N)) with slightly smaller RMSE, MAE, MAPE, and MASE. Therefore, we can conclude that ARIMA(0, 1, 3) is likely to have better forecast accuracy.
data <- read_csv("auckland_sunshine_hours.csv") |>
mutate(Date = yearmonth(paste(Year, Month))) |>
as_tsibble(index = Date)
data |>
autoplot(Sunshine) +
theme_bw()
daily_sunshine <- data |>
mutate(Month = yearmonth(Month), SunshinePerDay = Sunshine / days_in_month(as.Date(Month)))
daily_sunshine |>
autoplot(SunshinePerDay) +
labs(y = "Hours of Sunshine Per Day", x = "Month")
A calender adjustment is important here because different months have different numbers of days which affects the total hours of sunlight. Adjusting for this by by using sunlight per day allows us to compare like with like and get a better idea of how sunlight durations change month to month.
Model 1: ETS(A, N, A) This model uses additive error and seasonality with no trend. This model could be appropriate because the variation appears consistent throughout the time series and there is clear seasonality, but not much of a trend. This tracks with what we would expect from hours of sunlight per day over time as we would think it would change with the seasons but stay consistent year to year.
Model 2: ETS(A, A, A) This model uses additive error, trend, and seasonality which could be appropriate because the variation appears consistent over time, we have strong seasonality, and there could be a slight downward trend, especially toward the end of the series. This would be slightly unexpected as I wouldn’t expect sunlight per day to decrease year on year, but maybe this will give some interesting insight into a larger pattern.
sunshine_models <- daily_sunshine |>
model(AAA = ETS(SunshinePerDay ~ error("A") + trend("N") + season("A")),
ANA = ETS(SunshinePerDay ~ error("A") + trend("A") + season("A")),
AutoETS = ETS(SunshinePerDay)
)
sunshine_models |>
glance() |>
select(.model, AICc) |>
arrange(AICc)
## # A tibble: 3 × 2
## .model AICc
## <chr> <dbl>
## 1 AutoETS 1909.
## 2 AAA 1930.
## 3 ANA 1940.
sunshine_models |>
select(AutoETS) |>
report()
## Series: SunshinePerDay
## Model: ETS(M,N,M)
## Smoothing parameters:
## alpha = 0.030627
## gamma = 0.000100009
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6]
## 5.832097 0.8288384 0.7937119 0.6674427 0.8091494 0.9599796 1.040197 1.280923
## s[-7] s[-8] s[-9] s[-10] s[-11]
## 1.31534 1.269379 1.116678 1.033454 0.8849074
##
## sigma^2: 0.0249
##
## AIC AICc BIC
## 1907.328 1908.814 1964.718
The model with the lowest AICc was the automatically fit ETS, showing a significant improvement over the others. This model used ETS(M, N, M), meaning it uses multiplicative errors and seasonality, but no trend component. It has a small alpha value of 0.03, telling us that the predicted level has a long memory of past observations and stays relatively consistent over time. Similarly, we have a gamma value of 0.0001, meaning that the predicted seasonal component also has a long memory and will change slowly through the time series. Finally, we have no beta value because there is no trend component in the model.
daily_sunshine |>
features(SunshinePerDay, guerrero)
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 0.901
daily_sunshine |>
features(SunshinePerDay, feat_stl) |>
select(seasonal_strength_year)
## # A tibble: 1 × 1
## seasonal_strength_year
## <dbl>
## 1 0.724
diff_data <- daily_sunshine |>
mutate(season_diff = difference(SunshinePerDay, lag = 12))
diff_data |>
features(season_diff, feat_stl) |>
select(seasonal_strength_year)
## # A tibble: 1 × 1
## seasonal_strength_year
## <dbl>
## 1 0.0118
diff_data |>
features(season_diff, unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 0
diff_data |>
autoplot(season_diff) +
labs(
title = "Seasonally Differenced Sunshine Series",
y = "Seasonally Differenced Sunshine Hours",
x = "Month"
)
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
diff_data |>
features(season_diff, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0328 0.1
diff_data |>
features(season_diff, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 0
diff_data |>
gg_tsdisplay(season_diff, plot_type = "partial", lag_max = 48)
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
Model 1: ARIMA(0, 0, 0)(4, 1, 0) We do not have any first order differencing or significant ACF/PACF values in the first 12 lags, so I used 0’s for all non-seasonal ARIMA parameters (p=0, d=0, q=0). We did do one seasonal difference (D = 1) and we see 4 significant PACF values at each 12 month lag, making it appropriate to set our seasonal AR component to 4 (P = 4).
Model 2: ARIMA(0, 0, 0)(0, 1, 2) We still do not have any first order differencing or significant ACF/PACF values in the first 12 lags, so I used 0’s again for all non-seasonal ARIMA parameters (p=0, d=0, q=0). And again we did do one seasonal difference (D = 1) but this time we’ll examine the 2 significant ACF values at lags 12 and 24. To do this we will set the seasonal MA component to 2 (Q = 2).
arima_models <- daily_sunshine |>
model(ARIMA410 = ARIMA(SunshinePerDay ~ pdq(0,0,0) + PDQ(4,1,0)),
ARIMA012 = ARIMA(SunshinePerDay ~ pdq(0,0,0) + PDQ(0,1,2)),
AutoARIMA = ARIMA(SunshinePerDay))
arima_models |>
glance() |>
select(.model, AICc) |>
arrange()
## # A tibble: 3 × 2
## .model AICc
## <chr> <dbl>
## 1 ARIMA410 939.
## 2 ARIMA012 955.
## 3 AutoARIMA 1015.
arima_models |>
select(ARIMA410) |>
report()
## Series: SunshinePerDay
## Model: ARIMA(0,0,0)(4,1,0)[12] w/ drift
##
## Coefficients:
## sar1 sar2 sar3 sar4 constant
## -0.7205 -0.6269 -0.3792 -0.1597 -0.0728
## s.e. 0.0558 0.0655 0.0658 0.0572 0.0572
##
## sigma^2 estimated as 0.982: log likelihood=-463.55
## AIC=939.11 AICc=939.37 BIC=961.85
The model with the lowest AICc is ARIMA(0, 0, 0)(4, 1, 0). This model uses one seasonaly differencing and 4 seasonal AR components, performing significantly better than the other two. In this case the automatically fit model performed the worst.
Backshift notation: \[(1-\Phi_1 B^{12}-\Phi_2 B^{24} - \Phi_3 B^{36}-\Phi_4 B^{48})(1-B^{12})y_t = \varepsilon_t\] > 7.
cv_results <- daily_sunshine |>
slide_tsibble(.size = 200, .step = 1) |>
model(ETS = ETS(SunshinePerDay ~ error("M") + trend("N") + season("M")),
ARIMA = ARIMA(SunshinePerDay ~ 1 + pdq(0,0,0) + PDQ(4,1,0))) |>
forecast(h = 1)
cv_accuracy <- cv_results |>
accuracy(daily_sunshine, measures = list(
RMSE = RMSE,
MAE = MAE,
MAPE = MAPE,
MASE = MASE))
## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
## 1 observation is missing at 1990 Dec
cv_accuracy
## # A tibble: 2 × 6
## .model .type RMSE MAE MAPE MASE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA Test 1.03 0.771 15.5 0.780
## 2 ETS Test 0.957 0.753 15.1 0.762
The ETS(M, N, M) model performs better on all CV metrics than the ARIMA(0, 0, 0)(4, 1, 0) model. From this we can conclude that ETS is likely to have a better out-of-sample forecast accuracy.