Due: Friday 29 May 2026 at 7 PM (NZ time)

Problem 1: IonQ Stock Prices

data = read_csv("ionq.csv") |>
  mutate(`Trading Day` = rev(row_number())) |>
  as_tsibble(index = `Trading Day`)

data |>
  autoplot(Price) + 
  ylab("Closing Price (USD)") + 
  theme_bw()

train = data |>
  filter(`Trading Day` < 242)

Model 1: ETS(A, N, N) This model uses additive errors with no trend or seasonality. This could be appropriate because the plot doesn’t seem to have any obvious trend or seasonality and stock prices are prone to be unpredictable in the short term.

Model 2: EST(A, A, N) This model uses additive errors with an additive trend, but no seasonality. This could be appropriate because although there doesn’t appear to be much of a trend, there could be a slight upward drift that might be worth accounting for. I still belive ther is no seasonality.

stock_models <- train |>
  model(ANN = ETS(Price ~ error("A") + trend("N") + season("N")),
    AAN = ETS(Price ~ error("A") + trend("A") + season("N")),
    AutoETS = ETS(Price))

stock_models |>
  glance() |>
  select(.model, AICc) |> 
  arrange(AICc)

## # A tibble: 3 × 2
##   .model   AICc
##   <chr>   <dbl>
## 1 AutoETS 1802.
## 2 ANN     1817.
## 3 AAN     1821.

stock_models |>
  select(AutoETS) |> 
  report()

## Series: Price 
## Model: ETS(M,N,N) 
##   Smoothing parameters:
##     alpha = 0.9998986 
## 
##   Initial states:
##      l[0]
##  29.03164
## 
##   sigma^2:  0.0037
## 
##      AIC     AICc      BIC 
## 1801.559 1801.660 1812.013

The model with the lowest AICc is the automatically fit ETS. This model uses multiplicative error with no trend or seasonality (ETS(M, N, N)) and has an AICc of about 19 lower than the next best model, demonstrating a significant improvement. It has an alpha value of 0.9999 which tells us that it makes predictions based on a very short term memory of past levels. We have no beta of gamma values because we have no trend or seasonal component.

train |>
  features(Price, unitroot_kpss)

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1     0.882        0.01

diff_stocks <- train |>
  mutate(diff_close = difference(Price)) |> 
  drop_na()

diff_stocks |>
  autoplot(diff_close) +
  labs(y = "Differenced Closing Price", x = "Trading Day") +
  theme_bw()

diff_stocks |>
  features(diff_close, unitroot_kpss)

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1     0.112         0.1

The KPSS test on our raw data gave a significant p-value of 0.01, meaning we reject the null hypothesis of the data having a stationary distribution and have evidence that the stock price series should be differenced to become stationary.

The plot of our differenced stock price series appears to have constant mean and variance, and therefore be stationary. This is supported by a follow-up KPSS test, giving a non-significant p-value of 0.1, suggesting we do not have evidence that the differenced data is non-stationary.

diff_stocks |>
  gg_tsdisplay(diff_close, plot_type = "partial")

Model 1: ARIMA(2, 1, 0) Both ACF and PACF plots show a slightly significant value at lag 2. For my first model I will select p = 2 for my AR term to account for this. Having done one first order differenceing (d = 1), this gives the model ARIMA(2, 1, 0).

Model 2: ARIMA(0, 1, 2) Next I will account for the significant ACF value at lag 2 and set q = 2 for my MA component. Again, having done one first order differencing (d = 1), this gives a model of ARIMA(0, 1, 2).

arima_models <- train |>
  model(ARIMA210 = ARIMA(Price ~ pdq(2,1,0)),
    ARIMA012 = ARIMA(Price ~ pdq(0,1,2)),
    AutoARIMA = ARIMA(Price, stepwise = FALSE))

arima_models |>
  glance() |>
  select(.model, AICc) |> 
  arrange(AICc)

## # A tibble: 3 × 2
##   .model     AICc
##   <chr>     <dbl>
## 1 AutoARIMA 1170.
## 2 ARIMA012  1170.
## 3 ARIMA210  1171.

arima_models |>
  select(AutoARIMA) |> 
  report()

## Series: Price 
## Model: ARIMA(0,1,3) 
## 
## Coefficients:
##          ma1     ma2      ma3
##       0.0207  0.1391  -0.0961
## s.e.  0.0642  0.0637   0.0661
## 
## sigma^2 estimated as 7.517:  log likelihood=-581.13
## AIC=1170.26   AICc=1170.43   BIC=1184.19

The model with the lowest AICc is the automatically fit ARIMA which uses no AR terms, 1 first order differencing, and 3 MA terms (ARIMA(0, 1, 3)). All three models performed very similarly, however, with an AICc difference of less than one between the best and worst models.

The backshift notation for this model is: \[(1 - B)y_t = (1 + \theta_1 B + \theta_2 B^2 + \theta_3 B^3)\varepsilon_t\]

best_models <- train |> 
  model(ETS = ETS(Price ~ error("M") + trend("N") + season("N")),
    ARIMA = ARIMA(Price ~ pdq(0,1,3)))

fc <- best_models |>
  forecast(h = 10)

data |>
  autoplot(Price) +
  autolayer(fc, level = NULL) +
  geom_vline(xintercept = 242, linetype = "dashed") +
  labs(y = "Closing Price (USD)", x = "Trading Day")

accuracy_results <- fc |>
  accuracy(data, measures = list(
      RMSE = RMSE,
      MAE = MAE,
      MAPE = MAPE,
      MASE = MASE))

accuracy_results

## # A tibble: 2 × 6
##   .model .type  RMSE   MAE  MAPE  MASE
##   <chr>  <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA  Test   3.43  3.03  6.73  1.51
## 2 ETS    Test   3.61  3.23  7.21  1.61

When forecasting 10 periods ahead, our optimal ARIMA model (ARIMA(0, 1, 3)) performs slightly better on the test set than our optimal ETS model (ETS(M, N, N)) with slightly smaller RMSE, MAE, MAPE, and MASE. Therefore, we can conclude that ARIMA(0, 1, 3) is likely to have better forecast accuracy.

Problem 2: Auckland Sunshine Hours

data <- read_csv("auckland_sunshine_hours.csv") |>
  mutate(Date = yearmonth(paste(Year, Month))) |>
  as_tsibble(index = Date)

data |> 
  autoplot(Sunshine) + 
  theme_bw()

daily_sunshine <- data |>
  mutate(Month = yearmonth(Month), SunshinePerDay = Sunshine / days_in_month(as.Date(Month)))

daily_sunshine |>
  autoplot(SunshinePerDay) +
  labs(y = "Hours of Sunshine Per Day", x = "Month")

A calender adjustment is important here because different months have different numbers of days which affects the total hours of sunlight. Adjusting for this by by using sunlight per day allows us to compare like with like and get a better idea of how sunlight durations change month to month.

Model 1: ETS(A, N, A) This model uses additive error and seasonality with no trend. This model could be appropriate because the variation appears consistent throughout the time series and there is clear seasonality, but not much of a trend. This tracks with what we would expect from hours of sunlight per day over time as we would think it would change with the seasons but stay consistent year to year.

Model 2: ETS(A, A, A) This model uses additive error, trend, and seasonality which could be appropriate because the variation appears consistent over time, we have strong seasonality, and there could be a slight downward trend, especially toward the end of the series. This would be slightly unexpected as I wouldn’t expect sunlight per day to decrease year on year, but maybe this will give some interesting insight into a larger pattern.

sunshine_models <- daily_sunshine |>
  model(AAA = ETS(SunshinePerDay ~ error("A") + trend("N") + season("A")),
    ANA = ETS(SunshinePerDay ~ error("A") + trend("A") + season("A")),
    AutoETS = ETS(SunshinePerDay)
  )

sunshine_models |>
  glance() |>
  select(.model, AICc) |> 
  arrange(AICc)

## # A tibble: 3 × 2
##   .model   AICc
##   <chr>   <dbl>
## 1 AutoETS 1909.
## 2 AAA     1930.
## 3 ANA     1940.

sunshine_models |> 
  select(AutoETS) |> 
  report()

## Series: SunshinePerDay 
## Model: ETS(M,N,M) 
##   Smoothing parameters:
##     alpha = 0.030627 
##     gamma = 0.000100009 
## 
##   Initial states:
##      l[0]      s[0]     s[-1]     s[-2]     s[-3]     s[-4]    s[-5]    s[-6]
##  5.832097 0.8288384 0.7937119 0.6674427 0.8091494 0.9599796 1.040197 1.280923
##    s[-7]    s[-8]    s[-9]   s[-10]    s[-11]
##  1.31534 1.269379 1.116678 1.033454 0.8849074
## 
##   sigma^2:  0.0249
## 
##      AIC     AICc      BIC 
## 1907.328 1908.814 1964.718

The model with the lowest AICc was the automatically fit ETS, showing a significant improvement over the others. This model used ETS(M, N, M), meaning it uses multiplicative errors and seasonality, but no trend component. It has a small alpha value of 0.03, telling us that the predicted level has a long memory of past observations and stays relatively consistent over time. Similarly, we have a gamma value of 0.0001, meaning that the predicted seasonal component also has a long memory and will change slowly through the time series. Finally, we have no beta value because there is no trend component in the model.

daily_sunshine |>
  features(SunshinePerDay, guerrero)

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.901

daily_sunshine |>
  features(SunshinePerDay, feat_stl) |> 
  select(seasonal_strength_year)

## # A tibble: 1 × 1
##   seasonal_strength_year
##                    <dbl>
## 1                  0.724

diff_data <- daily_sunshine |>
  mutate(season_diff = difference(SunshinePerDay, lag = 12))

diff_data |>
  features(season_diff, feat_stl) |> 
  select(seasonal_strength_year)

## # A tibble: 1 × 1
##   seasonal_strength_year
##                    <dbl>
## 1                 0.0118

diff_data |>
  features(season_diff, unitroot_nsdiffs)

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       0

diff_data |>
  autoplot(season_diff) +
  labs(
    title = "Seasonally Differenced Sunshine Series",
    y = "Seasonally Differenced Sunshine Hours",
    x = "Month"
  )

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

diff_data |>
  features(season_diff, unitroot_kpss)

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0328         0.1

diff_data |>
  features(season_diff, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0

A Guerrero lambda value of 0.9 (close to 1) tells us that the data does not need to be Box-Cox transformed.
A seasonal strength value of 0.72 (> 0.64) tells us that the data needs to be seasonally differenced.
A new seasonal strength value of 0.012 (< 0.64) after seasonal differencing indicates that we will not need to seasonally difference again.
A unitroot_ndiffs value of 0 after one round of seasonal differenceing (D = 1) confirms that we do not need to do another round.
The plot of the seasonally differenced time series does appear to have a constant mean and approximately constant variance over time.
Our KPSS test returns a non-significant p-value of 0.1, meaning we do not have evidence against the seasonally differenced data coming from a stationary distribution and therefore do not need to do any further differencing.
This is verified by a unitroot_ndiffs value of 0, meaning no more differencing is needed to reach a stationary distribution.

diff_data |>
  gg_tsdisplay(season_diff, plot_type = "partial", lag_max = 48)

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

Model 1: ARIMA(0, 0, 0)(4, 1, 0) We do not have any first order differencing or significant ACF/PACF values in the first 12 lags, so I used 0’s for all non-seasonal ARIMA parameters (p=0, d=0, q=0). We did do one seasonal difference (D = 1) and we see 4 significant PACF values at each 12 month lag, making it appropriate to set our seasonal AR component to 4 (P = 4).

Model 2: ARIMA(0, 0, 0)(0, 1, 2) We still do not have any first order differencing or significant ACF/PACF values in the first 12 lags, so I used 0’s again for all non-seasonal ARIMA parameters (p=0, d=0, q=0). And again we did do one seasonal difference (D = 1) but this time we’ll examine the 2 significant ACF values at lags 12 and 24. To do this we will set the seasonal MA component to 2 (Q = 2).

arima_models <- daily_sunshine |>
  model(ARIMA410 = ARIMA(SunshinePerDay ~ pdq(0,0,0) + PDQ(4,1,0)),
    ARIMA012 = ARIMA(SunshinePerDay ~ pdq(0,0,0) + PDQ(0,1,2)),
    AutoARIMA = ARIMA(SunshinePerDay))

arima_models |>
  glance() |>
  select(.model, AICc) |> 
  arrange()

## # A tibble: 3 × 2
##   .model     AICc
##   <chr>     <dbl>
## 1 ARIMA410   939.
## 2 ARIMA012   955.
## 3 AutoARIMA 1015.

arima_models |>
  select(ARIMA410) |> 
  report()

## Series: SunshinePerDay 
## Model: ARIMA(0,0,0)(4,1,0)[12] w/ drift 
## 
## Coefficients:
##          sar1     sar2     sar3     sar4  constant
##       -0.7205  -0.6269  -0.3792  -0.1597   -0.0728
## s.e.   0.0558   0.0655   0.0658   0.0572    0.0572
## 
## sigma^2 estimated as 0.982:  log likelihood=-463.55
## AIC=939.11   AICc=939.37   BIC=961.85

The model with the lowest AICc is ARIMA(0, 0, 0)(4, 1, 0). This model uses one seasonaly differencing and 4 seasonal AR components, performing significantly better than the other two. In this case the automatically fit model performed the worst.

Backshift notation: \[(1-\Phi_1 B^{12}-\Phi_2 B^{24} - \Phi_3 B^{36}-\Phi_4 B^{48})(1-B^{12})y_t = \varepsilon_t\] > 7.

cv_results <- daily_sunshine |>
  slide_tsibble(.size = 200, .step = 1) |>
  model(ETS = ETS(SunshinePerDay ~ error("M") + trend("N") + season("M")),
        ARIMA = ARIMA(SunshinePerDay ~ 1 + pdq(0,0,0) + PDQ(4,1,0))) |>
  forecast(h = 1)

cv_accuracy <- cv_results |>
  accuracy(daily_sunshine, measures = list(
      RMSE = RMSE,
      MAE = MAE,
      MAPE = MAPE,
      MASE = MASE))

## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 1 observation is missing at 1990 Dec

cv_accuracy

## # A tibble: 2 × 6
##   .model .type  RMSE   MAE  MAPE  MASE
##   <chr>  <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA  Test  1.03  0.771  15.5 0.780
## 2 ETS    Test  0.957 0.753  15.1 0.762

The ETS(M, N, M) model performs better on all CV metrics than the ARIMA(0, 0, 0)(4, 1, 0) model. From this we can conclude that ETS is likely to have a better out-of-sample forecast accuracy.

STATS 326/786

Assignment 4

Liam Rohrer (973023817)

Problem 1: IonQ Stock Prices

Problem 2: Auckland Sunshine Hours