Due: Friday 15 May 2026 at 7 PM (NZ time)

Nontradable Inflation

#Reading in data and converting to a tsibble
nontradable <- read_csv("nontradable.csv") |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)
## Rows: 64 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Quarter
## dbl (4): NontradableCPI, TradableCPI, LCI, Unemployment
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Converting to quarter-on-quarter percentage changes
nontradable_ts <- nontradable |>
  mutate(
    Nontradable_CPI = (NontradableCPI / lag(NontradableCPI) - 1) * 100,
    Tradable_CPI = (TradableCPI / lag(TradableCPI) - 1) * 100,
    LCI_raw = (LCI / lag(LCI) - 1) * 100,
    Unemployment = Unemployment - lag(Unemployment))

#Extracting seasonally adjusted LCI with an STL model
lci_stl <- nontradable_ts |>
  filter(!is.na(LCI_raw)) |> 
  model(stl = STL(LCI_raw ~ season(window = "periodic"), robust = TRUE))

lci_components <- components(lci_stl)

#Adding seasonally adjusted LCI back into the tsibble
nontradable_clean <- nontradable_ts |>
  select(-LCI, -LCI_raw) |> 
  left_join(lci_components, by = "Quarter") |> 
  rename(LCI = season_adjust) |> 
  select(Quarter, Nontradable_CPI, Tradable_CPI, LCI, Unemployment) |> 
  filter(!is.na(Nontradable_CPI))

# Pivoting to long format for plotting
nontradable_long <- nontradable_clean |> 
  pivot_longer(
    cols = c(Nontradable_CPI, Tradable_CPI, LCI, Unemployment),
    names_to  = "Series",
    values_to = "Value") |>
  mutate(Series = factor(Series,
                    levels = c("Nontradable_CPI", "Tradable_CPI", "LCI", "Unemployment"),
                    labels = c("Nontradable CPI", "Tradable CPI", "LCI", "Unemployment")))

#Reproducing faceted time series plots
ggplot(nontradable_long, aes(x = Quarter, y = Value, colour = Series)) +
  geom_line() +
  facet_grid(Series ~ ., scales = "free_y") +
  labs(x = "Quarter", y = "Quarter-on-quarter % change") +
  theme(legend.position = "none")

The seasonally adjusted NontradableCPI and TradableCPI series appear to be closely related with shared spikes in late 2010, dips in 2015 and 2020, and a bulge from 2021 to 2024.

The seasonally adjusted LCI series shares the dip in 2020 and the bulge that followed with both CPI series, but doesn’t appear to be very closely related otherwise.

Finally, seasonally adjusted Unemployment appears inversely related to NontrardableCPI in some cases, displaying a dip in late 2010 and a spike in 2020. Otherwise there is no clear correlation.

# Creating lag 1 variable for all quarter-on-quarter percentage change time series
nontradable_clean <- nontradable_clean |>
  mutate(
    lag_NontradableCPI = lag(Nontradable_CPI, 1),
    lag_TradableCPI = lag(Tradable_CPI, 1),
    lag_LCI = lag(LCI, 1),
    lag_Unemployment = lag(Unemployment, 1)) |> 
  filter(!is.na(lag_NontradableCPI))

#Recreating pairs plots
nontradable_clean |>
  as_tibble() |> 
  select(Nontradable_CPI, lag_NontradableCPI, lag_TradableCPI, lag_LCI, lag_Unemployment) |> 
  ggpairs(
    columnLabels = c("NontradableCPI", "lag_NontradableCPI", "lag_TradableCPI", "lag_LCI", "lag_Unemployment"),
    upper = list(continuous = wrap("cor", size = 4)),
    lower = list(continuous = wrap("points", size = 0.8)),
    diag  = list(continuous = wrap("densityDiag"))) + 
  theme(strip.text = element_text(size = 6), axis.text = element_text(size = 6))

#Computing VIFs to assess suitability for analysis
vif_model <- lm(Nontradable_CPI ~ lag_NontradableCPI + lag_TradableCPI + lag_LCI + lag_Unemployment, data = nontradable_clean |> 
                  as_tibble() |> 
                  filter(!is.na(lag_NontradableCPI)))

vif(vif_model)
## lag_NontradableCPI    lag_TradableCPI            lag_LCI   lag_Unemployment 
##           2.609881           1.801511           1.955679           1.083656

The lagged time series of NontradableCPI, TradableCPI and LCI all have relatively high correlations with NontradableCPI, suggesting that these may be good predictors. Lagged Unemployment does not seem to be significantly correlated with NontradableCPI or any of the other lagged series, and therefore is unlikely to be a useful predictor.

There is some multicollinearity, particularly between lag_TradableCPI, lag_LCI, and lag_NontradableCPI. A further look into the variance inflation factors for these variables, however, reveals relatively low VIF values (VIFs < 3) which means they are acceptable for analysis. Additionally, we know that we can still forecast with multicollinearity, but it may increase the variance in our forecasts and harm the interpretation of individual variable coefficients. As a result, we are safe to use these variables for forecasting but should be mindful of forecast variance and interpretation of variable coefficients.

#Dummy variable for COVID (2020 Q2)
nontradable_clean <- nontradable_clean |>
  mutate(COVID = ifelse(Quarter == yearquarter("2020 Q2"), 1, 0))

#Splitting into training and testing sets
train <- nontradable_clean |>
  filter(year(Quarter) < 2025)
test <- nontradable_clean |>
  filter(year(Quarter) >= 2025)

#Fitting four candidate models to the data
#Model 1: using the main effects of all five predictors
fit_full <- train |>
  model(Full = TSLM(Nontradable_CPI ~ lag_NontradableCPI + lag_TradableCPI + lag_LCI + lag_Unemployment + COVID))

#Model 2: using only the main effects of predictors with strong correlations from Q2
fit_strong <- train |>
  model(StrongPredictors = TSLM(Nontradable_CPI ~ lag_NontradableCPI + lag_TradableCPI + lag_LCI))

#Model 3: using predictors from model two with the addition of the COVID dummy variable
fit_strongCOVID <- train |>
  model(StrongPlusCOVID = TSLM(Nontradable_CPI ~ lag_NontradableCPI + lag_TradableCPI + lag_LCI + COVID))

#Model 4: using the main effects and interactions of all predictors from model two, along with the addition of the COVID dummy variable
fit_strongInteractCOVID <- train |>
  model(StrongInteractPlusCOVID = TSLM(Nontradable_CPI ~ lag_NontradableCPI * lag_TradableCPI * lag_LCI + COVID))

#Extracting AIC, AICc, and BIC of each models and displaying all together in a combined table
model_comparison <- bind_rows(
  glance(fit_full) |> 
    select(.model, AIC, AICc, BIC),
  glance(fit_strong) |> 
    select(.model, AIC, AICc, BIC),
  glance(fit_strongCOVID) |> 
    select(.model, AIC, AICc, BIC),
  glance(fit_strongInteractCOVID) |> 
    select(.model, AIC, AICc, BIC)) |>
  select(.model, AIC, AICc, BIC) |>
  kable(digits = 3, caption = "Model Comparison")

model_comparison
Table 1: Model Comparison
.model AIC AICc BIC
Full -141.325 -139.085 -126.902
StrongPredictors -138.091 -136.937 -127.789
StrongPlusCOVID -141.756 -140.109 -129.393
StrongInteractPlusCOVID -149.596 -144.916 -128.992
#Output for model with the lowest AICc
report(fit_strongInteractCOVID)
## Series: Nontradable_CPI 
## Model: TSLM 
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.4521821 -0.1346760  0.0008742  0.0987905  0.9940517 
## 
## Coefficients:
##                                            Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                -0.09857    0.32546  -0.303  0.76329
## lag_NontradableCPI                          1.05253    0.35725   2.946  0.00491
## lag_TradableCPI                            -0.27702    0.34668  -0.799  0.42811
## lag_LCI                                     0.63767    0.65364   0.976  0.33408
## COVID                                      -0.95320    0.28111  -3.391  0.00139
## lag_NontradableCPI:lag_TradableCPI         -0.06105    0.25732  -0.237  0.81344
## lag_NontradableCPI:lag_LCI                 -0.54513    0.56929  -0.958  0.34299
## lag_TradableCPI:lag_LCI                     1.36945    0.75555   1.813  0.07604
## lag_NontradableCPI:lag_TradableCPI:lag_LCI -0.53920    0.52289  -1.031  0.30751
##                                              
## (Intercept)                                  
## lag_NontradableCPI                         **
## lag_TradableCPI                              
## lag_LCI                                      
## COVID                                      **
## lag_NontradableCPI:lag_TradableCPI           
## lag_NontradableCPI:lag_LCI                   
## lag_TradableCPI:lag_LCI                    . 
## lag_NontradableCPI:lag_TradableCPI:lag_LCI   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2522 on 49 degrees of freedom
## Multiple R-squared: 0.7163,  Adjusted R-squared:  0.67
## F-statistic: 15.47 on 8 and 49 DF, p-value: 4.6829e-11
#Assumption checks for "best" model
gg_tsresiduals(fit_strongInteractCOVID)

#Ljung-Box test
augment(fit_strongInteractCOVID) |>
  features(.innov, ljung_box, lag = 20)
## # A tibble: 1 Ă— 3
##   .model                  lb_stat lb_pvalue
##   <chr>                     <dbl>     <dbl>
## 1 StrongInteractPlusCOVID    19.5     0.487
#Extracting forecasting inputs from the test set
forecast_inputs <- test |>
  select(Quarter, lag_NontradableCPI, lag_TradableCPI, lag_LCI, lag_Unemployment, COVID)

#Forecasting all models four periods ahead
fc_full   <- forecast(fit_full, new_data = forecast_inputs)
fc_strong <- forecast(fit_strong, new_data = forecast_inputs)
fc_strongCOVID <- forecast(fit_strongCOVID, new_data = forecast_inputs)
fc_strongInteractCOVID <- forecast(fit_strongInteractCOVID, new_data = forecast_inputs)

#Combining all forecasts
forecasts <- bind_rows(
  fc_full,
  fc_strong,
  fc_strongCOVID,
  fc_strongInteractCOVID)

#Plotting forecasts against observed values in the test set
forecasts |>
  autoplot(level = NULL) +                          # point forecasts only
  autolayer(nontradable_clean, Nontradable_CPI, colour = "grey") +
  labs(x = "Quarter", y = "Nontradable CPI % change", color  = "Model") +
  theme_bw()

#Computing accuracy of each forecast compared to the test set and extracting RMSE
RMSEs <- bind_rows(
  accuracy(fc_full,   test),
  accuracy(fc_strong, test),
  accuracy(fc_strongCOVID, test),
  accuracy(fc_strongInteractCOVID,  test)) |>
  select(.model, RMSE) |>
  kable(digits = 4, caption = "Forecast Accuracy on Test Set")

RMSEs
Table 1: Forecast Accuracy on Test Set
.model RMSE
Full 0.1074
StrongPredictors 0.1146
StrongPlusCOVID 0.1044
StrongInteractPlusCOVID 0.1763

Model selection

I fit four time series regression models to the training data:

  1. A full model using the main effects of all five predictors

  2. Using only the main effect of predictors that I found to be correlated with NontradableCPI in the pairs plot from question two

  3. Using the main effect of correlated predictors again, but adding in the COVID variable

  4. Using the main effect and interactions of all correlated predictors, and adding in the COVID variable

I decided to use my prior knowledge of which variables were correlated with NontradableCPI to inform how I built my models (model two). I then added in the COVID dummy variable because it seems to be important information to account for and the performance improvement when using it proved this to be true (model three). Finally, I had seen some multicollinearity between those variables found to be correlated with NontradableCPI in question two, and so decided to examine the effect of including the interaction between these variables when building my finall model (model four).

Assumption checks

The ACF plot and non-significant p-value (0.487) from our Ljung-Box test fail to give evidence for remaining autocorrelation. From the time series and histogram of residuals, we see a potential lack of normality which should be noted but doesn’t look extreme enough to prevent forecasting.

Model comparison

From the AICc values, I expect that the fourth model may produce the most accurate forecasts. This is due to an AICc of ~4.8 lower than the next best model, which suggests a significant improvement.

Surprisingly, when examining the RMSE of our forecasts compared to the test data, we see that model three performs the best with model four performing the worst.

These models use ex ante forecasting. We know this because all the information used for prediction is lagged by 1 quarter, and therefore will be known before the event that is being predicted. There is some nuance to this, however, since we are using new data in our test set to make predictions. The use of new data as we predict further into the future makes this similar to ex post forecasting, but the continuous lag between the information we are predicting and the information we have allows it to remain a genuine forecast.

  1. STATS 786 only